Sometimes, you inherit a large, complex, and often unmaintainable codebase written entirely in shell scripts (like sh
,
bash
, ksh
, etc.). In this situation, porting it to a more accessible, safer, more popular,
feature-rich and in every way superior language such as Python can be a good idea.
But how do we ensure we don't introduce regressions or bugs during this process? Even setting aside any personal "obsession"
with testing, the answer is, of course, tests! However, when porting the tests to the new language
simultaneously, we risk breaking them just as much as breaking the actual code we are migrating.
This risk can be mitigated by writing our tests in Python right from the start. And to ensure these new tests cover as much of the existing shell code's behavior as possible, we need to measure their coverage of that original shell codebase. Since we are already migrating to Python, this would ideally use existing Python tooling like pytest and Coverage.py.
Enter coverage-sh!
I built this package to measure the coverage of shell scripts executed from Python. It works as a Coverage.py plugin, integrating directly into the standard Python coverage workflow.
After installing e.g. via pip
pip install coverage-sh
we can tell Coverage.py to load the plugin, e.g. in our pyproject.toml
file:
[tool.coverage.run]
plugins = ["coverage_sh"]
Once configured, we can use Coverage.py just like we normally would:
coverage run main.py
coverage combine
coverage html
Or when using pytest-cov, simply running the tests is enough:
pytest --cov-report html --cov=myproj tests/
The result is the Shell code being displayed in the coverage.py report alongside the python code, e.g.:
Under the Hood
coverage-sh works primarily by patching Python's subprocess.Popen
class to set the ENV
and BASH_ENV
environment variables prior to the shell script's execution. As
documented in
the Bash manual, the value
of these variables specifies "the name of a startup file to read before executing the script". This provides a mechanism
to execute arbitrary shell code before the target script begins. The code injected by coverage-sh
is as follows:
#!/bin/sh
PS4="COV:::\${BASH_SOURCE}:::\${LINENO}:::"
exec {BASH_XTRACEFD}>>"<path to a named pipe>"
export BASH_XTRACEFD
set -x
First, we set the PS4
variable. According to the manual, PS4
is "the prompt printed before the command line is echoed
when the -x option is set". This causes each trace line to be
prefixed with a marker (COV:::
), the source file path (${BASH_SOURCE}
), and the line number (${LINENO}
).
When combined with set -x
, which enables command tracing, the shell outputs each executed
command, prepended by the custom PS4
string. For a small "hello world" script:
#!/bin/bash
variable="Hello, World!"
echo $variable
this would produce
COV:::tests/demo.sh:::2:::variable='Hello, World!'
COV:::tests/demo.sh:::3:::echo Hello, 'World!'
Hello, World!
To distinguish between the tracing information and the script's regular output, the injected startup script
redirects the trace output (controlled by BASH_XTRACEFD
) from STDERR to a named pipe, created by the patched Popen
class.
A dedicated thread is then spawned to continuously read data from this specific pipe. This thread is responsible for
receiving the trace lines, parsing the
"COV:::"-prefixed data, and writing this information into a Coverage.py .coverage
file.
Since subprocess.Popen is just one common way among many to execute shell code from Python, the plugin also provides a "Cover-Always Mode".
When this mode is enabled, instead of just patching the subprocess.Popen class, the plugin covers any shell script executed by the process after the plugin is loaded, until the main process finishes. This mode is not compatible with pytest-cov in its default execution mode but works perfectly when starting pytest via the Coverage.py runner, like this:
coverage run -m pytest arg1 arg2 arg3
After the code has finished running and Coverage.py processes the data, coverage-sh
scans for files within our project that have the MIME type text/x-shellscript
. For each shell script found, it uses the
powerful tree-sitter parsing library to build an Abstract Syntax Tree (AST). This AST is used to accurately identify
which lines in the shell script are actually executable code.
The final result is a comprehensive Coverage.py report where our shell code is displayed with detailed line-by-line coverage information, showing covered, uncovered, and non-executable lines: