When I work on a Python project, I use black to format the code so that the whole code base has a uniform style.
One pain point, however, is trailing whitespace:
- black removes trailing spaces in docstrings, code, and comments, but it will not touch trailing spaces in, e.g., multi-line strings:
my_multiline_str = """
this is a string that
spans multiple lines.
<space><space>
The above line has trailing spaces that black won't touch
"""
- black cannot format other file types, such as YAML.
- Not everyone uses the same tools, which makes it hard to enforce trailing-whitespace removal locally.
So I want to check whether any file tracked by git has trailing spaces and, if so, fail the pipeline. This forces everyone to be aware of the issue and fix it on their side.
How do we do this then?
with grep
We can achieve this with grep:
if grep --recursive --line-number --exclude-dir="*cache*" '[[:blank:]]$' src/; then
    echo "Trailing whitespaces found"
    exit 1
fi
In the above, the --exclude-dir option skips directories whose names match the given pattern, which reduces noise.
grep exits with status 0 when it finds at least one match, so the if branch runs and exit 1 fails the pipeline run.
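To make the pattern concrete: [[:blank:]] matches a space or a tab, and $ anchors the match at the end of the line. Below is a minimal Python sketch of the same test. The name has_trailing_blank is made up for illustration, and the regex [ \t]$ is my own translation of the POSIX class, not something grep uses internally.

```python
import re

# Rough Python equivalent of grep's '[[:blank:]]$':
# a space or a tab immediately before the end of the line.
TRAILING_BLANK = re.compile(r"[ \t]$")


def has_trailing_blank(line: str) -> bool:
    """Return True if the line (ignoring its newline) ends in a space or tab."""
    # Drop the newline first, because grep matches against the line content only.
    return TRAILING_BLANK.search(line.rstrip("\n")) is not None


print(has_trailing_blank("x = 1  \n"))  # True: two trailing spaces
print(has_trailing_blank("x = 1\n"))    # False: clean line
print(has_trailing_blank("\t\n"))       # True: whitespace-only line
```

Note that a line consisting only of spaces or tabs also counts as a match, which is exactly the case shown in the multi-line string example earlier.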
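Alternatively, you can combine grep with git so that only tracked files are checked:
# git ls-files already lists individual files, so --recursive is not needed;
# -z and -0 keep file names that contain spaces intact.
if git ls-files -z | xargs -0 grep --line-number '[[:blank:]]$'; then
    echo "Trailing whitespaces found"
    exit 1
fi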
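The same idea can be sketched in Python, which may be convenient when the check should live next to the project's other Python tooling. This is only an illustration under some assumptions: the function names are hypothetical, files are decoded as UTF-8 with undecodable bytes replaced (so binary files may produce noise), and tracked_files must run inside a git work tree.

```python
import os
import re
import subprocess
from pathlib import Path

TRAILING_BLANK = re.compile(r"[ \t]$")


def tracked_files() -> list:
    """List files tracked by git (hypothetical helper; run inside a work tree)."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], check=True, capture_output=True
    ).stdout
    # -z separates the names with NUL bytes, so unusual file names survive.
    return [os.fsdecode(name) for name in out.split(b"\0") if name]


def find_trailing_blanks(paths) -> list:
    """Return (path, line_number, line) for every line ending in a space or tab."""
    hits = []
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # skip submodule directories and files deleted locally
        text = path.read_text(encoding="utf-8", errors="replace")
        # Split on "\n" only, so line numbers agree with what grep reports.
        for number, line in enumerate(text.split("\n"), start=1):
            if TRAILING_BLANK.search(line):
                hits.append((str(path), number, line))
    return hits


def main() -> int:
    hits = find_trailing_blanks(tracked_files())
    for path, number, line in hits:
        print(f"{path}:{number}:{line}")
    if hits:
        print("Trailing whitespaces found")
        return 1  # a non-zero exit code fails the pipeline step
    return 0
```

To use it in a pipeline, the script would end with something like sys.exit(main()) (after import sys), so that the return value becomes the exit code of the step.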
with git
The git diff --check command reports whitespace errors, such as trailing whitespace, in your uncommitted changes.
Once the changes are committed, however, the working tree no longer differs from HEAD and the command has nothing to inspect.
This is the situation in a pipeline, where the code is already committed, so git diff --check does not help there.
Based on this post, we can instead use the git diff-tree command.
git diff-tree --check $(git hash-object -t tree /dev/null) HEAD
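The command git hash-object -t tree /dev/null computes the object ID of the empty tree,
so the diff compares HEAD against an empty tree and every line of every committed file counts as added.
With this trick, we can check for trailing whitespace even if the code is already committed. git diff-tree --check exits with a non-zero status when it finds problems, which fails the pipeline step.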
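To see why the inner command always yields the same value, here is a small sketch that reproduces the empty-tree ID by hand. It assumes a repository that uses git's default SHA-1 object format, where an object ID is the SHA-1 of a header (type, space, content length, NUL byte) followed by the content. The function name git_object_id is made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID as git computes it: hash of '<type> <size>\\0' + content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# An empty tree has no entries, so its content is zero bytes long.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the value is constant, it could also be hard-coded in the pipeline script, but calling git hash-object keeps the command correct for repositories that use a different hash algorithm.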
references
- find file with trailing white spaces: https://stackoverflow.com/a/29308492/6064933