Recently, I was bitten by the unintuitive behaviour of `glob.glob()`, and I think it would be beneficial to write down what I found.

A little background: I wanted to find all the images under the directory `test_img` with the extension `*.jpg` or `*.JPG` on my Windows 10 machine. My initial code was:
```python
import glob

ext = '.jpg'
im_paths1 = glob.glob('test_img/' + '*' + ext)
im_paths2 = glob.glob('test_img/' + '*' + ext.upper())
```
I expected `im_paths1` to contain the paths of all the images ending in `.jpg`, and `im_paths2` to contain the paths of all the images ending in `.JPG`. In fact, `im_paths1` and `im_paths2` are exactly the same: all images whose names end with either `.jpg` or `.JPG` are matched, i.e., `glob.glob()` is case insensitive on Windows.
I ran the same code on Linux and found that there `glob.glob()` is case sensitive.
This inconsistent behaviour across platforms drove me to read the source code of `glob`. It seems that the culprit is `fnmatch.filter()`, which `glob` uses to get the matching file paths (see the source of the `glob` module). `fnmatch.filter()` applies `os.path.normcase()` to the pattern and the filenames on non-POSIX systems (see the source of the `fnmatch` module).
That is why `glob.glob()` cannot distinguish between lower-case and upper-case file names on Windows.
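
The effect of that normalization can be reproduced on any platform, because the standard library ships both path flavours as `ntpath` and `posixpath`. The following is only a minimal sketch: the helper name `matches_after_normcase` is made up for illustration, and it approximates the normalization step rather than reproducing the internals of `fnmatch.filter()`.

```python
import fnmatch
import ntpath
import posixpath


def matches_after_normcase(name, pattern, path_module):
    """Normalize name and pattern with path_module.normcase(), then
    compare them case-sensitively."""
    return fnmatch.fnmatchcase(path_module.normcase(name),
                               path_module.normcase(pattern))


# Windows flavour: normcase() lowercases, so the case difference vanishes.
print(matches_after_normcase('A.JPG', '*.jpg', ntpath))     # True
# POSIX flavour: normcase() returns its argument unchanged.
print(matches_after_normcase('A.JPG', '*.jpg', posixpath))  # False
```

Passing the path module explicitly is just a way to compare both behaviours side by side; on a real system `os.path` is already bound to one of the two.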
In my opinion, this is a poor design choice, and users should at least be warned about it.
To keep the behaviour of `glob.glob()` consistent across different systems, I wrote the following function to find files in a case-sensitive manner on Windows:
```python
import fnmatch
import glob
import os
import re


def find_files(directory, pat):
    """
    Find files in a case sensitive way on Windows.

    Parameters
    ----------
    directory: str
        The directory where you want to find files, can be relative or
        absolute path.
    pat: str
        The pattern of file names you want to find, for example, `*.jpg`
        or `*.JPG`.

    Returns
    -------
    A list of file paths matching the given pattern. Empty if no files
    under the directory match the pattern.
    """
    path_pattern = os.path.join(directory, pat)
    # On Windows, glob.glob() may also return matches that differ in case.
    pths = glob.glob(path_pattern)
    # fnmatch.translate() gives a case-sensitive regex; use it to re-filter.
    match = re.compile(fnmatch.translate(path_pattern)).match
    valid_pths = [pth for pth in pths if match(pth)]
    return valid_pths


print(find_files('test_img', '*.jpg'))
```
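
For comparison, here is an alternative sketch that skips `glob` entirely and filters the directory listing with `fnmatch.fnmatchcase()`, which never normalizes case. This is not the approach used above: the function name `find_files_case_sensitive` is hypothetical, it only supports wildcards in the file name (not in the directory part), and unlike `glob` it does not skip names starting with a dot.

```python
import fnmatch
import os
import tempfile


def find_files_case_sensitive(directory, pat):
    """Return sorted paths in `directory` whose names match `pat`,
    respecting case on every platform."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if fnmatch.fnmatchcase(name, pat)
    )


# Demo on a throwaway directory; the two files have different base names
# so the example also works on case-insensitive file systems.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.jpg', 'B.JPG'):
        open(os.path.join(tmp, name), 'w').close()
    lower = [os.path.basename(p) for p in find_files_case_sensitive(tmp, '*.jpg')]
    upper = [os.path.basename(p) for p in find_files_case_sensitive(tmp, '*.JPG')]
    print(lower)  # ['a.jpg']
    print(upper)  # ['B.JPG']
```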
License CC BY-NC-ND 4.0