
Unintuitive Behaviour of Case Sensitivity in Python glob


tl;dr: glob.glob() is case sensitive on Linux, but case insensitive on Windows.

Recently, I was bitten by the unintuitive behaviour of glob.glob(). I think it would be beneficial to write down what I have found.

Some background: I wanted to find all the images under the directory test_img with the extension .jpg or .JPG on my Windows 10 machine. My initial code looked like this:

import glob

ext = '.jpg'
im_paths1 = glob.glob('test_img/' + '*' + ext)
im_paths2 = glob.glob('test_img/' + '*' + ext.upper())

I expected im_paths1 and im_paths2 to contain the paths of the images ending in .jpg and .JPG respectively. In fact, im_paths1 and im_paths2 were exactly the same: every image whose name ends with either .jpg or .JPG was matched by both calls, i.e., glob.glob() is case insensitive on Windows!

I ran the same code on Linux and found that glob.glob() is case sensitive there.

This inconsistency across platforms drove me to read the source code of the glob module. The culprit appears to be fnmatch.filter(), which glob uses to select the matching file names (relevant code is here). On non-POSIX systems, fnmatch.filter() passes both the pattern and the file names through os.path.normcase() before matching (relevant code here), and on Windows os.path.normcase() converts everything to lowercase. That is why glob.glob() cannot distinguish between lower-case and upper-case file names on Windows.
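The effect of os.path.normcase() can be reproduced on any platform, because the Windows and POSIX flavours of os.path are importable everywhere as ntpath and posixpath. The sketch below only mimics the normalise-then-match idea described above; it is not the actual source of fnmatch.filter(), and the helper name filter_with and the sample file names are made up for illustration.

```python
import fnmatch
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path, importable on any platform

# Sample names for illustration only.
names = ['a.jpg', 'b.JPG', 'c.png']
pattern = '*.jpg'


def filter_with(normcase, names, pattern):
    """Hypothetical helper: normalise pattern and names, then match."""
    pat = normcase(pattern)
    # fnmatchcase() itself never changes case, so any case folding seen
    # here comes from the normcase() function passed in.
    return [n for n in names if fnmatch.fnmatchcase(normcase(n), pat)]


windows_like = filter_with(ntpath.normcase, names, pattern)
posix_like = filter_with(posixpath.normcase, names, pattern)

print(windows_like)  # ['a.jpg', 'b.JPG']
print(posix_like)    # ['a.jpg']
```

With the Windows normcase() both names are lowercased before matching, so b.JPG slips through; with the POSIX one nothing is changed and only a.jpg matches.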

In my opinion this is poor design, and users should at least be warned about it.

To keep the behaviour of glob.glob() consistent across different systems, I wrote the following function to find files in a case sensitive manner on Windows:

import fnmatch
import glob
import os
import re


def find_files(directory, pat):
    """
    Find files in a case sensitive way on Windows.

    Parameters
    ----------
    directory: str
        The directory in which to search for files; can be a relative or
        absolute path.
    pat: str
        The pattern of the file names you want to find, for example, `*.jpg`
        or `*.JPG`.

    Returns
    -------
    A list of file paths matching the given pattern. Empty if no file under
        the directory matches the pattern.
    """
    path_pattern = os.path.join(directory, pat)
    # On Windows this may return paths that differ from the pattern in case.
    pths = glob.glob(path_pattern)

    # fnmatch.translate() builds a regex without calling os.path.normcase(),
    # so matching against it is case sensitive on every platform.
    match = re.compile(fnmatch.translate(path_pattern)).match
    valid_pths = [pth for pth in pths if match(pth)]

    return valid_pths


print(find_files('test_img', '*.jpg'))
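An alternative that avoids glob altogether is to list the directory and test each file name with fnmatch.fnmatchcase(), which compares case sensitively on every platform. This is only a sketch under a few assumptions: the name find_files_scandir is made up, it looks at a single directory without recursing, and pat is assumed to be a plain file name pattern with no directory part.

```python
import fnmatch
import os
import tempfile


def find_files_scandir(directory, pat):
    """Hypothetical alternative: match file names with fnmatch.fnmatchcase()."""
    with os.scandir(directory) as entries:
        return sorted(
            os.path.join(directory, entry.name)
            for entry in entries
            # fnmatchcase() never normalises case, on any platform.
            if entry.is_file() and fnmatch.fnmatchcase(entry.name, pat)
        )


# Small self-check in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.jpg', 'b.JPG', 'c.png'):
        open(os.path.join(tmp, name), 'w').close()
    lower = [os.path.basename(p) for p in find_files_scandir(tmp, '*.jpg')]
    upper = [os.path.basename(p) for p in find_files_scandir(tmp, '*.JPG')]

print(lower)  # ['a.jpg']
print(upper)  # ['b.JPG']
```

Compared with find_files above, this matches only the file name rather than the whole path, so unusual characters in the directory name cannot interfere with the pattern.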
