Python tip of the week 7, Paths and Files

You may wish to load files in your python code. Sometimes you might want to pass a directory for input or output as a program argument or function parameter. Image that you have an output directory and want to write some files to it. You may be tempted to do something like this:

>>> filename = output_dir + "data.json"

but this requires that output_dir ends in a / , otherwise you’ll just write a file “resultsdata.json” instead of “results/data.json”. You could ask people to ensure that it ends with a /, or check it yourself, but this can get messy. Instead, use os.path.join:

>>> filename = os.path.join(output_dir, "data.json")

This will automatically join arguments with a directory separator if needed (\ in windows!), and can take any number of arguments, which it will join.

In python 3, you can also use the pathlib module which is pretty cool: https://docs.python.org/3/library/pathlib.html

>>> output_dir = pathlib.Path(dirname)
>>> filename = output_dir / "data.json"

This overrides the division operator to allow you to join paths together. As long as one of the items is a Path object, you can perform this operation.

The os.path and os modules have many other useful methods:

Take a look at the documentation, but here are a few methods which I use often:

Splitting a filename into parts

# Takes care of multiple . in the filename, some extensions are longer than 3 characters
>>> os.path.splitext("myfile.data.json")
('myfile.data', '.json')

# Get the filename from a full path
>>> os.path.basename("/path/to/myfile.json")
myfile.json

# Get the directory name from a full path
>>> os.path.dirname("/path/to/myfile.json")
/path/to

Making directories

# os.mkdir can only make one directory at a time. If you want to make a tree, use:
>>> os.makedirs("all/of/my/directories")
# This will raise an exception if the final directory already exists. In python 3 use
>>> os.makedirs("all/of/my/directories", exists_ok=True)
# or
import errno, os
try:
    os.makedirs("all/of/my/directories")
except OSError as e:
    if e.errno != errno.EEXIST:
        raise

Why shouldn’t you use os.path.exists() instead of catching OSError? Because it’s theoretically possible that between the time that you check if a directory exists, and you make it, some other process could create this directory. It’s better to check the exception instead.

Getting file lists from directories

Often you might want to scan a directory and get files in it:

Single directory:

>>> os.listdir("/my/directory")
>>> glob.glob("/my/directory/*.txt")

Recursive:

>>> all_files = []
>>> for root, dirs, files in os.walk("/my/directory"):
...     for f in files:
...         all_files.append(os.path.join(root, f)) if f.endswith(".txt")

>>> glob.glob("/my/directory/*/*.txt", recursive=True)

Other file operations

Also check out the shutil module for functions to copy and move files: https://docs.python.org/3/library/shutil.html