How to Structure Python Packages and Why

In August 2019 Doug reviewed current opinions and recommended practices for Python packaging and arrived at the recommendations below for the directories and files layout of Python packages developed by the UBC EOAS MOAD group. The rpn-to-gemlam tool package was the first package that was converted from the previously used package layout to what is recommended here, so its files are used as examples below.

Note

There are 3 key things that people who are familiar with Python packaging need to understand about how the UBC EOAS MOAD group uses its internally developed packages:

  1. We rely heavily on “Editable” Installs. That is, we install our packages from Git clones of the repositories using the python3 -m pip install --editable (or pip install -e) command. That makes the workflow for getting updates into our installed packages a simple git pull in the package repository clone directory.

  2. On our local workstations and laptops we work in conda environments, either the base environment created by installing the Anaconda Python distribution, or project-specific environments.

  3. On HPC clusters we use the system-provided Python 3 module and install our packages using the “user scheme” for installation in combination with “Editable” Installs, that is, python3 -m pip install --user -e.

Package Layout

Using the rpn-to-gemlam tool package as an example, the directories and files layout of a MOAD package looks like:

rpn-to-gemlam/
├── docs/
│   ├── conf.py
│   ├── index.rst
│   ├── Makefile
│   ├── ...
│   └── _static/
│       ├── ...
├── envs/
│   ├── environment-dev.yaml
│   ├── environment-rtd.yaml
│   └── requirements.txt
├── .readthedocs.yaml
├── LICENSE
├── pyproject.toml
├── README.rst
├── rpn_to_gemlam/
│   ├── __init__.py
│   ├── ...
├── setup.cfg
└── tests/
    ├── ...

In summary (please see the sections below for detailed explanations):

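The top-level directory typically contains 5 files and 4 sub-directories.

The 5 files are:

  • pyproject.toml

  • setup.cfg

  • README.rst

  • LICENSE

  • .readthedocs.yaml

The 4 sub-directories are:

  • the package code sub-directory (rpn_to_gemlam/ in this example)

  • docs/

  • envs/

  • tests/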

The __init__.py file in the Package Code Sub-directory provides the package version identifier string as a variable named __version__.

Top-Level Directory

The name of the top-level directory is the “project name”. It does not have to be the same as the “package name” that you use in import statements. In this example the “project name” is rpn-to-gemlam, and the “package name” is rpn_to_gemlam. Other examples of MOAD project and package names are:

  • the moad_tools package is named moad_tools

  • the SalishSeaTools package is named salishsea_tools

  • the SalishSeaNowcast package is named nowcast

The top-level directory “project name” is generally the name of the project’s Git repository. Keep in mind, however, that Bitbucket converts repository names to all-lowercase.

Package Files

The top-level directory must contain 4 files that contain the information necessary to create a Python package. It also contains a file to tell https://readthedocs.org/ how to configure an environment in which to build the package documentation.

pyproject.toml File

The pyproject.toml file contains the build system requirements and build backend tools to use for creation of the package. It is documented at https://setuptools.pypa.io/en/latest/build_meta.html.

We use setuptools as our build backend, so our pyproject.toml files always look like:

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

Warning

Editable installs of packages that contain a pyproject.toml file are not supported by pip<21.3 (released 11-Oct-2021). At time of writing (29-Oct-2021) the Compute Canada clusters are using pip=20.0.2. Until they upgrade to pip>=21.3 any of our packages that need to be installable on one of those clusters cannot contain a pyproject.toml file.

In place of the pyproject.toml file, packages that have to be installed on Compute Canada clusters require a setup.py file containing:

import setuptools
setuptools.setup()
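
As a rough illustration of the version threshold in the warning above, the following sketch compares a pip version string (as reported by python3 -m pip --version) against 21.3. The function name supports_pyproject_editable is hypothetical, and the parsing assumes plain dotted numeric versions like the ones quoted above; it is not a general PEP-440 parser.

```python
def supports_pyproject_editable(pip_version, minimum=(21, 3)):
    """Return True if a dotted numeric pip version is at least `minimum`.

    Hypothetical helper: only handles versions such as "20.0.2" or "21.3".
    """
    # Tuples of ints compare element by element, so (21, 3, 1) >= (21, 3)
    parts = tuple(int(part) for part in pip_version.split("."))
    return parts >= minimum


print(supports_pyproject_editable("20.0.2"))  # False
print(supports_pyproject_editable("21.3"))  # True
```

A False result corresponds to the case in which the package needs the setup.py file shown above instead of pyproject.toml.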

setup.cfg File

The setup.cfg file contains the package metadata and setuptools options for creation of the package. It is documented at https://setuptools.pypa.io/en/latest/userguide/declarative_config.html.

A minimal setup.cfg file looks like:

[metadata]
name = project_name
version = 1.0
description = One line description of the package
author = your name
author_email = your email address

[options]
zip_safe = False
include_package_data = True
packages = find:
install_requires =
    list of packages that the package depends on, one per line

The setup.cfg file for the rpn-to-gemlam tool package (with the copyright header comment block excluded) looks like:

[metadata]
name = rpn-to-gemlam
version = attr: rpn_to_gemlam.__version__
description = ECCC RPN to SalishSeaCast NEMO Atmospheric Forcing Conversion Tool
author = Doug Latornell
author_email = dlatornell@eoas.ubc.ca
url=https://github.com/SalishSeaCast/rpn-to-gemlam
long_description = file: README.rst
license = Apache License, Version 2.0
platform = Linux
classifiers =
    Development Status :: 3 - Alpha
    License :: OSI Approved :: Apache Software License
    Programming Language :: Python :: Implementation :: CPython
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
    Operating System :: POSIX :: Linux
    Operating System :: Unix
    Environment :: Console
    Intended Audience :: Science/Research
    Intended Audience :: Education

[options]
zip_safe = False
include_package_data = True
packages = find:
python_requires = >=3.6
install_requires =
    # see envs/environment-dev.yaml for conda environment dev installation
    # see envs/requirements.txt for versions most recently used in development
    angles
    arrow
    bottleneck
    Click
    matplotlib
    netCDF4
    python-dateutil
    pytz
    requests
    retrying
    scipy
    xarray
    # python3 -m pip install --editable ../tools/SalishSeaTools

[options.entry_points]
console_scripts =
    rpn-to-gemlam = rpn_to_gemlam.rpn_to_gemlam:cli

The [options.entry_points] stanza is an example of the declaration of entry points. They are used in packages that use a framework like Click or Cliff to provide a command-line interface.
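
To make the command = module:attribute form of a console_scripts line concrete, here is a minimal sketch of how such a declaration can be resolved to a callable. It is an illustration only, not how setuptools or pip implement entry points. resolve_entry_point is a hypothetical helper, and the standard library's json:dumps stands in for a real command-line function so that the example runs without installing anything.

```python
import importlib


def resolve_entry_point(declaration):
    """Split "command = module:attribute" and return (command, callable).

    Hypothetical helper for illustration; real entry points are resolved
    by the packaging tools when the package is installed.
    """
    command, _, target = declaration.partition("=")
    module_name, _, attr_name = target.strip().partition(":")
    module = importlib.import_module(module_name)
    return command.strip(), getattr(module, attr_name)


# json:dumps is a stand-in target chosen only because it is always importable
command, func = resolve_entry_point("to-json = json:dumps")
print(command)  # to-json
print(func({"n": 1}))  # {"n": 1}
```

Read the same way, the rpn-to-gemlam declaration above maps the rpn-to-gemlam command to the cli function in the rpn_to_gemlam.rpn_to_gemlam module.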

Note

Declaration of entry points in setup.cfg is supported by setuptools>=51.0.0 released on 6-Dec-2020. Any newly created conda environment will include a version of setuptools much newer than that. The same is true of the Compute Canada StdEnv/2020 environment. The only platform where we can’t support this feature is orcinus.

README.rst File

The README.rst file provides a description of the package that is longer than one line. Take a look at some of the UBC EOAS MOAD repositories to get an idea of typical contents. README.rst should include a copyright and license section.

The README.rst file is included as the long_description metadata value in the setup.cfg File by including the line:

long_description = file: README.rst

in the [metadata] section.

README files written using reStructuredText (or Markdown) are automatically rendered to HTML in Bitbucket web pages.

LICENSE File

The LICENSE file contains the legal license text for the package. We release all of our open code under the Apache License, Version 2.0.

So, you can just copy the LICENSE file from another MOAD repository. Be sure to include the license declaration via the license metadata value in the setup.cfg File by including the line:

license = Apache License, Version 2.0

in the [metadata] section.

.readthedocs.yaml File

For packages that use https://readthedocs.org/ to render and host their documentation, we include a .readthedocs.yaml file in the top-level directory (the file name and location are stipulated by readthedocs). That file declares the features of the environment that we want readthedocs to use to build our docs, specifically, a conda environment that we describe in the envs/environment-rtd.yaml file (described below), and the most recent version of Python.

The .readthedocs.yaml file for the rpn-to-gemlam tool package is typical, and looks like:

version: 2

build:
  os: ubuntu-20.04
  tools:
    python: "mambaforge-4.10"

conda:
  environment: envs/environment-rtd.yaml

# Only build HTML and JSON formats
formats: []

Package Sub-Directories

The top-level directory must contain a package sub-directory in which the Python modules that are the package code are stored. There are also usually 3 other sub-directories that contain:

  • the package documentation (docs/)

  • descriptions of the conda environments used for development of the package and building its documentation (envs/)

  • the unit test suite for the package (tests/)

Package Code Sub-directory

The package code sub-directory is where the Python modules that are the package code are stored. Its name is the package name that is used in import statements. In the rpn-to-gemlam tool package the package sub-directory is named rpn_to_gemlam.

Because the package name is used in import statements it must follow the rules that Python imposes on module names:

  • contain only letters, numbers, and underscores

  • not start with a number

By convention, package names are all-lowercase, and use underscores when they improve readability. A leading underscore is the convention that indicates a private module, variable, etc., so a package name that starts with an underscore would be unusual and confusing.
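
The naming rules and conventions above can be checked with a few lines of Python. This is a sketch with hypothetical helper names. Note that str.isidentifier() follows Python's own identifier rules, which also accept non-ASCII letters, so it is slightly more permissive than “letters, numbers, and underscores”.

```python
import keyword


def is_importable_name(name):
    """True if `name` can be used in an import statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


def follows_convention(name):
    """True if `name` is importable, all-lowercase, and has no leading underscore."""
    return (
        is_importable_name(name)
        and name == name.lower()
        and not name.startswith("_")
    )


for candidate in ("rpn_to_gemlam", "rpn-to-gemlam", "2fast", "SalishSeaTools"):
    print(candidate, is_importable_name(candidate), follows_convention(candidate))
```

Here rpn_to_gemlam passes both checks, rpn-to-gemlam and 2fast are not importable, and SalishSeaTools is importable but does not follow the all-lowercase convention.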

The package sub-directory must contain a file called __init__.py (often pronounced “dunder init”). The presence of a __init__.py file is what makes a directory and the Python modules it contains importable.

In MOAD packages the __init__.py file in the package sub-directory contains a declaration of a variable named __version__, for example:

__version__ = "19.1.dev0"

We use a CalVer versioning scheme that conforms to PEP-440. The version identifier format is yy.n[.devn], where yy is the (post-2000) year of release, and n is the number of the release within the year, starting at 1. After a release has been made the value of n is incremented by 1, and .dev0 is appended to the version identifier to indicate changes that will be included in the next release.
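
As a sketch of the yy.n[.devn] scheme described above, the following code validates a version identifier and computes the development version that follows a release. The names VERSION_RE and next_dev_version are hypothetical, and the sketch assumes the next release falls in the same year; it does not handle resetting n to 1 when the year changes.

```python
import re

# yy.n with an optional .devn suffix, e.g. "19.1" or "19.2.dev0"
VERSION_RE = re.compile(r"^(?P<yy>\d{2})\.(?P<n>\d+)(?:\.dev(?P<dev>\d+))?$")


def next_dev_version(released):
    """Return the development version that follows the release `released`.

    Hypothetical helper: increments n and appends .dev0, staying in the
    same release year.
    """
    match = VERSION_RE.match(released)
    if match is None or match["dev"] is not None:
        raise ValueError(f"not a release version identifier: {released!r}")
    return f"{match['yy']}.{int(match['n']) + 1}.dev0"


print(next_dev_version("19.1"))  # 19.2.dev0
```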

The __version__ value is included as the version metadata value in the setup.cfg File by including the line:

version = attr: package_name.__version__

in the [metadata] section. Be sure to replace package_name with the package name you chose for the Package Code Sub-directory.

docs/ Sub-directory

The docs/ directory contains the Sphinx source files for the package documentation. This directory is initialized by creating it, then running the sphinx-quickstart command in it.

After initializing the docs/ directory, its conf.py file requires some editing. Please see docs/conf.py in the rpn-to-gemlam tool package for an example of a “finished” file.

The key things that need to be done are:

  • Add:

    import os
    import sys
    
    sys.path.insert(0, os.path.abspath(".."))
    

    to the # -- Path setup ---------- section of the file to make the package code directory tree available to the Sphinx builder for collection of package metadata, automatic generation of documentation from docstrings, etc.

  • Change the project code in the # -- Project information --------- section to:

    import configparser
    
    setup_cfg = configparser.ConfigParser()
    setup_cfg.read(os.path.abspath("../setup.cfg"))
    project = setup_cfg["metadata"]["name"]
    

    to get the project name from the metadata section of the setup.cfg File.

  • Change the copyright code in the # -- Project information --------- section to something like:

    import datetime
    
    pkg_creation_year = 2019
    copyright_years = (
        f"{pkg_creation_year}"
        if datetime.date.today().year == pkg_creation_year
        else f"{pkg_creation_year}-{datetime.date.today():%Y}"
    )
    copyright = f"{copyright_years}, {author}"
    

    to ensure that the copyright year range displayed in the rendered docs is always up to date (at least as of the most recent rendering).

  • Change the version and release code in the # -- Project information --------- section to something like:

    import package_name
    
    version = package_name.__version__
    release = version
    

    to get the package version identifier from the __version__ variable in the package __init__.py file. Be sure to replace package_name with the package name you chose for the Package Code Sub-directory.
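
The project name and copyright changes listed above can be tried outside of Sphinx with the following sketch. To keep it self-contained it parses an inline string instead of reading ../setup.cfg, and it wraps the copyright logic in a hypothetical copyright_years function that takes the date as an argument so that both branches can be exercised. Reading author from the [metadata] section is an assumption made for this example; the conf.py generated by sphinx-quickstart defines author directly.

```python
import configparser
import datetime

# Stand-in for the contents of setup.cfg (values from the example above)
SETUP_CFG_TEXT = """
[metadata]
name = rpn-to-gemlam
author = Doug Latornell
"""


def copyright_years(creation_year, today):
    """Return "yyyy" in the creation year, otherwise "yyyy-YYYY"."""
    if today.year == creation_year:
        return f"{creation_year}"
    return f"{creation_year}-{today:%Y}"


setup_cfg = configparser.ConfigParser()
setup_cfg.read_string(SETUP_CFG_TEXT)
project = setup_cfg["metadata"]["name"]
author = setup_cfg["metadata"]["author"]  # assumption: author taken from setup.cfg

print(project)  # rpn-to-gemlam
print(copyright_years(2019, datetime.date(2019, 8, 1)))  # 2019
print(f"{copyright_years(2019, datetime.date(2021, 10, 29))}, {author}")
```

In a real conf.py the second argument would be datetime.date.today(), as in the snippet above.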

envs/ Sub-directory

The envs/ sub-directory contains at least 3 files that describe the conda environments for the package development and docs building environments.

environment-dev.yaml File

The environment-dev.yaml file is the conda environment description file for the package development environment. It includes all of the packages necessary to install, run, develop, test, and document the package.

For example, the environment-dev.yaml file for the rpn-to-gemlam tool package looks like:

# conda environment description file for rpn-to-gemlam package
# development environment
#
# Create a conda environment for development, testing and documentation of the package
# with:
#
#   $ conda env create -f rpn-to-gemlam/envs/environment-dev.yaml
#   $ conda activate rpn-to-gemlam
#   (rpn-to-gemlam)$ python3 -m pip install --editable ../tools/SalishSeaTools
#   (rpn-to-gemlam)$ python3 -m pip install --editable rpn-to-gemlam
#
# The environment will include all of the tools used to develop,
# test, and document the rpn-to-gemlam package.
#
# See the envs/requirements.txt file for an exhaustive list of all of the
# packages installed in the environment and their versions used in
# recent development.

name: rpn-to-gemlam

channels:
  - conda-forge
  - nodefaults

dependencies:
  - arrow
  - bottleneck
  - Click
  - matplotlib
  - netCDF4
  - pip
  - python=3.7
  - python-dateutil
  - pytz
  - requests
  - retrying
  - scipy
  - xarray

  # For unit tests
  - coverage
  - pytest

  # For documentation
  - sphinx
  - sphinx_rtd_theme

  # For coding style
  - black

  - pip:
      - angles

  • The comments at the top of the file include a succinct version of the commands required to create the dev environment.

  • The recommended conda channel to get packages from is conda-forge. nodefaults is included in the channels list to speed up the package dependency solver because it is now rare for us to require packages from any source other than conda-forge.

  • Packages that are unavailable from conda channels are installed via pip.

The environment-dev.yaml file is “hand-crafted” rather than being generated via the conda env export command. As such, it contains only the top-level dependency packages, and only version specifications that are absolutely necessary. That allows the conda solver to do its job of assembling a consistent set of up-to-date packages to install.

environment-rtd.yaml File

The environment-rtd.yaml file is the conda environment description file for the docs building environment on readthedocs.org. It includes only the packages required to build the docs that are above and beyond those that readthedocs.org installs into its environments as a matter of course.

The environment-rtd.yaml file for the rpn-to-gemlam tool package is absolutely minimal, specifying only the version of Python to use in the readthedocs.org environment:

# conda environment description file for docs build environment
# on readthedocs.org

name: sphinx-build

channels:
  - defaults

dependencies:
  - python=3.7

The only reason to add more packages to the dependencies list is if ImportError exceptions that arise in the Sphinx autodoc processing of docstrings can’t be resolved by the use of the autodoc_mock_imports list in conf.py.

requirements.txt File

The requirements.txt file records the full list of packages and their versions used for recent development work. It is generated using the python3 -m pip list --format=freeze command. When new package dependencies are added to the project, or the dev environment is updated via conda update --all, a new requirements.txt file should be generated and merged with the previously committed version so that the dev environment changes are tracked by Git.
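
When merging a newly generated requirements.txt with the committed one, it can help to see which pins changed. The sketch below compares two freeze-format listings. The helper names are hypothetical, and the version numbers in the sample strings are made up for illustration rather than taken from the real file.

```python
def parse_freeze(text):
    """Map package name -> version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a name==version pin
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.lower()] = version
    return pins


def changed_pins(old_text, new_text):
    """Return {name: (old_version, new_version)} for pins that differ."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Illustrative version numbers only
committed = "arrow==0.14.0\nscipy==1.3.0\n"
regenerated = "arrow==0.15.0\nscipy==1.3.0\nxarray==0.12.3\n"
print(changed_pins(committed, regenerated))
```

A None on either side of a pair indicates a package that was added to or removed from the environment.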

tests/ Sub-directory

The tests/ sub-directory contains the unit test suite for the package. Its modules match the names of the modules in the Package Code Sub-directory, but with test_ prepended to them. If the Package Code Sub-directory contains sub-directories, those sub-directories are reflected in the tests/ tree.

Neither the tests/ sub-directory nor any other directories that may be created in its tree should contain __init__.py files. Please see the discussion of test layout/import rules in the pytest docs for an explanation.
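
The naming rule for test modules can be expressed as a small path mapping. This is a sketch only: expected_test_path is a hypothetical helper, and the pkg/sub/mod.py path in the second call is an invented example of a package with a sub-directory.

```python
from pathlib import PurePosixPath


def expected_test_path(module_path, package_dir, tests_dir="tests"):
    """Return where the tests for `module_path` are expected to live.

    The path below the package directory is mirrored under tests/ and
    test_ is prepended to the module file name.
    """
    relative = PurePosixPath(module_path).relative_to(package_dir)
    return PurePosixPath(tests_dir) / relative.parent / f"test_{relative.name}"


print(expected_test_path("rpn_to_gemlam/rpn_to_gemlam.py", "rpn_to_gemlam"))
# tests/test_rpn_to_gemlam.py
print(expected_test_path("pkg/sub/mod.py", "pkg"))
# tests/sub/test_mod.py
```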

Rationale

The changes that resulted from Doug’s August 2019 review of then-current opinions and recommended practices for Python packaging are:

  • Start using the setup.cfg File in packages to contain all of the package metadata. That eliminates the __pkg_metadata__.py that was previously used for some of the metadata, and was symlinked across the Top-Level Directory and Package Code Sub-directory. It also dramatically reduces the amount of code in the setup.py (for packages that still require it), and changes how the package name and version are imported into the conf.py file in the docs/ Sub-directory.

  • Define the package version identifier in the __init__.py file in the Package Code Sub-directory.

  • Move the dev and docs environment description files into the envs/ Sub-directory.

The setup.cfg File was chosen over the pyproject.toml file because, as of pip-19.1 in the spring of 2019, “Editable” Installs were not supported for packages that contain a pyproject.toml file. At that time, discussion by the Python Packaging Authority of how to resolve the issue was ongoing.

The src/ layout advocated by Hynek Schlawack’s Testing & Packaging blog post and Ionel Cristian Mărieș’s Packaging a python library blog post was rejected pending a strong recommendation in its favour by the Python Packaging Authority and support for it in packaging tools like the Flit packaging and publisher tool.

The benefits that the src/ layout provides are not important to us because we always install our group-developed packages via python3 -m pip install -e, and we don’t use tox to test our packages with different Python versions and interpreters.