How to Structure Python Packages and Why
In August 2019 Doug reviewed current opinions and recommended practices for Python packaging and arrived at the recommendations below for the directories and files layout of Python packages developed by the UBC EOAS MOAD group. The rpn-to-gemlam tool package was the first package that was converted from the previously used package layout to what is recommended here, so its files are used as examples below.
Doug periodically reviews the Python packaging landscape and updates this section to keep pace with changes in the packaging tools, opinions, and recommended practices.
References
Ionel Cristian Mărieș’s Packaging a python library blog post
Brian Skinn’s My How and Why pyproject.toml & the ‘src’ Project Structure blog post
Brett Cannon’s Clarifying PEP 518 (a.k.a. pyproject.toml) blog post
Note
There are 3 key things that people who are familiar with Python packaging need to understand about how the UBC EOAS MOAD group uses its internally developed packages:
We rely heavily on “Editable” Installs. That is, we install our packages from Git clones of the repositories using the python3 -m pip install --editable command. That makes the workflow for getting updates into our installed packages a simple git pull in the package repository clone directory.
On our local workstations and laptops we work in conda environments, either the base environment created by Installing Miniforge, or project-specific environments.
On HPC clusters we use the system-provided Python 3 module and install our packages using the “user scheme” for installation in combination with “Editable” Installs, that is, python3 -m pip install --user -e.
Package Layout
Using the rpn-to-gemlam tool package as an example, the directories and files layout of a MOAD package looks like:
rpn-to-gemlam/
├── docs/
│   ├── conf.py
│   ├── index.rst
│   ├── Makefile
│   ├── ...
│   └── _static/
│       ├── ...
├── envs/
│   ├── environment-dev.yaml
│   ├── environment-rtd.yaml
│   ├── requirements.txt
├── .readthedocs.yaml
├── LICENSE
├── pyproject.toml
├── README.rst
├── rpn_to_gemlam/
│   ├── __init__.py
│   ├── ...
├── setup.cfg
└── tests/
    ├── ...
In summary (please see the sections below for detailed explanations):
the Top-Level Directory is a clone of the package’s Git repository.
It typically contains 5 files and 4 sub-directories.
The 5 files are:
pyproject.toml File that contains the build system requirements and build backend tools to use for creation of the package
setup.cfg File that contains the package metadata and setuptools options for creation of the package
README.rst File that provides the long description of the package
LICENSE File that contains the legal text of the Apache License, Version 2.0 for the package
.readthedocs.yaml File that provides configuration for building the docs on the https://readthedocs.org service
The 4 sub-directories are:
Package Code Sub-directory that contains the code modules
docs/ Sub-directory that contains the Sphinx source files for the package documentation
envs/ Sub-directory that contains the conda environment description YAML files for the package development and docs building environments
tests/ Sub-directory that contains the unit test suite for the package
The __init__.py file in the Package Code Sub-directory provides the package version identifier string as a variable named __version__.
Top-Level Directory
The name of the top-level directory is the “project name”. It does not have to be the same as the “package name” that you use in import statements. In this example the “project name” is rpn-to-gemlam, and the “package name” is rpn_to_gemlam.
Other examples of MOAD project and package names are:
the moad_tools project’s package is named moad_tools
the SalishSeaTools project’s package is named salishsea_tools
the SalishSeaNowcast project’s package is named nowcast
The top-level directory “project name” is generally the name of the project’s Git repository. Keep in mind, however, that Bitbucket converts repository names to all-lowercase.
Package Files
The top-level directory must contain 4 files that contain the information necessary to create a Python package. It also contains a file to tell https://readthedocs.org/ how to configure an environment in which to build the package documentation.
pyproject.toml File
The pyproject.toml file contains the build system requirements and build backend tools to use for creation of the package. It is documented at https://setuptools.pypa.io/en/latest/build_meta.html. We use setuptools as our build backend, so our pyproject.toml files always look like:
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
Warning
Editable installs of packages that contain a pyproject.toml file are not supported by pip<21.3 (released 11-Oct-2021). At time of writing (29-Oct-2021) the Compute Canada clusters are using pip=20.0.2. Until they upgrade to pip>=21.3 any of our packages that need to be installable on one of those clusters cannot contain a pyproject.toml file.
In place of the pyproject.toml file, packages that have to be installed on Compute Canada clusters require a setup.py file containing:
import setuptools
setuptools.setup()
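The version threshold in the warning above can be checked with a few lines of Python. This is a minimal sketch, not part of the MOAD tooling: the function name supports_pyproject_editable is hypothetical, and it only compares the leading major.minor numbers of a pip version string against 21.3.

```python
import re


def supports_pyproject_editable(pip_version: str) -> bool:
    """Return True if pip_version is 21.3 or later (hypothetical helper)."""
    # Only the leading "major.minor" part matters for this comparison
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError(f"unrecognized pip version: {pip_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (21, 3)


print(supports_pyproject_editable("20.0.2"))  # False
print(supports_pyproject_editable("21.3"))  # True
```

On Python 3.8 or later the installed pip version string can be obtained with importlib.metadata.version("pip"), assuming pip is installed in the active environment.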
setup.cfg File
The setup.cfg file contains the package metadata and setuptools options for creation of the package. It is documented at https://setuptools.pypa.io/en/latest/userguide/declarative_config.html. A minimal setup.cfg file looks like:
[metadata]
name = project_name
version = 1.0
description = One line description of the package
author = your name
author_email = your email address

[options]
zip_safe = False
include_package_data = True
packages = find:
install_requires =
    list of packages that the package depends on, one per line
The setup.cfg file for the rpn-to-gemlam tool package (with the copyright header comment block excluded) looks like:
[metadata]
name = rpn-to-gemlam
version = attr: rpn_to_gemlam.__version__
description = ECCC RPN to SalishSeaCast NEMO Atmospheric Forcing Conversion Tool
author = Doug Latornell
author_email = dlatornell@eoas.ubc.ca
url=https://github.com/SalishSeaCast/rpn-to-gemlam
long_description = file: README.rst
license = Apache License, Version 2.0
platform = Linux
classifiers =
    Development Status :: 3 - Alpha
    License :: OSI Approved :: Apache Software License
    Programming Language :: Python :: Implementation :: CPython
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
    Operating System :: POSIX :: Linux
    Operating System :: Unix
    Environment :: Console
    Intended Audience :: Science/Research
    Intended Audience :: Education

[options]
zip_safe = False
include_package_data = True
packages = find:
python_requires = >=3.6
install_requires =
    # see envs/environment-dev.yaml for conda environment dev installation
    # see envs/requirements.txt for versions most recently used in development
    angles
    arrow
    bottleneck
    Click
    matplotlib
    netCDF4
    python-dateutil
    pytz
    requests
    retrying
    scipy
    xarray
    # python3 -m pip install --editable ../tools/SalishSeaTools

[options.entry_points]
console_scripts =
    rpn-to-gemlam = rpn_to_gemlam.rpn_to_gemlam:cli
The [options.entry_points] stanza is an example of the declaration of entry points. They are used in packages that use a framework like Click or Cliff to provide a command-line interface.
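To see how the multi-line values in a setup.cfg file are structured, it can help to read one with the standard library configparser module. The sketch below is only an inspection aid under stated assumptions: the SETUP_CFG string is an abbreviated, illustrative version of the file above, split_lines is a hypothetical helper, and setuptools applies its own processing on top of this basic parsing.

```python
import configparser

# Abbreviated, illustrative setup.cfg content (not the full file)
SETUP_CFG = """\
[metadata]
name = rpn-to-gemlam
version = attr: rpn_to_gemlam.__version__

[options]
packages = find:
install_requires =
    # full-line comments inside a multi-line value are skipped
    arrow
    xarray

[options.entry_points]
console_scripts =
    rpn-to-gemlam = rpn_to_gemlam.rpn_to_gemlam:cli
"""


def split_lines(value: str) -> list:
    """Split a multi-line option value into its non-empty, stripped lines."""
    return [line.strip() for line in value.splitlines() if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

requirements = split_lines(parser["options"]["install_requires"])

# Each console_scripts line has the form "command = module.path:function"
console_scripts = {}
for line in split_lines(parser["options.entry_points"]["console_scripts"]):
    command, target = line.split("=", 1)
    console_scripts[command.strip()] = target.strip()

print(requirements)
print(console_scripts)
```

The indentation of the continuation lines is what makes them part of the option value, which is why a setup.cfg file with flattened indentation does not parse as intended.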
Note
Declaration of entry points in setup.cfg is supported by setuptools>=51.0.0 released on 6-Dec-2020. Any newly created conda environment will include a version of setuptools much newer than that. The same is true of the Compute Canada StdEnv/2020 environment. The only platform where we can’t support this feature is orcinus.
README.rst File
The README.rst file provides a longer, multi-line description of the package. Take a look at some of the UBC EOAS MOAD repositories to get an idea of typical contents. README.rst should include a copyright and license section.
The README.rst file is included as the long_description metadata value in the setup.cfg File by including the line:
long_description = file: README.rst
in the [metadata] section.
README files written using reStructuredText (or Markdown) are automatically rendered to HTML in Bitbucket web pages.
LICENSE File
The LICENSE file contains the legal license text for the package. We release all of our open code under the Apache License, Version 2.0. So, you can just copy the LICENSE file from another MOAD repository.
Be sure to include the license declaration via the license metadata value in the setup.cfg File by including the line:
license = Apache License, Version 2.0
in the [metadata] section.
.readthedocs.yaml File
For packages that use https://readthedocs.org/ to render and host their documentation, we include a .readthedocs.yaml file in the top-level directory (the file name and location are stipulated by readthedocs). That file declares the features of the environment that we want readthedocs to use to build our docs, specifically, a conda environment that we describe in the envs/environment-rtd.yaml file (described below), and the most recent version of Python.
The .readthedocs.yaml file for the rpn-to-gemlam tool package is typical, and looks like:
version: 2

build:
  os: ubuntu-20.04
  tools:
    python: "mambaforge-4.10"

conda:
  environment: envs/environment-rtd.yaml

# Only build HTML and JSON formats
formats: []
Package Sub-Directories
The top-level directory must contain a package sub-directory in which the Python modules that are the package code are stored. There are also usually 3 other sub-directories that contain:
the package documentation (docs/)
descriptions of the conda environments used for development of the package and building its documentation (envs/)
the unit test suite for the package (tests/)
Package Code Sub-directory
The package code sub-directory is where the Python modules that are the package code are stored. Its name is the package name that is used in import statements. In the rpn-to-gemlam tool package the package sub-directory is named rpn_to_gemlam.
Because the package name is used in import statements it must follow the rules that Python imposes on module names:
contain only letters, numbers, and underscores
not start with a number
By convention, package names are all-lowercase, and use underscores when they improve readability. A leading underscore is the convention that indicates a private module, variable, etc., so a package name that starts with an underscore would be unusual and confusing.
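The naming rules above can be expressed as a small check. This is a sketch rather than an official validator: the function name is_valid_package_name is hypothetical, and the rejection of Python keywords is an addition beyond the two rules listed (a name such as class cannot be used in an import statement).

```python
import keyword
import re

# Letters, numbers, and underscores only; must not start with a number
_PACKAGE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_package_name(name: str) -> bool:
    """Return True if name can be used as a package name in import statements."""
    if _PACKAGE_NAME_RE.fullmatch(name) is None:
        return False
    # Assumption beyond the listed rules: keywords are not importable names
    return not keyword.iskeyword(name)


print(is_valid_package_name("rpn_to_gemlam"))  # True
print(is_valid_package_name("rpn-to-gemlam"))  # False: contains hyphens
print(is_valid_package_name("2fast"))  # False: starts with a number
```

The check deliberately accepts uppercase letters and leading underscores because those are matters of convention, not rules that Python enforces.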
The package sub-directory must contain a file called __init__.py (often pronounced “dunder init”). The presence of an __init__.py file is what makes a directory and the Python modules it contains importable.
In MOAD packages the __init__.py file in the package sub-directory contains a declaration of a variable named __version__, for example:
__version__ = "19.1.dev0"
We use a CalVer versioning scheme that conforms to PEP-440. The version identifier format is yy.n[.devn], where yy is the (post-2000) year of release, and n is the number of the release within the year, starting at 1. After a release has been made the value of n is incremented by 1, and .dev0 is appended to the version identifier to indicate changes that will be included in the next release.
The __version__ value is included as the version metadata value in the setup.cfg File by including the line:
version = attr: package_name.__version__
in the [metadata] section. Be sure to replace package_name with the package name you chose for the Package Code Sub-directory.
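The yy.n[.devn] format and the post-release bump described above can be sketched as follows. The names VERSION_RE and next_dev_version are hypothetical, and the sketch assumes the next release falls in the same year; a release in a new year would need yy changed and n reset to 1 by hand.

```python
import re

# yy.n with an optional .devn suffix, e.g. "19.1" or "19.2.dev0"
VERSION_RE = re.compile(r"(?P<yy>\d{2})\.(?P<n>[1-9]\d*)(?:\.dev(?P<dev>\d+))?")


def next_dev_version(released: str) -> str:
    """Return the development version that follows a release like "19.1"."""
    match = VERSION_RE.fullmatch(released)
    if match is None or match["dev"] is not None:
        raise ValueError(f"not a release version identifier: {released!r}")
    # Increment the release number within the year and mark as in development
    return f"{match['yy']}.{int(match['n']) + 1}.dev0"


print(next_dev_version("19.1"))  # 19.2.dev0
```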
docs/ Sub-directory
The docs/ directory contains the Sphinx source files for the package documentation. This directory is initialized by creating it, then running the sphinx-quickstart command in it.
After initializing the docs/ directory, its conf.py file requires some editing. Please see docs/conf.py in the rpn-to-gemlam tool package for an example of a “finished” file.
The key things that need to be done are:
Add:
import os
import sys

sys.path.insert(0, os.path.abspath(".."))
to the # -- Path setup ---------- section of the file to make the package code directory tree available to the Sphinx builder for collection of package metadata, automatic generation of documentation from docstrings, etc.
Change the project code in the # -- Project information --------- section to:
import configparser

setup_cfg = configparser.ConfigParser()
setup_cfg.read(os.path.abspath("../setup.cfg"))
project = setup_cfg["metadata"]["name"]
to get the project name from the metadata section of the setup.cfg File.
Change the copyright code in the # -- Project information --------- section to something like:
import datetime

pkg_creation_year = 2019
copyright_years = (
    f"{pkg_creation_year}"
    if datetime.date.today().year == pkg_creation_year
    else f"{pkg_creation_year}-{datetime.date.today():%Y}"
)
copyright = f"{copyright_years}, {author}"
to ensure that the copyright year range displayed in the rendered docs is always up to date (at least as of the most recent rendering).
Change the version and release code in the # -- Project information --------- section to something like:
import package_name

version = package_name.__version__
release = version
to get the package version identifier from the __version__ variable in the package __init__.py file. Be sure to replace package_name with the package name you chose for the Package Code Sub-directory.
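The copyright year range logic above is easy to check in isolation when it is written as a function that takes the date as an argument. This is a sketch of the same conditional, not the code used in any MOAD conf.py file; the function name copyright_years is hypothetical.

```python
import datetime


def copyright_years(pkg_creation_year: int, today: datetime.date) -> str:
    """Return a single year, or a "start-end" range ending in today's year."""
    if today.year == pkg_creation_year:
        return f"{pkg_creation_year}"
    return f"{pkg_creation_year}-{today:%Y}"


print(copyright_years(2019, datetime.date(2019, 8, 1)))  # 2019
print(copyright_years(2019, datetime.date(2021, 10, 29)))  # 2019-2021
```

In a conf.py file the second argument would be datetime.date.today().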
envs/ Sub-directory
The envs/ sub-directory contains at least 3 files that describe the conda environments for the package development and docs building environments.
environment-dev.yaml File
The environment-dev.yaml file is the conda environment description file for the package development environment. It includes all of the packages necessary to install, run, develop, test, and document the package.
For example, the environment-dev.yaml file for the rpn-to-gemlam tool package looks like:
# conda environment description file for rpn-to-gemlam package
# development environment
#
# Create a conda environment for development, testing and documentation of the package
# with:
#
# $ conda env create -f rpn-to-gemlam/envs/environment-dev.yaml
# $ conda activate rpn-to-gemlam
# (rpn-to-gemlam)$ python3 -m pip install --editable ../tools/SalishSeaTools
# (rpn-to-gemlam)$ python3 -m pip install --editable rpn-to-gemlam
#
# The environment will include all of the tools used to develop,
# test, and document the rpn-to-gemlam package.
#
# See the envs/requirements.txt file for an exhaustive list of all of the
# packages installed in the environment and their versions used in
# recent development.
name: rpn-to-gemlam

channels:
  - conda-forge
  - nodefaults

dependencies:
  - arrow
  - bottleneck
  - Click
  - matplotlib
  - netCDF4
  - pip
  - python=3.7
  - python-dateutil
  - pytz
  - requests
  - retrying
  - scipy
  - xarray

  # For unit tests
  - coverage
  - pytest

  # For documentation
  - sphinx
  - sphinx_rtd_theme

  # For coding style
  - black

  - pip:
    - angles
The comments at the top of the file include a succinct version of the commands required to create the dev environment.
The recommended conda channel to get packages from is conda-forge.
nodefaults is included in the channels list to speed up the packages dependency solver because it is now rare for us to require packages from any other source than conda-forge.
Packages that are unavailable from conda channels are installed via pip.
The environment-dev.yaml file is “hand-crafted” rather than being generated via the conda env export command. As such, it contains only the top level dependency packages, and only version specifications that are absolutely necessary. That allows the conda solver to do its job of assembling a consistent set of up-to-date packages to install.
environment-rtd.yaml File
The environment-rtd.yaml file is the conda environment description file for the docs building environment on readthedocs.org. It includes only the packages required to build the docs that are above and beyond those that readthedocs.org installs into its environments as a matter of course.
The environment-rtd.yaml file for the rpn-to-gemlam tool package is absolutely minimal, specifying only the version of Python to use in the readthedocs.org environment:
# conda environment description file for docs build environment
# on readthedocs.org
name: sphinx-build
channels:
- defaults
dependencies:
- python=3.7
The only reason to add more packages to the dependencies list is if ImportError exceptions that arise in the Sphinx autodoc processing of docstrings can’t be resolved by the use of the autodoc_mock_imports list in conf.py.
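As an illustration of the autodoc_mock_imports approach, a conf.py fragment might look like the sketch below. The module names in the list are hypothetical examples chosen from the package dependencies shown earlier; the actual list depends on which imports fail during the docs build.

```python
# docs/conf.py (fragment)

# autodoc must be enabled for autodoc_mock_imports to have any effect
extensions = [
    "sphinx.ext.autodoc",
]

# Hypothetical list: modules that autodoc should replace with mock objects
# instead of importing, so that they need not be installed in the
# readthedocs.org build environment
autodoc_mock_imports = [
    "netCDF4",
    "scipy",
]
```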
requirements.txt File
The requirements.txt file records the full list of packages and their versions used for recent development work. It is generated using the python3 -m pip list --format=freeze command.
When new package dependencies are added to the project, or the dev environment is updated via conda update --all, a new requirements.txt file should be generated and merged with the previously committed version so that the dev environment changes are tracked by Git.
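When merging a newly generated requirements.txt file with the committed one, it can be useful to see which pins changed. The sketch below compares two freeze-format texts; the function names parse_freeze and changed_pins are hypothetical, and the package versions in the example are made-up sample data.

```python
def parse_freeze(text: str) -> dict:
    """Map package names to versions from "name==version" lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a simple pin
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def changed_pins(old_text: str, new_text: str) -> dict:
    """Return {name: (old_version, new_version)} for pins that differ."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Made-up sample data for illustration only
old_requirements = "arrow==1.2.0\nxarray==0.19.0\n"
new_requirements = "arrow==1.2.1\nxarray==0.19.0\nblack==21.9b0\n"
print(changed_pins(old_requirements, new_requirements))
```

A version of None in the result means that the package is absent from that file.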
tests/ Sub-directory
The tests/ sub-directory contains the unit test suite for the package. Its modules match the names of the modules in the Package Code Sub-directory, but with test_ pre-pended to them. If the Package Code Sub-directory contains sub-directories, those sub-directories are reflected in the tests/ tree.
Neither the tests/ sub-directory nor any other directories that may be created in its tree should contain __init__.py files. Please see the discussion of test layout/import rules in the pytest docs for explanation.
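The naming convention for test modules can be sketched as a mapping from a module path to its test module path. The function name expected_test_path and the workers/download.py module are hypothetical, used only to show how sub-directories are mirrored in the tests/ tree.

```python
from pathlib import PurePosixPath


def expected_test_path(module_path: str, package_dir: str = "rpn_to_gemlam") -> str:
    """Return the tests/ path that mirrors a module path in the package."""
    # Path of the module relative to the package code sub-directory
    relative = PurePosixPath(module_path).relative_to(package_dir)
    # Same sub-directories, with test_ pre-pended to the module file name
    return str(PurePosixPath("tests") / relative.with_name(f"test_{relative.name}"))


print(expected_test_path("rpn_to_gemlam/rpn_to_gemlam.py"))
# tests/test_rpn_to_gemlam.py
print(expected_test_path("rpn_to_gemlam/workers/download.py"))
# tests/workers/test_download.py
```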
Rationale
The changes that resulted from Doug’s August 2019 review of then current opinions and recommended practices for Python packaging are:
Start using the setup.cfg File in packages to contain all of the package metadata. That eliminates the __pkg_metadata__.py file that was previously used for some of the metadata, and was symlinked across the Top-Level Directory and Package Code Sub-directory. It also dramatically reduces the amount of code in the setup.py file (for packages that still require it), and changes how the package name and version are imported into the conf.py file in the docs/ Sub-directory.
Define the package version identifier in the __init__.py file in the Package Code Sub-directory.
Move the dev and docs environment description files into the envs/ Sub-directory.
The setup.cfg File was chosen over the pyproject.toml file because, as of pip-19.1 in the spring of 2019, “Editable” Installs are not supported for packages that contain a pyproject.toml file. Discussion by the Python Packaging Authority of how to resolve this issue is ongoing.
The src/ layout advocated by Hynek Schlawack’s Testing & Packaging blog post and Ionel Cristian Mărieș’s Packaging a python library blog post was rejected pending a strong recommendation in its favour by the Python Packaging Authority and support for it in packaging tools like the Flit packaging and publisher tool. The benefits that the src/ layout provides are not important to us because we always install our group-developed packages via python3 -m pip install -e, and we don’t use tox to test our packages with different Python versions and interpreters.