Package defining the Lancaster Observational Astronomy group
Lancstro: an example of creating a Python package
This repository and the following text are intended as a basic tutorial on creating and publishing a Python package. It was created for a seminar given to the Lancaster University Observational Astrophysics group, but may be more widely applicable.
What is a Python package
In general, a Python package is a set of Python modules and/or scripts and/or data that are
installable under a common namespace (the package's name). A package might also be referred to
as a library. This is different from a collection of individual Python files that you have in a
folder, which will not be under a common namespace and are only accessible if their path is in
your PYTHONPATH or you use them from the directory in which they live.
A couple of examples of common Python packages used in research in the physical sciences are:
- NumPy
- SciPy
Note: "namespace" basically refers to the name of the package as you would import it, e.g., if you import NumPy with
import numpy
then you will access all of NumPy's functions/classes/modules via the numpy namespace, e.g., numpy.sin(2.3).
A package can contain everything within its namespace, or contain various submodules, e.g., parts
that contain common functionality that naturally fits together in its own namespace. For example,
in NumPy, the random submodule contains functions and classes for generating random numbers:
import numpy
numpy.random.randn() # generate a normally distributed random number
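The difference between a package and a plain module can also be checked from Python itself. The sketch below uses only standard-library imports (email as an example package, string as an example single-file module) so that it runs without NumPy; the helper name is_package is made up for this illustration.

```python
import email        # a package: a directory of modules with an __init__.py
import email.utils  # a submodule that lives inside the email namespace
import string       # a plain single-file module


def is_package(module):
    """Return True if an imported module is a package (packages have a __path__)."""
    return hasattr(module, "__path__")


print(is_package(email))     # True
print(is_package(string))    # False
print(email.utils.__name__)  # email.utils
```

The dotted name email.utils is the same kind of namespace path as numpy.random above.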
Why package my code?
So, why should you package your Python code rather than just having local scripts? Well, there are several reasons:
- It creates an installable package that can be imported without having to have the Python script/file in your path.
- It creates a “versioned” package that can have specified features/dependencies.
- You can share your package with others (you can make it pip installable via PyPI, or conda installable via conda-forge).
- You will gain developer kudos! Software development is a major skill you learn during your research, so show off what you’ve done and add it to your CV.
Project structure
To create a Python package you should structure the directory containing your code in the following way (the directory name containing this information does not have to match the package name):
repo/
├── LICENSE
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── pkgname/
│   ├── __init__.py
│   └── example.py
└── bin/
    └── executable_script.py
There are other slight variations on this, for example, using a src directory in which your
package directories live, as described in the official guidelines.
In this project the structure is:
lancstro/
├── LICENSE
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── lancstro/
│   ├── __init__.py
│   ├── base.py
│   ├── members/
│   │   ├── __init__.py
│   │   └── staff.py
│   └── data/
│       └── office_numbers.txt
└── bin/
    └── favourite_object.py
Here, there is a "submodule" called members
within the main lancstro
package.
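To see how this layout turns into an importable namespace, the following sketch builds a tiny throwaway package with a members submodule in a temporary directory and imports it. Everything in it is hypothetical (the name demo_pkg, the greet function and the file contents are not taken from lancstro), and adding the directory to sys.path is only a stand-in for a real installation.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_skeleton(root, pkgname="demo_pkg"):
    """Create pkgname/ containing a 'members' submodule under root."""
    pkg = Path(root) / pkgname
    members = pkg / "members"
    members.mkdir(parents=True)
    # each directory needs an __init__.py to be treated as a (sub)package
    (pkg / "__init__.py").write_text('__version__ = "0.0.1"\n')
    (members / "__init__.py").write_text("from .staff import greet\n")
    (members / "staff.py").write_text(
        "def greet(name):\n    return f'Hello, {name}!'\n"
    )
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    make_skeleton(tmp)
    sys.path.insert(0, tmp)  # stand-in for installing the package
    try:
        importlib.invalidate_caches()
        members = importlib.import_module("demo_pkg.members")
        message = members.greet("Ada")
        version = sys.modules["demo_pkg"].__version__
    finally:
        sys.path.remove(tmp)

print(message)  # Hello, Ada!
print(version)  # 0.0.1
```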
Using Github
Your package should be in a version control system and ideally hosted somewhere that provides a backup. It is now very common to use git for version control and it is sensible to host the project on Github/Gitlab/bitbucket or similar. On Github you can have public or private repositories.
If using Github, it is best to start the project by creating a new repository there first and then cloning that repository to your machine before adding in your code. When creating a Github repository (I might use "repo" for short later) you can initialise it with a license file and a README file.
Note: this is not a tutorial on using git, so you'll have to find that elsewhere.
The LICENSE file
You should give your code a license describing the terms of use and copyright. Often you'll want your code to be open source, so a good choice is the MIT license, which is very permissive in terms of reuse of your code. A variety of other open source licenses are available, although these often differ slightly in their permissiveness, i.e., whether or not others can use your code in commercial and non-open source projects.
The LICENSE
file will contain a plain ascii text copy of your license.
The pyproject.toml file
This file tells the pip
tool used for installing
packages how it should build the package. In this repo we have used the file
contents suggested
here, which
means that the setuptools
package is used for
the build.
The README.md file
This is the file that you are currently reading! It should provide a basic description of your package, maybe including information about how to install it. Ideally it should be brief and not be seen as a replacement for having proper documentation for your code available elsewhere.
In this case the suggested format for the file is Markdown (the .md extension), but it could be a
plain ASCII text file or reStructuredText. Markdown and reStructuredText will be automatically
rendered if you host your package on, e.g., Github.
The setup.cfg and setup.py files
In many packages you might just see a setup.py
file, which is the build script used by setuptools.
However, it is now good practice to put "static"
metadata about
your package in the setup.cfg
configuration
file. By
"static" I mean any package information that does not have to be dynamically defined during the
build process (such as defining and building Cython
extensions). In many cases, like
this repository, this can mean the setup.py
file can be very simple and just contain:
from setuptools import setup
setup()
Note: if using a pip version greater than 19 for installing code, and/or if your package contains a
pyproject.toml file that specifies setuptools>=40.9.0, you don't actually need the setup.py file
and you can just use the setup.cfg file.
The layout of the configuration file is described here, and I'll reproduce the one from this project below with additional inline comments:
[metadata]
# the name of the package
name = lancstro
# the package author information (multiple authors can just be separated by commas)
author = Matthew Pitkin
author_email = m.pitkin@lancaster.ac.uk
# a brief description of the package
description = Package defining the Lancaster Observational Astronomy group
# the license type and license file
license = MIT
license_files = LICENSE
# a more in-depth description of the project that will appear on its PyPI page,
# in this case read in from the README.md file
long_description = file: README.md
long_description_content_type = text/markdown
# the project's URL (often the Github repo URL)
url = https://github.com/mattpitkin/lancstro
# standard classifiers giving some information about the project
classifiers =
    Intended Audience :: Science/Research
    License :: OSI Approved :: MIT License
    Natural Language :: English
    Programming Language :: Python
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Topic :: Scientific/Engineering
    Topic :: Scientific/Engineering :: Astronomy
    Topic :: Scientific/Engineering :: Physics
# the package's current version (this isn't actually in the file, see later!)
version = 0.0.1
[options]
# state the Python versions that the package requires/supports
python_requires = >=3.6
# state packages and versions (if necessary) required for running the setup
setup_requires =
    setuptools >= 43
    wheel
# state packages and versions (if necessary) required for installing and using the package
install_requires =
    astropy
    astroquery >= 0.4.3
# automatically find all modules within this package
packages = find:
# include data in the package defined below
include_package_data = True
# any executable scripts to include in the package
scripts =
    bin/favourite_object.py
[options.package_data]
# any data files to include in the package (lancstro shows they are in the
# lancstro package and then the paths are given)
lancstro =
    data/office_numbers.txt
For a list of the standard "classifiers" that you can add see here.
In this project we have added some "data" files that come bundled with the package. It is not required to include data in your package.
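Once data files are bundled like this, code inside (or outside) the package needs a way to find them after installation. A minimal sketch of one approach, assuming Python 3.9 or later, is to use importlib.resources from the standard library. The helper read_bundled_text, the throwaway package demo_data_pkg and the placeholder file contents are all made up for this illustration; for the installed lancstro package the equivalent call would be read_bundled_text("lancstro", "data", "office_numbers.txt").

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path


def read_bundled_text(package, *parts):
    """Read a text file shipped inside an importable package (Python 3.9+)."""
    node = resources.files(package)
    for part in parts:
        node = node.joinpath(part)
    return node.read_text(encoding="utf-8")


# Build a throwaway package with a data file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "demo_data_pkg" / "data"
    data_dir.mkdir(parents=True)
    (data_dir.parent / "__init__.py").write_text("")
    (data_dir / "office_numbers.txt").write_text("placeholder: 000\n")
    sys.path.insert(0, tmp)  # stand-in for installing the package
    try:
        importlib.invalidate_caches()
        text = read_bundled_text("demo_data_pkg", "data", "office_numbers.txt")
    finally:
        sys.path.remove(tmp)

print(text)
```

Looking the file up relative to the package, rather than via a hard-coded path, means it is found wherever pip happened to install the package.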
Adding a package version
In the above case the package version is set manually in the setup.cfg
file. It is up to you how
you define the version string, but it is often good to use Semantic
Versioning. In this format the version consists of three full-stop separated
numbers: MAJOR.MINOR.PATCH.
The Semantic Versioning site gives the following definitions of when to change the numbers:
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards compatible manner, and
- PATCH version when you make backwards compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
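The rules above can be sketched as a small helper that bumps a plain MAJOR.MINOR.PATCH string. This is only an illustration: the function name bump_version is made up, and pre-release and build metadata labels are deliberately not handled.

```python
def bump_version(version, part):
    """Return a new MAJOR.MINOR.PATCH string with the given part incremented.

    Bumping MAJOR resets MINOR and PATCH to zero; bumping MINOR resets PATCH.
    """
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.0.1", "patch"))  # 0.0.2 (backwards compatible bug fix)
print(bump_version("0.0.1", "minor"))  # 0.1.0 (new backwards compatible feature)
print(bump_version("0.0.1", "major"))  # 1.0.0 (incompatible API change)
```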
To update the version you can just edit the value in the setup.cfg file. When you install the
package, this will be its version.
This allows the package manager (e.g., pip) to know what version of the package is installed.
However, it is often useful to provide the version number as a variable within the package itself,
so that the user can check it if necessary. Most often you will find this as a variable called
__version__, e.g.:
import numpy
print(numpy.__version__)
1.21.2
There are several ways to set this, but it is best to make sure that there's only one place that you
have to edit the version number rather than multiple places. One method (used in this package) is to
include the version number in your package's main __init__.py file by adding the line:
__version__ = "0.0.1"
Then, within setup.cfg, the version line can be:
version = attr: lancstro.__version__
Among the other options, a good one is to set the version with a tool such as setuptools-scm,
which gathers the version information from git tags in your repo.
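Whichever way the version is set, the value recorded by the package manager at install time can also be queried from Python. A minimal sketch, assuming Python 3.8 or later (where importlib.metadata is in the standard library), is given below; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if lancstro is installed, otherwise None.
print(installed_version("lancstro"))
```

This reads the installed metadata rather than the __version__ variable, so it also works for packages that do not define __version__ at all.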
The MANIFEST.in file
You can specify additional files that you want to be bundled with the package's source
distribution using a MANIFEST.in file. With modern versions of setuptools (e.g., greater than 43)
most of the standard files, such as the README file and setup files, and any license file given
in setup.cfg, are automatically included in the source distribution by default.
However, you may want to include other files. If you had, say, a test directory with multiple
Python test scripts that you want in the package, you could add a MANIFEST.in file containing:
recursive-include test/ *.py
which will include all .py files within test.
Installing the package
Once you have the above structure you can install the package (from its base directory) using:
pip install .
That's it! Open up a Python terminal and you should be able to do:
import lancstro
print(lancstro.__version__)
0.0.1
or run the favourite_object.py
script from the command line:
$ favourite_object.py -h
usage: favourite_object.py [-h] name name
Get a staff member's favourite object
positional arguments:
name The staff member's full name
optional arguments:
-h, --help show this help message and exit
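For readers who have not written a command-line script before, help output of this shape is what argparse produces for a positional argument that takes two values. The sketch below is only a guess at how such a script could be wired up, not the actual contents of favourite_object.py: the function name build_parser is hypothetical, and the lookup of the favourite object is replaced by a simple print.

```python
import argparse


def build_parser():
    """Build a parser with one positional argument taking two words (a full name)."""
    parser = argparse.ArgumentParser(
        prog="favourite_object.py",
        description="Get a staff member's favourite object",
    )
    # nargs=2 collects exactly two values (e.g., first name and surname) in a list
    parser.add_argument("name", nargs=2, help="The staff member's full name")
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that the values are taken from the command line.
args = build_parser().parse_args(["Jane", "Doe"])
full_name = " ".join(args.name)
print(full_name)  # Jane Doe
```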
Documentation
Not covered here!
There are many additional useful things that I've not covered here. These include:
- using entry point console scripts rather than, or as well as, including executable scripts
- including C/C++/FORTRAN code, or Cython-ized code, in your package
- creating a test suite for your package (and checking its coverage)
- setting up continuous integration for building and testing (and automatically publishing) your code (e.g., with Github Actions, TravisCI, ...)
I may add these at a later date.
Other resources
For other descriptions of creating your Python code see: