Package defining the Lancaster Observational Astronomy group

Lancstro: an example of creating a Python package

This repository and the following text are intended as a basic tutorial on creating and publishing a Python package. It was created for a seminar given to the Lancaster University Observational Astrophysics group, but may be more widely applicable.

What is a Python package

In general, a Python package is a set of Python modules and/or scripts and/or data that are installable under a common namespace (the package's name). A package might also be referred to as a library. This is different from a collection of individual Python files that you have in a folder, which will not be under a common namespace and are only accessible if their path is in your PYTHONPATH or you use them from the directory in which they live.

A couple of examples of common Python packages used in research in the physical sciences are:

  1. NumPy
  2. SciPy

Note: "namespace" basically refers to the name of the package as you would import it, e.g., if you import numpy with import numpy, then you will access all NumPy's functions/classes/modules via the numpy namespace:

numpy.sin(2.3)

A package can contain everything within its namespace, or contain various submodules, e.g., parts that contain common functionality that naturally fits together in its own namespace. For example, in NumPy, the random submodule contains functions and classes for generating random numbers:

import numpy
numpy.random.randn()  # generate a normally distributed random number

Why package my code?

So, why should you package your Python code rather than just having local scripts? Well, there are several reasons:

  • It creates an installable package that can be imported without having to have the Python script/file in your path.
  • It creates a “versioned” package that can have specified features/dependencies.
  • You can share your package with others (you can make it pip installable via PyPI, or conda installable via conda-forge).
  • You will gain developer kudos! Software development is a major skill you learn during your research, so show off what you’ve done and add it to your CV.

Project structure

To create a Python package you should structure the directory containing your code in the following way (the directory name containing this information does not have to match the package name):

repo/
├── LICENSE
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── pkgname/
│   ├── __init__.py
│   └── example.py
└── bin/
    └── executable_script.py

There are other slight variations on this, for example, using a src directory in which your package directories live (as described in the official guidelines).

In this project the structure is:

lancstro/
├── LICENSE
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── lancstro/
│   ├── __init__.py
│   ├── base.py
│   ├── members/
│   │   ├── __init__.py
│   │   └── staff.py
│   └── data/
│       └── office_numbers.txt
└── bin/
    └── favourite_object.py

Here, there is a "submodule" called members within the main lancstro package.
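
As a rough illustration of how such a layout maps onto imports, the sketch below builds a throwaway package in a temporary directory and imports from its submodule. The names demo_group and greet are hypothetical and are not part of lancstro; the sys.path step is only needed here because the package is not installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package and function names, used only for this illustration.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_group"
(pkg / "members").mkdir(parents=True)

# The top-level __init__.py marks demo_group as a package.
(pkg / "__init__.py").write_text('__version__ = "0.0.1"\n')

# The members/__init__.py marks members as a submodule and re-exports greet.
(pkg / "members" / "__init__.py").write_text("from .staff import greet\n")
(pkg / "members" / "staff.py").write_text(
    "def greet(name):\n    return f'Hello, {name}!'\n"
)

# Make the temporary directory importable (an installed package would not need this).
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_group
from demo_group.members import greet
from demo_group.members import staff

message = greet("Ada")
print(demo_group.__version__)  # the package namespace
print(message)                 # a function reached through the submodule
print(staff.__name__)          # the module's full dotted name
```

Each directory containing an __init__.py becomes one level of the dotted import path.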

Using Github

Your package should be in a version control system and ideally hosted somewhere that provides a backup. It is now very common to use git for version control and it is sensible to host the project on Github/Gitlab/bitbucket or similar. On Github you can have public or private repositories.

If using Github, it is best to start the project by creating a new repository there first and then cloning that repository to your machine before adding in your code. When creating a Github repository (I might use "repo" for short later) you can initialise it with a license file and a README file.

Note: this is not a tutorial on using git, so you'll have to find that elsewhere.

The LICENSE file

You should give your code a license describing the terms of use and copyright. Often you'll want your code to be open source, so a good choice is the MIT license, which is very permissive in terms of reuse of your code. A variety of other open source licenses are available, although these often differ slightly in permissiveness, i.e., whether others can use your code in commercial and non-open source projects or not.

The LICENSE file will contain a plain ascii text copy of your license.

The pyproject.toml file

This file tells the pip tool used for installing packages how it should build the package. In this repo we have used the file contents suggested here, which means that the setuptools package is used for the build.

The README.md file

This is the file that you are currently reading! It should provide a basic description of your package, maybe including information about how to install it. Ideally it should be brief and not be seen as a replacement for having proper documentation for your code available elsewhere.

In this case the suggested format for the file is Markdown (the .md extension), but it could be a plain ascii text file or reStructuredText. Markdown and reStructuredText will be automatically rendered if you host your package on, e.g., Github.

The setup.cfg and setup.py files

In many packages you might just see a setup.py file, which is the build script used by setuptools. However, it is now good practice to put "static" metadata about your package in the setup.cfg configuration file. By "static" I mean any package information that does not have to be dynamically defined during the build process (such as defining and building Cython extensions). In many cases, like this repository, this can mean the setup.py file can be very simple and just contain:

from setuptools import setup

setup()

Note: if using pip version greater than 19 for installing code, and/or if your package contains a pyproject.toml file that specifies setuptools>=40.9.0, you don't actually need the setup.py file and you can just use the setup.cfg file.

The layout of the configuration file is described here, and I'll reproduce the one from this project below with additional inline comments:

[metadata]
# the name of the package
name = lancstro

# the package author information (multiple authors can just be separated by commas)
author = Matthew Pitkin
author_email = m.pitkin@lancaster.ac.uk

# a brief description of the package
description = Package defining the Lancaster Observational Astronomy group

# the license type and license file
license = MIT
license_files = LICENSE

# a more in-depth description of the project that will appear on its PyPI page,
# in this case read in from the README.md file
long_description = file: README.md
long_description_content_type = text/markdown

# the project's URL (often the Github repo URL)
url = https://github.com/mattpitkin/lancstro

# standard classifiers giving some information about the project
classifiers =
    Intended Audience :: Science/Research
    License :: OSI Approved :: MIT License
    Natural Language :: English
    Programming Language :: Python
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Topic :: Scientific/Engineering
    Topic :: Scientific/Engineering :: Astronomy
    Topic :: Scientific/Engineering :: Physics

# the package's current version (this isn't actually in the file, see later!)
version = 0.0.1

[options]
# state the Python versions that the package requires/supports
python_requires = >=3.6

# state packages and versions (if necessary) required for running the setup
setup_requires =
    setuptools >= 43
    wheel

# state packages and versions (if necessary) required for installing and using the package
install_requires =
    astropy
    astroquery >= 0.4.3

# automatically find all modules within this package
packages = find:

# include data in the package defined below
include_package_data = True

# any executable scripts to include in the package
scripts =
    bin/favourite_object.py

[options.package_data]
# any data files to include in the package (lancstro shows they are in the
# lancstro package and then the paths are given)
lancstro = 
    data/office_numbers.txt

For a list of the standard "classifiers" that you can add see here.
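
To see how the indented, multi-line values in a file like this are read, the following sketch parses a small excerpt with Python's standard configparser module. This is only an illustration of the file format: setuptools applies its own processing on top (for example the file: and find: directives), and the list_value helper is a hypothetical convenience, not something setuptools provides.

```python
import configparser

# A short excerpt of the configuration shown above.
SETUP_CFG = """\
[metadata]
name = lancstro
license = MIT

[options]
python_requires = >=3.6
install_requires =
    astropy
    astroquery >= 0.4.3
"""


def list_value(raw):
    """Split a multi-line option value into a list of non-empty entries."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


config = configparser.ConfigParser()
config.read_string(SETUP_CFG)

name = config["metadata"]["name"]
python_requires = config["options"]["python_requires"]
requirements = list_value(config["options"]["install_requires"])

print(name)
print(python_requires)
print(requirements)
```

Indented lines following a key are treated as a continuation of that key's value, which is why each requirement can sit on its own line.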

In this project we have added some "data" files that come bundled with the package. It is not required to include data in your package.
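
Once data files are bundled, code in the package needs a way to find them wherever the package ends up installed. One common approach, sketched below, is importlib.resources.files (available from Python 3.9). The package name demo_data_pkg and the file contents are made up for this example, and this is not necessarily how lancstro itself reads its data.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package with a bundled data file (hypothetical names and contents).
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_data_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_data_pkg" / "__init__.py").write_text("")
(data_dir / "office_numbers.txt").write_text("Example Person  B99\n")

# Make the temporary package importable (not needed for an installed package).
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Locate the data file relative to the package rather than the current directory.
data_file = resources.files("demo_data_pkg") / "data" / "office_numbers.txt"
text = data_file.read_text()
print(text.strip())
```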

Adding a package version

In the above case the package version is set manually in the setup.cfg file. It is up to you how you define the version string, but it is often good to use Semantic Versioning. In this format the version consists of three full-stop separated numbers: MAJOR.MINOR.PATCH.

The Semantic Versioning site gives the following definitions of when to change the numbers:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards compatible manner, and
  3. PATCH version when you make backwards compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
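
The rules above can be sketched as a small helper. The function bump_version is hypothetical, written only for this illustration, and it handles plain MAJOR.MINOR.PATCH strings without pre-release or build labels.

```python
def bump_version(version, part):
    """Return the next version after bumping the given part.

    Bumping a part resets every part to its right to zero.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.0.1", "patch"))  # backwards compatible bug fix
print(bump_version("0.0.1", "minor"))  # new backwards compatible functionality
print(bump_version("0.0.1", "major"))  # incompatible API change
```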

To update the version you can just edit the value in the setup.cfg file. When you install the package, this will be the package's version.

This allows the package manager (e.g., pip) to know what version of the package is installed. However, it is often useful to provide the version number as a variable within the package itself, so that the user can check it if necessary. Most often you will find this as a variable called __version__, e.g.,:

import numpy
print(numpy.__version__)
1.21.2

There are several ways to set this, but it is best to make sure that there's only one place that you have to edit the version number rather than multiple places. One method (used in this package) is to include the version number in your package's main __init__.py file by adding the line:

__version__ = "0.0.1"

Then, within setup.cfg, the version line can be:

version = attr: lancstro.__version__
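
To give a feel for what reading the version from a single place involves, the sketch below pulls the __version__ string out of some module source text without importing it. This is a simplified illustration of the idea, not setuptools' actual implementation, and read_version is a hypothetical helper.

```python
import ast


def read_version(source, name="__version__"):
    """Return the literal value assigned to `name` at the top level of the source."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # Evaluate only literals, so no package code is executed.
                    return ast.literal_eval(node.value)
    raise ValueError(f"{name} not found")


# Stand-in for the contents of a package's __init__.py file.
init_source = '"""Example package."""\n\n__version__ = "0.0.1"\n'
print(read_version(init_source))
```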

Among the other options, a good one is to set the version with a tool such as setuptools-scm, which gathers the version information from git tags in your repo.

The MANIFEST.in file

You can specify which additional files you want to be bundled with the package's source distribution using a MANIFEST.in file. With modern versions of setuptools (e.g., greater than 43) most of the standard files, such as the README file and setup files, and any license file given in setup.cfg, are automatically included in the source distribution by default.

However, you may want to include other files. If you had, say, a test directory with multiple Python test scripts that you want in the package, you could add a MANIFEST.in file containing:

recursive-include test/ *.py

which will include all .py files within test.
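
As a rough model of what that rule selects, the sketch below applies the same directory-plus-pattern logic to a list of paths. The function select_recursive_include and the example file names are hypothetical; the real matching is done by setuptools when it builds the source distribution.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_recursive_include(paths, directory, pattern):
    """Return the paths anywhere under `directory` whose file name matches `pattern`."""
    base = PurePosixPath(directory)
    selected = []
    for path in map(PurePosixPath, paths):
        if base in path.parents and fnmatchcase(path.name, pattern):
            selected.append(str(path))
    return selected


# Hypothetical repository contents.
files = [
    "test/test_base.py",
    "test/members/test_staff.py",
    "test/data/input.txt",
    "lancstro/base.py",
]

selected = select_recursive_include(files, "test/", "*.py")
print(selected)
```

Only the two .py files under test are selected; the text file and the file outside test are left out.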

Installing the package

Once you have the above structure you can install the package (from its base directory) using:

pip install .

That's it! Open up a Python terminal and you should be able to do:

import lancstro
print(lancstro.__version__)
0.0.1

or run the favourite_object.py script from the command line:

$ favourite_object.py -h
usage: favourite_object.py [-h] name name

Get a staff member's favourite object

positional arguments:
  name        The staff member's full name

optional arguments:
  -h, --help  show this help message and exit
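
For reference, a script producing help text of this shape could be built with argparse along the following lines. This is a guess that is consistent with the output above, not the actual contents of favourite_object.py, and the lookup table is entirely made up.

```python
import argparse

# Hypothetical lookup table; the real script's data and logic are not shown here.
FAVOURITE_OBJECTS = {("Jane", "Doe"): "M31"}


def build_parser():
    """Create a parser taking a two-word full name as a positional argument."""
    parser = argparse.ArgumentParser(
        prog="favourite_object.py",
        description="Get a staff member's favourite object",
    )
    # nargs=2 collects the first and last name into a two-item list.
    parser.add_argument("name", nargs=2, help="The staff member's full name")
    return parser


def main(argv=None):
    """Parse the arguments and return the favourite object, or "unknown"."""
    args = build_parser().parse_args(argv)
    first, last = args.name
    return FAVOURITE_OBJECTS.get((first, last), "unknown")


# Passing the arguments explicitly stands in for running the script on the command line.
print(main(["Jane", "Doe"]))
```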

Documentation

Not covered here!

There are many additional useful things that I've not covered here. These include:

  • using entry point console scripts rather than, or as well as, including executable scripts
  • including C/C++/FORTRAN code, or Cython-ized code, in your package
  • creating a test suite for your package (and checking its coverage)
  • setting up continuous integration for building and testing (and automatically publishing) your code (e.g., with Github Actions, TravisCI, ...)

I may add these at a later date.

Other resources

For other descriptions of creating your Python code see:
