pypackagery

Pypackagery packages a subset of a monorepo and determines the dependent packages.

Given a root directory of a Python code base, a list of Python files (from that code base) and a target directory, pypackagery determines the modules on which the specified files depend. The scripts and the local dependencies are copied to the given target directory. The external dependencies (such as PyPI packages) are not copied (and need not be installed); instead, a list of the required subset of external dependencies is generated.
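At its core, resolving local dependencies means following import statements. As a rough illustration of that kind of analysis (this is not pypackagery's actual implementation), the imports of a single file can be listed with the standard ast module:

```python
import ast
import pathlib
import tempfile

def imported_modules(path: pathlib.Path) -> set:
    """Collect the top-level module names imported by a Python file."""
    tree = ast.parse(path.read_text())
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

# Demonstrate on a throw-away file.
with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp) / "example.py"
    script.write_text("import PIL.Image\nfrom common import logging\n")
    print(sorted(imported_modules(script)))  # ['PIL', 'common']
```

Each top-level name found this way is then classified as a built-in, an external requirement, or a local module of the code base.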

The external dependencies of the monorepo need to be specified in <root directory>/requirements.txt and <root directory>/module_to_requirement.tsv.

The requirements.txt follows the Pip format (see pip documentation). The file defines which external packages are needed by the whole of the code base. Pypackagery will read this list and extract the subset needed by the specified Python files.

module_to_requirement.tsv defines the correspondence between Python modules and the requirements defined in requirements.txt. This correspondence needs to be defined manually since there is no way to automatically map Python modules to pip packages. The correspondence in module_to_requirement.tsv is given as lines of two tab-separated values. The first column is the full module name (such as PIL.Image) and the second is the name of the package in requirements.txt (such as pillow). The version of the package should be omitted and specified only in requirements.txt.
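The format is simple enough that a short sketch with the standard csv module illustrates it (this is not packagery's actual parser; use packagery.parse_module_to_requirement in real code):

```python
import csv
import io

def parse_mapping(text: str) -> dict:
    """Parse tab-separated (module, requirement) lines into a dict."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        module, requirement = row[0], row[1]
        mapping[module] = requirement
    return mapping

print(parse_mapping("PIL.Image\tpillow\ncv2\topencv-python\n"))
# {'PIL.Image': 'pillow', 'cv2': 'opencv-python'}
```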

Please do not forget to add the #egg fragment to URLs and files in requirements.txt so that the name of the package can be uniquely resolved when joining module_to_requirement.tsv and requirements.txt.
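For instance, a pinned release already carries its package name, while a URL requirement does not; the #egg fragment makes the name explicit so the join can succeed (the repository URL below is purely illustrative):

```
# A pinned release; the name "pillow" is explicit:
pillow==5.2.0

# A VCS requirement; without the #egg fragment the package name
# could not be joined against module_to_requirement.tsv:
git+https://github.com/some-org/some-package.git@v1.2.3#egg=some-package
```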

Usage

Requirement Specification

As already mentioned, the requirements are expected to follow Pip format (see pip documentation) and live in requirements.txt at the root of the code base. The mapping from modules to requirements is expected in module_to_requirement.tsv also at the root of the code base.

Assume that the code base lives in ~/workspace/some-project.

Here is an excerpt from ~/workspace/some-project/requirements.txt:

pillow==5.2.0
pytz==2018.5
pyzmq==17.1.2

And here is an excerpt from ~/workspace/some-project/module_to_requirement.tsv (mind that it’s tab separated):

PIL pillow
PIL.Image   pillow
PIL.ImageFile       pillow
PIL.ImageOps        pillow
PIL.ImageStat       pillow
PIL.ImageTk pillow
cv2 opencv-python

Directory

Assume that the code base lives in ~/workspace/some-project and we are interested in bundling everything in the pipeline/out directory.

To determine the subset of the files and requirements, run the following command line:

pypackagery \
    --root_dir ~/workspace/some-project \
    --initial_set ~/workspace/some-project/pipeline/out

This gives us a verbose, human-readable output like:

External dependencies:
Package name | Requirement spec
-------------+---------------------
pyzmq        | 'pyzmq==17.1.2'
temppathlib  | 'temppathlib==1.0.3'

Local dependencies:
pipeline/out/__init__.py
common/__init__.py
common/logging.py
common/proc.py

If we want to get the same output in JSON, we need to call:

pypackagery \
    --root_dir ~/workspace/some-project \
    --initial_set ~/workspace/some-project/pipeline/out \
    --format json

which gives us a JSON-encoded dependency graph:

{
  "requirements": {
    "pyzmq": {
      "name": "pyzmq",
      "line": "pyzmq==17.1.2\n"
    },
    "temppathlib": {
      "name": "temppathlib",
      "line": "temppathlib==1.0.3\n"
    }
  },
  "rel_paths": [
    "pipeline/out/__init__.py",
    "common/__init__.py",
    "common/logging.py",
    "common/proc.py"
  ],
  "unresolved_modules": []
}
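The JSON output lends itself to further automation. As a sketch (using only the standard json module, with the output above abridged into a string literal), a deployment script could reassemble a subset requirements.txt and iterate over the files to copy:

```python
import json

# JSON output captured from `pypackagery ... --format json` (abridged).
raw = '''
{
  "requirements": {
    "pyzmq": {"name": "pyzmq", "line": "pyzmq==17.1.2\\n"},
    "temppathlib": {"name": "temppathlib", "line": "temppathlib==1.0.3\\n"}
  },
  "rel_paths": [
    "pipeline/out/__init__.py",
    "common/__init__.py"
  ],
  "unresolved_modules": []
}
'''

graph = json.loads(raw)

# Reassemble a requirements.txt restricted to the actual dependencies.
subset = "".join(req["line"] for req in graph["requirements"].values())
print(subset)

# Files to copy, relative to the root directory.
for rel_path in graph["rel_paths"]:
    print(rel_path)
```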

Files

Assume again that the code base lives in ~/workspace/some-project. We would like to get a subset of the code base required by a list of scripts. We need to specify the initial set as a list of files:

pypackagery \
    --root_dir ~/workspace/some-project \
    --initial_set \
        ~/workspace/some-project/pipeline/input/receivery.py \
        ~/workspace/some-project/pipeline/input/snapshotry.py

which gives us:

External dependencies:
Package name | Requirement spec
-------------+-------------------
icontract    | 'icontract==1.5.1'
pillow       | 'pillow==5.2.0'
protobuf     | 'protobuf==3.5.1'
pytz         | 'pytz==2018.5'
pyzmq        | 'pyzmq==17.1.2'
requests     | 'requests==2.19.1'

Local dependencies:
pipeline/__init__.py
pipeline/input/receivery.py
pipeline/input/snapshotry.py
common/__init__.py
common/img.py
common/logging.py
protoed/__init__.py
protoed/pipeline_pb2.py

Unresolved Modules

If there is a module which cannot be resolved (it is neither a built-in, nor specified in the requirements, nor lives in the code base), pypackagery exits with a non-zero return code.

If you specify --dont_panic, the return code will be 0 even if there are unresolved modules.
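In automation, the same condition can also be checked from the JSON output: the unresolved_modules list is empty exactly when all modules were resolved. A minimal sketch (the output literal below is illustrative):

```python
import json
import sys

# Illustrative JSON output containing one unresolved module.
raw = ('{"requirements": {}, "rel_paths": [],'
       ' "unresolved_modules": ["some_unknown_module"]}')

graph = json.loads(raw)

# Mirror pypackagery's default behaviour: treat unresolved modules as an error.
if graph["unresolved_modules"]:
    print("Unresolved modules:", ", ".join(graph["unresolved_modules"]))
    # sys.exit(1)  # enable in a real pipeline instead of --dont_panic
```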

Module packagery

Pypackagery provides a module, packagery, which can be used to programmatically determine the dependencies of a subset of the code base. This is particularly useful, for example, for deployments to a remote machine where you want to deploy only the part of the code base required by a given configuration.

Here is an example:

import pathlib

import packagery

root_dir = pathlib.Path('/some/codebase')

rel_pths = [
    pathlib.Path("some/dir/file1.py"),
    pathlib.Path("some/other/dir/file2.py")]

requirements_txt = root_dir / "requirements.txt"
module_to_requirement_tsv = root_dir / "module_to_requirement.tsv"

requirements = packagery.parse_requirements(
    text=requirements_txt.read_text())

module_to_requirement = packagery.parse_module_to_requirement(
    text=module_to_requirement_tsv.read_text(),
    filename=module_to_requirement_tsv.as_posix())

pkg = packagery.collect_dependency_graph(
    root_dir=root_dir,
    rel_paths=rel_pths,
    requirements=requirements,
    module_to_requirement=module_to_requirement)

# do something with pkg ...

Mind that relative paths (given as rel_paths argument) all need to be files, not directories.
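Since rel_paths entries must be files, an initial set covering a whole directory first has to be expanded into individual files. A small helper to that effect (illustrative, not part of packagery) could look like this:

```python
import pathlib
import tempfile

def python_files_beneath(root_dir: pathlib.Path, subdir: str) -> list:
    """List *.py files under root_dir/subdir as paths relative to root_dir."""
    return sorted(
        path.relative_to(root_dir)
        for path in (root_dir / subdir).rglob("*.py"))

# Demonstrate on a throw-away directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "pipeline" / "out").mkdir(parents=True)
    (root / "pipeline" / "out" / "__init__.py").write_text("")
    (root / "pipeline" / "out" / "main.py").write_text("")
    rel_pths = python_files_beneath(root, "pipeline")
    print([p.as_posix() for p in rel_pths])
    # ['pipeline/out/__init__.py', 'pipeline/out/main.py']
```

The resulting list can be passed directly as the rel_paths argument to packagery.collect_dependency_graph.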

Documentation

The documentation is available on readthedocs.

Installation

  • Create a virtual environment:

python3 -m venv venv3

  • Activate it:

source venv3/bin/activate

  • Install pypackagery with pip:

pip3 install pypackagery

Development

  • Check out the repository.

  • In the repository root, create the virtual environment:

python3 -m venv venv3

  • Activate the virtual environment:

source venv3/bin/activate

  • Install the development dependencies:

pip3 install -e .[dev]

We use tox for testing and packaging the distribution:

tox

Pre-commit Checks

We provide a set of pre-commit checks that lint and check code for formatting.

Namely, we use:

  • yapf to check the formatting.

  • The style of the docstrings is checked with pydocstyle.

  • Static type analysis is performed with mypy.

  • Various linter checks are done with pylint.

  • Doctests are executed using the Python doctest module.

Run the pre-commit checks locally from an activated virtual environment with development dependencies:

./precommit.py

  • The pre-commit script can also automatically format the code:

./precommit.py --overwrite

Versioning

We follow Semantic Versioning. The version X.Y.Z indicates:

  • X is the major version (backward-incompatible),

  • Y is the minor version (backward-compatible), and

  • Z is the patch version (backward-compatible bug fix).

