
Collection of scripts and tools for MGnify pipelines


mgnify-pipelines-toolkit

This Python package contains a collection of scripts and tools for use in MGnify pipelines. Scripts stored here are mainly:

  • One-off production scripts that perform specific tasks in pipelines
  • Scripts that have few dependencies
  • Scripts that don't have existing containers built to run them
  • Scripts for which building an entire container would be too bulky a solution to deploy in pipelines

This package is built and uploaded to PyPI and bioconda. It bundles the scripts and makes them executable from the command line once the package is installed.

How to install

This package is available on both PyPI and bioconda.

To install from PyPI with pip:

pip install mgnify-pipelines-toolkit

To install from bioconda with conda/mamba:

conda install -c bioconda mgnify-pipelines-toolkit

You should then be able to run the scripts from the command line. For example, to run the get_subunits.py script:

get_subunits -i ${easel_coords} -n ${meta.id}

Development

Quick Start with uv and Taskfile

This project uses uv for fast Python environment management and Task for task automation.

Prerequisites: uv and Task, both installed and available on your PATH.

Common tasks:

task: Available tasks for this project:
* clean:            Clean up generated files and caches
* lint:             Run linters (ruff check only)
* lint-fix:         Run linters and fix issues automatically
* pre-commit:       Install pre-commit hooks
* run:              Run toolkit scripts with uv (usage: task run -- <script_name> [args])
* test:             Run tests with uv
* testk:            Run specific tests from a file (usage: task testk -- test_path)
* venv:             Create a virtual environment with uv

Following the steps above ensures that the code you add is linted and formatted properly.

New script requirements

There are a few requirements for your script:

  • It needs to have a named main function of some kind. See mgnify_pipelines_toolkit/analysis/shared/get_subunits.py and its main() function for an example.
  • Because this package is meant to be run from the command line, make sure your script can easily accept arguments using tools like argparse or click.
  • It should have a small number of dependencies. This requirement is subjective, but if your script only requires a handful of basic packages such as Biopython, numpy, or pandas, then it's fine. If the script has a more extensive list of dependencies, a container is probably a better fit.
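
As a rough illustration of these requirements, a script might be laid out as below. This is a hypothetical skeleton, not code from the package: the -i and -n options mirror the flags in the get_subunits example, but their long names, help text, and the body of main() are assumptions made for the sketch.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Hypothetical example script")
    parser.add_argument("-i", "--input", required=True, help="Path to the input file")
    parser.add_argument("-n", "--name", required=True, help="Sample name used as a prefix")
    return parser.parse_args(argv)


def main(argv=None):
    """Named entry point that a command-line alias can call with no arguments."""
    args = parse_args(argv)
    # Placeholder for the real work of the script.
    print(f"Processing {args.input} for sample {args.name}")


if __name__ == "__main__":
    main()
```

Keeping argument parsing in its own function and letting main() accept an optional argv makes the script easy to call from a unit test without touching sys.argv.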

How to add a new script

To add a new Python script, first copy it over to the mgnify_pipelines_toolkit directory in this repository, specifically to the subdirectory that makes the most sense. If none of the subdirectories make sense for your script, create a new one. If your script doesn't have a main() type function yet, write one.

Then, open pyproject.toml, which needs a few additions. First, add any missing dependencies (including their versions) to the dependencies field.

Then, if you created a new subdirectory to add your script in, go to the packages line under [tool.setuptools] and add the new subdirectory following the same syntax.

Then, scroll down to the [project.scripts] line. Here, you will create an alias command for running your script from the command-line. In the example line:

get_subunits = "mgnify_pipelines_toolkit.analysis.shared.get_subunits:main"

  • get_subunits is the alias
  • mgnify_pipelines_toolkit.analysis.shared.get_subunits will link the alias to the script with the path mgnify_pipelines_toolkit/analysis/shared/get_subunits.py
  • :main will specifically call the function named main() when the alias is run.

Once you have set up this command, executing get_subunits on the command line will be the equivalent of doing:

from mgnify_pipelines_toolkit.analysis.shared.get_subunits import main; main()
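
The way a "module:function" string maps to a callable can be sketched in a few lines. This is only an illustration of the lookup, not the code that pip or conda actually generates (the generated wrapper also passes the return value of main() to sys.exit). The helper name resolve_entry_point is hypothetical, and a standard-library target is used so the sketch runs without the package installed.

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Turn a 'package.module:function' string into the callable it names."""
    module_path, _, func_name = spec.partition(":")
    module = import_module(module_path)  # e.g. import os.path
    return getattr(module, func_name)    # e.g. os.path.basename


# Demonstrated with a standard-library target instead of the toolkit itself.
basename = resolve_entry_point("os.path:basename")
print(basename("/data/sample.fasta"))  # prints: sample.fasta
```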

You should then write at least one unit test for your addition. This package uses pytest at the moment for this purpose. A GitHub Action workflow will run all of the unit tests whenever a commit is pushed to any branch.
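
A unit test for pytest can be as small as the sketch below. The reverse_complement helper is a hypothetical stand-in for logic from your own script; in a real test you would import the function from its module in mgnify_pipelines_toolkit rather than define it next to the test.

```python
def reverse_complement(seq: str) -> str:
    """Hypothetical helper standing in for logic from your own script."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def test_reverse_complement():
    # pytest collects functions named test_* and reports any failed assert.
    assert reverse_complement("ATGC") == "GCAT"
    assert reverse_complement("aacn") == "NGTT"
```

Using the tasks listed earlier, a single test file can be run with task testk -- test_path.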

Finally, bump the version in the version line of pyproject.toml.

At the moment, these should be the only steps required to set up your script in this package, though this is subject to change.

Building and uploading to PyPI

Building and publishing the package is automated by GitHub Actions, which runs only when a new release is created. Bioconda should then automatically pick up the new PyPI release and update its recipe, though it's worth keeping an eye on its automated pull requests just in case.
