Shell command library and Snakemake wrappers for Sequana pipelines

The Sequana Wrapper Repository

Overview: Shell command library and Snakemake wrappers for Sequana pipelines
Status: Production (wrappers/, maintenance only) / Active (shells/, snippets/)
Issues: Please file a report on github/sequana/sequana-wrappers
Python version: Python 3.8+
Citation: Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Status and roadmap

This repository contains three independent mechanisms for providing bioinformatics tool commands to Sequana pipelines:

  • wrappers/ — the original Snakemake wrapper system (Python scripts + conda environment.yaml). This tree is now in maintenance mode. No new wrappers will be added. Bug fixes will still be accepted, but all new development happens in shells/. See the rationale below for the full explanation of why.

  • sequana_wrappers/shells/ — the new shell command library. Versioned shell strings that work with container: + shell: rules, with no Python inside the container. This is the active development track and the recommended approach for all new Sequana pipelines.

  • sequana_wrappers/snippets/ — versioned Python callables for pipeline steps that require Python logic but still benefit from shared, versioned definitions. Used via run: blocks (not shell:). See the snippets section below for details.

    All wrappers are available in shells/ except the following three, which require Python imports from the sequana library and cannot be expressed as pure bash:

    • fastq_stats — uses sequana.FastQC + matplotlib
    • freebayes_vcf_filter — uses sequana.VCF_freebayes Python class
    • snpeff_add_locus_in_fasta — uses sequana.SnpEff.add_locus_in_fasta()

    The rulegraph rule formerly in wrappers has been migrated to sequana_wrappers/snippets/rulegraph/ because it requires Python imports from sequana_pipetools — it cannot run as a pure bash command inside a container.

Quick start — shells (recommended)

Install the package:

pip install sequana_wrappers

Use in a Snakemake pipeline via sequana_pipetools:

# In your pipeline’s .rules file — manager is a PipelineManager instance
rule minimap2:
    input:   ...
    output:  "{sample}/{sample}.sorted.bam"
    container: "https://zenodo.org/record/7987999/files/samtools_1.17_minimap2_2.24.0.img"
    shell:   manager.get_shell("minimap2/align", "v1")

Quick start — snippets (Python run blocks)

When a pipeline step requires Python logic (host-side imports, file path resolution, etc.) but you still want the code to be shared and versioned, use get_run with a run: block:

rule rulegraph:
    input:   "Snakefile"
    output:  "rulegraph/rulegraph.svg"
    params:  configname="config.yaml"
    run:
        manager.get_run("rulegraph/run", "v1")(snakemake)

The snippet's execute(input, output, params) function runs on the host (where sequana_pipetools and other Python dependencies are available) — no container is involved.

Quick start — wrappers (legacy)

snakemake --wrapper-prefix https://github.com/sequana/sequana-wrappers

or with a local copy:

git clone git@github.com:sequana/sequana-wrappers.git sequana_wrappers
snakemake --wrapper-prefix git+file:///home/user/sequana_wrappers

If the environment variable SEQUANA_WRAPPERS is set to git+file:///home/user/sequana_wrappers, all pipelines will automatically use it as the --wrapper-prefix.
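
The lookup described above amounts to an environment variable with a fallback. The following is a minimal sketch only: the helper name wrapper_prefix is hypothetical (it is not the sequana_pipetools API), and using the GitHub URL from the quick start as the default is an assumption.

```python
import os

# Assumption: the GitHub URL passed to --wrapper-prefix above is the fallback.
DEFAULT_PREFIX = "https://github.com/sequana/sequana-wrappers"


def wrapper_prefix(environ=None):
    """Return SEQUANA_WRAPPERS if it is set, otherwise the default prefix."""
    environ = os.environ if environ is None else environ
    return environ.get("SEQUANA_WRAPPERS", DEFAULT_PREFIX)
```

Passing a plain dict instead of os.environ makes the helper easy to exercise without touching the real environment.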

The shells/ directory — rationale and design

Background

Sequana pipelines use two mechanisms to provide bioinformatics tools:

  • Wrappers (wrappers/) — Python scripts (wrapper.py) plus a conda environment.yaml. Snakemake fetches and executes them via the wrapper: rule directive.
  • Containers — Apptainer/Singularity images (hosted on Zenodo/Damona) referenced by the container: rule directive.

The problem: wrapper: + container: are incompatible

When a Snakemake rule combines both wrapper: and container:, and the pipeline is run with --use-singularity / --apptainer-prefix, Snakemake v7 executes the wrapper Python script inside the container using the container's own Python binary.

Snakemake itself does not need to be installed inside the container: it bind-mounts its own site-packages from the host at /mnt/snakemake. However, the container's Python binary must be ABI-compatible with the host Python. Old bioconda/Damona images (Python 3.8) fail when the host runs Python 3.10, because C-extension .so files compiled for 3.10 cannot be loaded by Python 3.8:

ImportError: /mnt/snakemake/...so: cannot open shared object
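
The mismatch is visible in the file name of a compiled extension: CPython embeds its version in the suffix (for example .cpython-310-x86_64-linux-gnu.so). The sketch below is a hypothetical diagnostic helper, not part of Snakemake or sequana_wrappers, that extracts this tag and compares it with the running interpreter. It is a file-name heuristic only and does not attempt to load anything.

```python
import re
import sys


def cpython_tag(filename):
    """Return the CPython version tag embedded in an extension file name, or None."""
    match = re.search(r"\.cpython-(\d+)[a-z]*-", filename)
    return match.group(1) if match else None


def matches_current_python(filename):
    """Heuristic: True if the file carries no CPython tag or the tag matches this interpreter."""
    tag = cpython_tag(filename)
    current = f"{sys.version_info.major}{sys.version_info.minor}"
    return tag is None or tag == current
```

Run inside a Python 3.8 container against a file tagged 310, matches_current_python returns False, which corresponds to the failure shown above.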

A further sign of inverted concerns: Damona had to ship Python inside tool containers (bwa, samtools, …) specifically to satisfy the wrapper mechanism. A container for bwa should contain bwa, not a Python runtime serving the pipeline framework.

Options considered

Option 1 — Update container images to Python 3.10

Rebuild all Damona images with a Python version matching the host. Wrappers would then work with containers as originally intended.

Rejected as the primary fix: containers would still carry Python only to serve the framework; images must be rebuilt every time the host Python is upgraded; the inverted concern is not resolved.

Option 2 — Remove container: from wrapper rules (short-term workaround)

Wrapper rules run on the host (or via --use-conda); only pure shell: rules keep their container: directive.

Used as a temporary workaround in sequana_mapper while the shell library was being designed. Downside: Apptainer only covers a subset of rules.

Option 3 — Drop wrappers, inline shell: in each pipeline (simplest)

Replace every wrapper with a hand-written shell: block inside the pipeline rule. No shared library.

Rejected: duplicates logic across all pipelines; maintenance burden; loses the reusability benefit of this repository entirely.

Option 4 — Shell command library in shells/ ✓ (chosen)

Return to the spirit of the early sequana approach: define reusable, versioned shell command strings here, alongside the existing wrappers/. Pipelines import these strings and use them in shell: + container: rules.

Why this wins:

Property                      Wrappers                    Shell library
Reusable logic                Yes (Python)                Yes (string)
Python in container           Required                    Not needed
Git tag checkout at run time  Yes                         No
Damona images lean            No                          Yes
Works with --use-conda        Yes                         No
Apptainer compatible          Only if Python ABI matches  Always
Backward compatible                                       Yes (wrappers/ kept)

The wrappers/ tree is kept untouched for full backward compatibility.

Design

Repository layout

sequana-wrappers/
├── wrappers/                        # existing — kept for backward compat
│   ├── bwa/align/wrapper.py
│   └── ...
└── sequana_wrappers/
    ├── __init__.py                  # get_shell() and get_run()
    ├── shells/                      # container-first shell strings
    │   ├── bwa/
    │   │   ├── align/
    │   │   │   └── v1/cmd.py        # frozen at release v1
    │   │   └── build/
    │   │       └── v1/cmd.py
    │   ├── bamtools/stats/v1/cmd.py
    │   └── ...
    └── snippets/                    # host-side Python callables
        ├── rulegraph/run/v1/code.py
        └── ...

Versioning convention

Every shell script is named cmd.py and lives inside a named version subdirectory. The structure is:

sequana_wrappers/shells/<tool>/<command>/<version>/cmd.py

Valid version names are:

  • vN (e.g. v1, v2) — frozen, reproducible snapshots. Once created, these files are never edited.
  • dev — work-in-progress version used during active development. A dev/ directory is created when new work begins on a command and removed (or renamed to vN) at release time. No dev/ directories exist in released versions of this package.

Because every file has the same name, the tool and command are encoded entirely in the directory path. This makes future deeper nesting (e.g. shells/bamtools/stats/paired/v1/cmd.py) natural without any changes to the get_shell API.

There is no silent fallback between versions: requesting a version that does not exist raises an explicit error.
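
The path convention and the no-fallback rule can be sketched as follows. This is an illustration under stated assumptions, not the actual get_shell implementation: the function name load_cmd and its root argument are hypothetical, and the choice of FileNotFoundError as the explicit error is an assumption.

```python
import importlib.util
from pathlib import Path


def load_cmd(root, name, version):
    """Load CMD from <root>/<tool>/<command>/<version>/cmd.py.

    `name` is "<tool>/<command>", e.g. "bwa/align". A missing version raises
    FileNotFoundError instead of falling back to another version.
    """
    path = Path(root) / name / version / "cmd.py"
    if not path.is_file():
        raise FileNotFoundError(
            f"no shell command {name!r} at version {version!r}: {path}"
        )
    # Execute the file as a throwaway module and read its CMD attribute.
    spec = importlib.util.spec_from_file_location("cmd", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.CMD
```

Because the version is part of the path, a deeper layout such as bamtools/stats/paired only changes the name argument, not the function.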

Version axes

Two version axes are completely independent:

Axis           What it pins     Where it lives
Tool binary    e.g. bwa 0.7.17  Container image (Damona / Zenodo)
Shell command  e.g. v1          Hardcoded per rule in the pipeline

Each pipeline rule hardcodes its own shell command version independently — one rule can use v1 while another uses v2 if only that command changed.

Shell file format

Each cmd.py exports a single CMD string using Snakemake's standard {input}, {output}, {params}, {threads}, {log}, and {wildcards} placeholders:

# sequana_wrappers/shells/bwa/align/v1/cmd.py
CMD = """\
mkdir -p {params.tmp_directory}
(bwa mem -t {threads} {params.options} {input.reference} {input.fastq} \
 | sambamba view -t {threads} -S -f bam -o /dev/stdout /dev/stdin \
 | sambamba sort /dev/stdin -o {output.sorted} -t {threads} \
   --tmpdir={params.tmp_directory}) \
> {log} 2>&1
"""

Usage in a pipeline rule

get_shell is available as a method on the PipelineManager instance (already present in every pipeline rules file) — no extra import needed. The version is hardcoded per rule:

rule bwa:
    input:   ...
    output:  sorted="{sample}/{sample}.sorted.bam"
    log:     "{sample}/bwa/{sample}.log"
    params:  options=config["bwa"]["options"],
             tmp_directory=config["bwa"]["tmp_directory"]
    threads: 2
    container: config['apptainers']['bwa']
    shell:   manager.get_shell("bwa/align", "v1")

Use "dev" during development before a versioned snapshot exists:

    shell:   manager.get_shell("bwa/align", "dev")

The container contains only the tool binaries — no Python, no Snakemake.
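
Snakemake fills the placeholders of a CMD string at run time. To check a command string before running a pipeline, it can be previewed with plain str.format and stand-in objects. This is a rough sketch: the CMD below is a shortened stand-in rather than the real bwa/align command, the preview helper is hypothetical, and Snakemake's own formatting (for example of lists of files) is richer than str.format.

```python
from types import SimpleNamespace

# Shortened stand-in for a CMD string; not the actual bwa/align command.
CMD = "bwa mem -t {threads} {params.options} {input.reference} {input.fastq} > {log} 2>&1"


def preview(cmd, **fields):
    """Fill Snakemake-style placeholders with str.format for a quick visual check."""
    return cmd.format(**fields)


rendered = preview(
    CMD,
    threads=2,
    params=SimpleNamespace(options="-M"),
    input=SimpleNamespace(reference="ref.fa", fastq="sample.fastq"),
    log="bwa.log",
)
print(rendered)  # bwa mem -t 2 -M ref.fa sample.fastq > bwa.log 2>&1
```

A placeholder that is missing from the stand-in values raises KeyError or AttributeError, which catches typos in the command string early.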

Adding or updating a shell command

During development:

  1. Create sequana_wrappers/shells/<tool>/<command>/dev/ with __init__.py and cmd.py.
  2. Use manager.get_shell("<tool>/<command>", "dev") in the pipeline rule.
  3. Test against the relevant Damona container.

At release time:

VERSION=v2
mkdir -p sequana_wrappers/shells/<tool>/<command>/${VERSION}
touch sequana_wrappers/shells/<tool>/<command>/${VERSION}/__init__.py
cp sequana_wrappers/shells/<tool>/<command>/dev/cmd.py \
   sequana_wrappers/shells/<tool>/<command>/${VERSION}/cmd.py
rm -rf sequana_wrappers/shells/<tool>/<command>/dev

Then update the pipeline rule to manager.get_shell("<tool>/<command>", "v2") and bump version in pyproject.toml. No git tag required — the directory is the version.
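
The release steps above can also be scripted. The sketch below mirrors the shell commands in Python; the function name promote_dev is hypothetical and not shipped with the package. It additionally refuses to overwrite an existing version directory, in line with the rule that vN snapshots are never edited.

```python
import shutil
from pathlib import Path


def promote_dev(shells_root, name, version):
    """Copy <name>/dev/cmd.py to <name>/<version>/ and remove the dev directory."""
    base = Path(shells_root) / name
    dev, target = base / "dev", base / version
    if not (dev / "cmd.py").is_file():
        raise FileNotFoundError(f"nothing to release: {dev / 'cmd.py'} is missing")
    if target.exists():
        raise FileExistsError(f"{target} already exists and must not be edited")
    target.mkdir(parents=True)
    (target / "__init__.py").touch()
    shutil.copy(dev / "cmd.py", target / "cmd.py")
    shutil.rmtree(dev)
    return target
```

For example, promote_dev("sequana_wrappers/shells", "bwa/align", "v2") performs the same steps as the commands above with VERSION=v2.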


The snippets/ directory — rationale

Some pipeline steps require Python logic that cannot be expressed as a pure bash command string. Examples: generating a rule graph (needs sequana_pipetools.DOTParser), post-processing VCF files with a custom Python class, or running tools that depend on host-side Python libraries.

These steps cannot use shell: + container: (the container has no Python runtime; and even if it did, the ABI mismatch problem described above applies). They also cannot use wrapper: for the same ABI reason.

The solution is a snippets/ library of versioned Python callables that are invoked inside Snakemake run: blocks, running entirely on the host where all Python dependencies are available:

sequana_wrappers/snippets/<tool>/<command>/<version>/code.py

Each code.py exports an execute(input, output, params) function. Versioning follows the same convention as shells (v1, v2, …, dev). get_run loads the callable by path and version — no git-time fetching, no ABI concerns.
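
As an illustration of the shape of such a file, here is a hypothetical code.py. The step it implements (counting lines) and the label parameter are invented for the example. It assumes that input and output can be indexed like lists and that params supports attribute access, as Snakemake's own objects do.

```python
# Hypothetical sequana_wrappers/snippets/<tool>/<command>/dev/code.py


def execute(input, output, params):
    """Count the lines of the first input file and write a one-line summary."""
    with open(input[0]) as handle:
        n_lines = sum(1 for _ in handle)
    # `label` is an illustrative parameter, not part of any real snippet.
    label = getattr(params, "label", "lines")
    with open(output[0], "w") as handle:
        handle.write(f"{label}\t{n_lines}\n")
    return n_lines
```

Since execute is an ordinary function, it can be unit-tested with plain lists and a types.SimpleNamespace, without running Snakemake.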

Property                Wrappers  Shell library  Snippet library
Python in container     Required  Not needed     N/A (host-side)
Container needed        Optional  Yes            No
Snakemake directive     wrapper:  shell:         run:
Reusable & versioned    Yes       Yes            Yes
Python imports on host  Yes       No             Yes

The repository layout extended with snippets:

sequana-wrappers/
├── wrappers/                            # legacy — maintenance only
├── sequana_wrappers/
│   ├── shells/                          # container-first shell strings
│   │   ├── bwa/align/v1/cmd.py
│   │   └── ...
│   └── snippets/                        # host-side Python callables
│       ├── rulegraph/run/v1/code.py
│       └── ...

Notes for developers

Overview

wrappers/ is in maintenance mode. Bug fixes are welcome; new wrappers are not accepted. All new tool commands should be added to sequana_wrappers/shells/ instead. See the shells rationale for context.

The wrappers/ directory contains the legacy wrappers. Each sub-directory is dedicated to a wrapper that is related to a given software/application. A sub directory may have several wrappers (e.g., bwa has a sub directory related to the indexing, and a sub directory related to mapping).

Here is an example of a wrapper tree structure:

fastqc
├── environment.yaml
├── README.md
├── test
│   ├── README.md
│   ├── Snakefile
│   ├── test_R1_.fastq
│   └── test_R2_.fastq
└── wrapper.py

Note that some software may have several sub wrappers (see the bowtie1 wrapper for instance).

A wrapper directory must contain a file called wrapper.py, which holds the core of the wrapper. There are no specific requirements here beyond writing clear, commented code.

A wrapper directory should have a test directory for continuous integration, containing a Snakefile to be tested and, if needed, data files. Do not add large files here. A README.md should be added to explain the origin of the test data files. Finally, include your tests in the main test.py file at the root of the repository (not in the wrapper itself).

For testing purposes, you should also add a file called environment.yaml that lists the packages required for the test (and the wrapper) to work.

Finally, for the documentation, we ask the developer to create a README.md file as described below.

To test your new wrapper (called example here), type:

pytest test.py -k test_example

The config file

If a wrapper requires parameters, they must be defined in a config.yaml file. The same applies to threading. Consider the following points when writing a wrapper:

  • The threads parameter should also be a parameter in the config file.
  • The params section should contain a key called options that is also defined in the config file.
  • Keys or parameters related to directories and files should use the _directory or _file suffixes. This allows the Sequanix application to automatically recognise those options and show a dedicated widget.
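
These conventions are simple enough to check mechanically. The sketch below uses hypothetical helper names and assumes a tool that needs both threads and options. It reports missing keys in one config section and lists the keys that the suffix rule marks as paths.

```python
def missing_keys(section, required=("threads", "options")):
    """Return the required keys that are absent from one tool's config section."""
    return [key for key in required if key not in section]


def path_like_keys(section):
    """Return the keys whose suffix marks them as a directory or a file."""
    return sorted(key for key in section if key.endswith(("_directory", "_file")))


# One tool's section of a parsed config.yaml, shown here as a plain dict.
falco = {"threads": 4, "options": "--verbose", "working_directory": "here"}
print(missing_keys(falco))    # []
print(path_like_keys(falco))  # ['working_directory']
```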

Consider this example:

rule falco:
    input: whatever
    output: whatever
    log:
        "samples/{sample}/falco.log"
    threads:
        config['falco']['threads']
    params:
        options=config['falco']['options'],
        wkdir=config['falco']['working_directory']
    wrapper:
        "main/wrappers/falco"

Your config file will look like:

falco:
    threads: 4
    options: "--verbose"
    working_directory: 'here'

Naming arguments of the different sections

In all sections (e.g., input, params), a single item does not need a name. When a section has several items, please name each of them.

rule example1:
    input:
        "test.bam"
    output:
        "test.sorted.bam"
    ...

but:

rule example2:
    input:
        "test.bam"
    output:
        bam="test.sorted.bam",
        bai="test.sorted.bam.bai"
    ...

Documentation

Each wrapper should have dedicated documentation explaining the input/output with a usage example. It should also document the expected configuration file. The file must be formatted in markdown. It must contain Documentation and Example subsections. If a Configuration section is found, it is also added to the documentation. This README.md file will be rendered automatically via a Sequana sphinx plugin. See the fastqc directory for a working example.

FAQs

Adding a new wrapper in practice

In ./wrappers, add a new wrapper, for instance by copying the existing fastqc wrapper. Edit wrapper.py and design a test/Snakefile example for testing. As a developer, you are probably working in a dedicated branch. Let us call it dev.

In the test/Snakefile, switch the wrapper path from the main branch to the dev branch:

wrapper:
    "dev/wrappers/my_new_wrapper"

In order to test your Snakefile, you first need to commit the wrapper.py. Then, execute the Snakefile:

snakemake -s Snakefile  -j 1 --wrapper-prefix git+file:///YOURPATH/sequana-wrappers/ -f -p

If it fails, edit and commit your wrapper.py and execute again until your Snakefile and wrappers are functional.

Once done, switch back the wrapper path to the main branch:

wrapper:
    "main/wrappers/my_new_wrapper"

It is now time to include the new wrapper in the continuous integration. Go to the root of sequana-wrappers and add a functional test to the end of test.py. Then, test it:

pytest test.py -k my_new_wrapper -v

You are now ready to push and create a pull request.

Tagging

You may want to add a tag. Our convention is to use a tag of the form YEAR.MONTH.DAY, where the day and month are not zero-padded. So you would have e.g.:

v23.11.11
v23.2.2

but not v23.02.02
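
A tag following this convention can be derived from a date. The helper below is a small sketch with a hypothetical name. The leading v and the two-digit year are inferred from the examples above, not from a stated rule.

```python
from datetime import date


def release_tag(day):
    """Format a date as vYY.M.D without zero padding."""
    return f"v{day.year % 100}.{day.month}.{day.day}"


print(release_tag(date(2023, 2, 2)))    # v23.2.2
print(release_tag(date(2023, 11, 11)))  # v23.11.11
```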

We use annotated tags so the command is e.g.:

git tag -a v23.11.11

and then push it to github:

git push origin main v23.11.11
