
Description: convert NGS files from one format to another using bioconvert


Supported Python versions: 3.10, 3.11, 3.12

bioconvert — format conversion pipeline

Overview: Parallelise bioconvert conversions across a set of files

Input: Any file format supported by bioconvert (FastQ, BAM, FASTA, VCF, …)

Output: Converted files in the target format, MD5 checksums, and an HTML summary report

Status: Production

Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

[Figure: pipeline DAG]

Installation

pip install sequana-bioconvert

To upgrade an existing installation:

pip install sequana-bioconvert --upgrade

Install all dependencies via conda/mamba:

mamba env create -f environment.yml

Quick Start

Step 1 — prepare the working directory

Convert all fastq.gz files in a directory to fasta.gz:

sequana_bioconvert \
    --input-directory /path/to/data \
    --input-ext fastq.gz \
    --output-ext fasta.gz \
    --command fastq2fasta

This creates a bioconvert/ working directory with config.yaml and a bioconvert.sh launch script.
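When several conversions have to be initialised from a script, the command line above can be assembled programmatically. The following is a minimal sketch and not part of sequana-bioconvert: the helper name build_command is hypothetical, it relies only on the flags listed under Usage, and it assumes that --input-pattern and --method each take a single value.

```python
import shlex


def build_command(input_directory, input_ext, output_ext, command,
                  input_pattern=None, method=None):
    """Return the sequana_bioconvert argument list for one conversion setup."""
    args = [
        "sequana_bioconvert",
        "--input-directory", str(input_directory),
        "--input-ext", input_ext,
        "--output-ext", output_ext,
        "--command", command,
    ]
    # Optional flags are appended only when a value is supplied
    # (assumption: each of them takes exactly one value).
    if input_pattern is not None:
        args += ["--input-pattern", input_pattern]
    if method is not None:
        args += ["--method", method]
    return args


# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(build_command("/path/to/data", "fastq.gz", "fasta.gz",
                               "fastq2fasta", input_pattern="sample_*")))
```

The returned list can also be handed to subprocess.run(args, check=True), which avoids shell quoting issues with patterns such as sample_*.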

Step 2 — run the pipeline:

cd bioconvert
sh bioconvert.sh

Results are written to the output/ subdirectory. An HTML summary report is generated on completion.
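The pipeline reports MD5 checksums alongside the converted files. To double-check a result independently, the digests can be recomputed with the Python standard library. This is a minimal sketch under stated assumptions: the helper names md5_of_file and md5_table are hypothetical, and because the layout of the pipeline's own checksum files is not described here, the sketch only computes digests and does not parse those files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequencing files are never
        # loaded into memory in one piece.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_table(directory):
    """Map the name of each regular file in `directory` to its MD5 digest."""
    return {
        path.name: md5_of_file(path)
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

For example, md5_table("output") returns one digest per file in the output/ subdirectory, which can then be compared with the checksums reported by the pipeline.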

Usage

sequana_bioconvert --help

Key options:

  • --input-directory — directory containing the input files (required)

  • --input-ext — extension of input files, e.g. fastq.gz (required)

  • --output-ext — extension of output files, e.g. fasta.gz (required)

  • --command — bioconvert conversion command, e.g. fastq2fasta (required);

    run bioconvert --help for the full list

  • --input-pattern — prefix glob to restrict which files are picked up (default: *);

    e.g. sample_* to process only files starting with sample_

  • --method — override the default conversion method;

    run bioconvert COMMAND --show-methods to list valid methods
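According to the changelog, --input-pattern is combined with --input-ext to form the actual glob pattern. The sketch below illustrates one plausible reading of that rule; it is not the pipeline's code. It assumes the two parts are joined with a dot, and the function names select_inputs and sample_name are hypothetical.

```python
from fnmatch import fnmatchcase


def select_inputs(filenames, input_ext, input_pattern="*"):
    """Keep the file names that match '<input_pattern>.<input_ext>'."""
    # Assumption: the prefix pattern and the extension are joined by a dot.
    glob_pattern = f"{input_pattern}.{input_ext}"
    return [name for name in filenames if fnmatchcase(name, glob_pattern)]


def sample_name(filename, input_ext):
    """Strip the full extension so multi-dot names keep their inner dots."""
    return filename.removesuffix(f".{input_ext}")


files = ["sample_A.fastq.gz", "ctrl.fastq.gz",
         "sample_B.v2.fastq.gz", "sample_C.fasta"]

# Only names starting with "sample_" and ending in ".fastq.gz" are kept.
selected = select_inputs(files, "fastq.gz", input_pattern="sample_*")
print(selected)
print([sample_name(name, "fastq.gz") for name in selected])
```

Stripping the whole extension at once, rather than splitting on the first dot, is what keeps a name such as sample_B.v2.fastq.gz mapped to sample_B.v2.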

Usage with apptainer

All external tools are available through a pre-built apptainer image. To use it, add --use-apptainer when initialising the pipeline:

sequana_bioconvert \
    --input-directory /path/to/data \
    --input-ext fastq.gz \
    --output-ext fasta.gz \
    --command fastq2fasta \
    --use-apptainer \
    --apptainer-prefix ~/.sequana/apptainers

Then run as usual:

cd bioconvert
sh bioconvert.sh

Requirements

  • bioconvert ≥ 1.1.0 — the underlying conversion tool

  • graphviz — for pipeline DAG rendering (available via apptainer)

Install dependencies via conda/mamba:

mamba env create -f environment.yml

Rules and configuration details

The latest configuration file is available at: config.yaml

Each rule used in the pipeline has a corresponding section in config.yaml.

Changelog

Version

Description

1.2.0

  • Update apptainer image to bioconvert 1.1.0

  • Switch to manager.get_shell() — no longer uses sequana_wrappers

  • Remove sequana_wrappers field from config and schema

  • Use importlib.metadata for version (fixes >=x.y.z display in HTML reports)

  • --input-pattern now optional (default *); combined with --input-ext to form the actual glob pattern

  • Add md5_output.txt alongside md5_input.txt

  • Improved HTML report: method display, bioconvert doc link, cleaner table labels

  • Early exit with clear error if no input files are found

  • Fix fragile sample name extraction for multi-dot filenames

1.1.0

  • Update apptainer image to bioconvert 1.1.0

  • CI: update to Python 3.10/3.11/3.12 and actions/checkout@v4

1.0.0

  • Uses bioconvert 1.0.0

0.10.0

  • Add container

0.9.0

  • Version using new sequana/sequana_pipetools framework

0.8.1

  • Working version

0.8.0

  • First release

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
