
Workflow to screen shotgun metagenomic samples for the presence of biological impurities.


MetaCARP

MetaCARP (Metagenomic Contamination-Assessment-of-Retail-Products) is a workflow to screen shotgun metagenomic sequencing data for the presence of biological impurities, including allergens. The workflow was developed for the analysis of sequencing data derived from commercial vitamin-containing food products.

Fun fact: the common carp scavenges food by rooting in sediment and mouthing the contents to identify food items (source).

DISCLAIMER: This pipeline comes without any guarantees, and no legal claims can be made based on the results obtained with this pipeline. Potential contaminants reported by this pipeline can be false positives, and their presence must always be verified through additional (validated) assays.

MetaCARP is also available on our public Galaxy instance (registration required).


INSTALLATION

CONDA installation

conda install -c bioconda -c conda-forge metacarp

If the above command fails, MetaCARP can be installed in a new environment using the following commands:

conda create -n metacarp python=3.10
conda activate metacarp
conda install bioconda::metacarp -c bioconda -c conda-forge

Note that the conda installation does not include a Kraken2 database. More information on how to build a Kraken2 database is available in the Kraken2 manual.

Manual installation

The MetaCARP workflow has the following dependencies:

The GMM screening pipeline has the following additional dependencies:

The corresponding binaries should be in your PATH to run the workflow. Other versions of these tools may work, but have not been tested.
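A quick way to verify the PATH requirement is to look up each binary the same way the shell would. The sketch below is not part of MetaCARP: it uses Python's standard `shutil.which`, and the tool list is an assumption that only contains `kraken2` and `kma`, the two tools named elsewhere on this page, so extend it with the full dependency list for your installation.

```python
import shutil


def find_missing_tools(tools):
    """Return the tools that cannot be resolved on PATH, in input order."""
    # shutil.which() resolves a bare command name the way a shell would.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Partial, illustrative list: only tools named elsewhere on this page.
    missing = find_missing_tools(["kraken2", "kma"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```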

Python 3.10 is recommended.

virtualenv metacarp_env --python=python3.10
. metacarp_env/bin/activate
pip install metacarp

USAGE

usage: METACARP [--ilmn-in ILMN_IN] [--ont-in ONT_IN] [--dir-working DIR_WORKING] --output OUTPUT [--output-html OUTPUT_HTML] 
            --kraken-db KRAKEN_DB [--usual-suspects USUAL_SUSPECTS] [--allergens ALLERGENS] 
            [--cutoff-allergens CUTOFF_ALLERGENS] [--cutoff-unclassified CUTOFF_UNCLASSIFIED] 
            [--cutoff-prok-fungi CUTOFF_PROK_FUNGI] [--cutoff-euk CUTOFF_EUK] 
            [--cutoff-confidence CUTOFF_CONFIDENCE] [--threads THREADS] [--version]

options:
  --ilmn-in ILMN_IN     Directory with Illumina input FASTQ files
  --ont-in ONT_IN       Directory with ONT input FASTQ files
  --dir-working DIR_WORKING
                        Working directory
  --output OUTPUT       Output directory
  --output-html OUTPUT_HTML 
                        Output report name
  --kraken-db KRAKEN_DB Directory with Kraken DB
  --usual-suspects USUAL_SUSPECTS   
                        TSV file with species, taxid and group (Bacteria, Fungi, Plants, or Animals) of usual suspects 
                        (default: resources/usual_suspects.tsv)
  --allergens ALLERGENS 
                        TSV file with taxids of allergens (default: resources/allergens.tsv)
  --cutoff-allergens CUTOFF_ALLERGENS
                        Minimal relative abundance (in %) of allergens detected with Kraken2 to report (default: 1)
  --cutoff-unclassified CUTOFF_UNCLASSIFIED
                        Warning threshold for relative abundance (in %) of unclassified reads detected with Kraken2 (default: 5)
  --cutoff-prok-fungi CUTOFF_PROK_FUNGI
                        Minimal relative abundance (in %) of prokaryotic or fungal species detected with Kraken2 to select for read mapping (default: 0.1)
  --cutoff-euk CUTOFF_EUK
                        Minimal relative abundance (in %) of eukaryotic (non-fungal) species detected with Kraken2 to select for read mapping (default: 1)
  --cutoff-confidence CUTOFF_CONFIDENCE
                        JSON configuration file with cutoffs for high and low confidence detection (default: resources/confidence_cutoffs.json)                  
  --threads THREADS
  --version             Print version and exit
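The abundance cutoffs above can be read as a per-group threshold: prokaryotic and fungal species need at least `--cutoff-prok-fungi` percent, other eukaryotes at least `--cutoff-euk` percent, before they are selected for read mapping. The sketch below only illustrates that reading and is not MetaCARP's code. The `Detection` record, the mapping of the usual-suspects groups to the two cutoffs, the inclusive `>=` comparisons and the example abundances are all assumptions for illustration.

```python
from dataclasses import dataclass

# Documented defaults (relative abundance in %).
CUTOFF_PROK_FUNGI = 0.1
CUTOFF_EUK = 1.0
CUTOFF_UNCLASSIFIED = 5.0

# Assumed mapping of usual-suspects groups to cutoff classes.
PROK_FUNGI_GROUPS = {"Bacteria", "Fungi"}
EUK_GROUPS = {"Plants", "Animals"}


@dataclass
class Detection:
    """Hypothetical record for one species detected with Kraken2."""
    species: str
    taxid: int
    group: str        # Bacteria, Fungi, Plants or Animals
    abundance: float  # relative abundance in %


def select_for_mapping(detections, cutoff_prok_fungi=CUTOFF_PROK_FUNGI,
                       cutoff_euk=CUTOFF_EUK):
    """Keep detections whose abundance reaches the cutoff for their group."""
    selected = []
    for det in detections:
        if det.group in PROK_FUNGI_GROUPS:
            cutoff = cutoff_prok_fungi
        elif det.group in EUK_GROUPS:
            cutoff = cutoff_euk
        else:
            raise ValueError(f"unknown group: {det.group!r}")
        if det.abundance >= cutoff:
            selected.append(det)
    return selected


def unclassified_warning(unclassified_pct, cutoff=CUTOFF_UNCLASSIFIED):
    """True if the share of unclassified reads reaches the warning threshold."""
    return unclassified_pct >= cutoff


# Illustrative input values, not real results.
example_detections = [
    Detection("Escherichia coli", 562, "Bacteria", 0.5),
    Detection("Saccharomyces cerevisiae", 4932, "Fungi", 0.05),
    Detection("Glycine max", 3847, "Plants", 2.0),
    Detection("Bos taurus", 9913, "Animals", 0.8),
]

if __name__ == "__main__":
    for det in select_for_mapping(example_detections):
        print(det.species)
    print(unclassified_warning(6.0))
```

With the default cutoffs, E. coli (0.5%) and Glycine max (2%) are selected, while S. cerevisiae (0.05%, below 0.1%) and Bos taurus (0.8%, below 1%) are not.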

Basic usage example

The MetaCARP workflow takes Illumina and/or ONT FASTQ files as input. Illumina data are provided with the --ilmn-in option and ONT data with the --ont-in option. All input files should be gzipped. Illumina file names should be formatted as {samplename}_S*_R1_*.fastq.gz and {samplename}_S*_R2_*.fastq.gz, or as {samplename}_1.fastq.gz and {samplename}_2.fastq.gz.
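To check in advance that an input directory follows these naming conventions, the file names can be grouped per sample with two regular expressions. This is a hypothetical helper, not MetaCARP's own parser: it assumes the `S*` field is Illumina's numeric sample index, optionally followed by a lane field such as `_L001`, which is stricter than the glob pattern above.

```python
import re
from collections import defaultdict
from pathlib import Path

# {samplename}_S*_R1_*.fastq.gz, assuming S* is a numeric sample index with
# an optional lane field, e.g. sampleA_S1_L001_R1_001.fastq.gz
ILLUMINA_STYLE = re.compile(
    r"^(?P<sample>.+)_S\d+(?:_L\d+)?_R(?P<read>[12])_.*\.fastq\.gz$"
)
# {samplename}_1.fastq.gz / {samplename}_2.fastq.gz
SHORT_STYLE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def pair_illumina_reads(paths):
    """Map sample name -> {"1": R1 path, "2": R2 path}; skip non-matching files."""
    pairs = defaultdict(dict)
    for path in paths:
        name = Path(path).name
        match = ILLUMINA_STYLE.match(name) or SHORT_STYLE.match(name)
        if match is None:
            continue
        pairs[match["sample"]][match["read"]] = path
    return dict(pairs)


def unpaired_samples(pairs):
    """Samples for which R1 or R2 is missing."""
    return sorted(s for s, reads in pairs.items() if set(reads) != {"1", "2"})


example_files = [
    "in/ilmn/sampleA_S1_R1_001.fastq.gz",
    "in/ilmn/sampleA_S1_R2_001.fastq.gz",
    "in/ilmn/sampleB_1.fastq.gz",
    "in/ilmn/sampleB_2.fastq.gz",
    "in/ilmn/sampleC_S3_L001_R1_001.fastq.gz",
    "in/ilmn/README.txt",
]

if __name__ == "__main__":
    pairs = pair_illumina_reads(example_files)
    print(sorted(pairs))
    print(unpaired_samples(pairs))
```

In this example, sampleC is flagged because only its R1 file is present, and README.txt is ignored.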

METACARP \
    --ilmn-in in/ilmn/ \
    --ont-in in/ont/ \
    --kraken-db kraken_db/ \
    --output output/ \
    --dir-working work/ \
    --threads 8

GMM screening workflow

An additional script is available to screen shotgun metagenomic samples for the presence of known genetically modified microorganisms (GMM) based on a set of 'junction' sequences and marker genes. This workflow employs kma for read-based gene detection and relies on the 'Camel' code base developed by the Bioinformatics Platform of Sciensano.

usage: GMM_screening [--ilmn-in ILMN_IN] [--ont-in ONT_IN] [--dir-working DIR_WORKING] --output OUTPUT
            [--output-tsv OUTPUT_TSV] [--db DB] [--min-identity MIN_IDENTITY] [--min-coverage MIN_COVERAGE] [--threads THREADS]

options:
  --ilmn-in ILMN_IN     Directory with Illumina input FASTQ files
  --ont-in ONT_IN       Directory with ONT input FASTQ files
  --dir-working DIR_WORKING
                        Working directory
  --output OUTPUT       Output directory
  --output-tsv OUTPUT_TSV 
                        Output report name
  --db DB               Directory with GMM detection database: either 'junctions' (resources/DB_GMM/V6/junctions) or 'genes-vectors' with complete genes and vectors (resources/DB_GMM/V6/genes-vectors)
  --min-identity MIN_IDENTITY
                        Minimal % identity with template to report kma hit (default: 90)
  --min-coverage MIN_COVERAGE
                        Minimal % coverage of template to report kma hit (default: 90)
  --threads THREADS
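The two thresholds act as a joint filter on kma hits: a template is reported only if both its identity and its coverage reach the minimum. The sketch below assumes inclusive (`>=`) comparisons and uses a hypothetical `KmaHit` record with made-up template names; parsing kma's own output files is not shown, and this is not the GMM_screening implementation.

```python
from dataclasses import dataclass

MIN_IDENTITY = 90.0  # documented default of --min-identity (%)
MIN_COVERAGE = 90.0  # documented default of --min-coverage (%)


@dataclass
class KmaHit:
    """Hypothetical record for one kma template hit."""
    template: str
    identity: float  # % identity with the template
    coverage: float  # % of the template covered by reads


def filter_hits(hits, min_identity=MIN_IDENTITY, min_coverage=MIN_COVERAGE):
    """Keep hits that reach both thresholds (inclusive comparison assumed)."""
    return [
        hit for hit in hits
        if hit.identity >= min_identity and hit.coverage >= min_coverage
    ]


# Made-up template names and values for illustration.
example_hits = [
    KmaHit("template_A", 99.5, 100.0),
    KmaHit("template_B", 95.0, 60.0),   # coverage too low
    KmaHit("template_C", 85.0, 98.0),   # identity too low
]

if __name__ == "__main__":
    for hit in filter_hits(example_hits):
        print(hit.template)
```

Lowering either threshold trades specificity for sensitivity, so hits that pass only under relaxed settings deserve extra scrutiny.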

CONTACT

Create an issue to report bugs, propose new functions or ask for help.

CITATION

If you use this tool, please consider citing our publication.


Copyright - 2026 Jolien D'aes jolien.daes@sciensano.be



Download files


Source Distribution

metacarp-1.0.0.tar.gz (1.4 MB)


Built Distribution


metacarp-1.0.0-py3-none-any.whl (1.5 MB)


File details

Details for the file metacarp-1.0.0.tar.gz.

File metadata

  • Download URL: metacarp-1.0.0.tar.gz
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for metacarp-1.0.0.tar.gz
Algorithm Hash digest
SHA256 74d9cdb67c58d32c26c9046d83638a36efca266348be57fe133be666392f229c
MD5 d8f376b0bed460f2355df4b662a24d17
BLAKE2b-256 8393cc5cc38d37c921a8234bb7762facd6ebda51833e222241a0872e3b85dd4b


File details

Details for the file metacarp-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: metacarp-1.0.0-py3-none-any.whl
  • Size: 1.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for metacarp-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c8864f414912e55a3b0f6bb818ef7b1c036deea524946e7ab9423c9e22746145
MD5 c8003b0cf63998daee78fe093e057910
BLAKE2b-256 8631282c219bb2e9802c94f16ad00f035819eb5d983e5eda831e60d4148574b8

