Workflow to screen shotgun metagenomic samples for the presence of biological impurities.
MetaCARP
MetaCARP (Metagenomic Contamination-Assessment-of-Retail-Products) is a workflow to screen shotgun metagenomic sequencing data for the presence of biological impurities, including allergens. The workflow was developed for the analysis of sequencing data derived from commercial vitamin-containing food products.
Fun fact: the common carp scavenges food by rooting in sediment and mouthing the contents to identify food items (source).
DISCLAIMER: This pipeline comes without any guarantees, and no legal claims can be made based on the results obtained with it. Potential contaminations reported by this pipeline can be false positives, and their presence must always be verified through additional (validated) assays.
MetaCARP is also available on our public Galaxy instance (registration required).
INSTALLATION
CONDA installation
conda install -c bioconda -c conda-forge metacarp
If the above command fails, MetaCARP can be installed in a new environment using the following commands:
conda create -n metacarp python=3.10
conda activate metacarp
conda install bioconda::metacarp -c bioconda -c conda-forge
Note that the conda installation does not include a Kraken2 Database. More information on how to build a Kraken2 Database is available in the Kraken2 Manual.
Manual installation
The MetaCARP workflow has the following dependencies:
The GMM screening pipeline has the following additional dependencies:
The corresponding binaries should be in your PATH to run the workflow. Other versions of these tools may work, but have not been tested.
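Before a manual installation, it can help to check which tools are actually reachable through PATH. The snippet below is a minimal sketch, not part of MetaCARP: the helper `missing_tools` is hypothetical, and the tool names `kraken2` and `kma` are assumed from the tools mentioned elsewhere in this README, not from the full dependency list.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found on the current PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed binary names; extend with the remaining dependencies.
    missing = missing_tools(["kraken2", "kma"])
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found in PATH.")
```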
Python 3.10 is recommended.
virtualenv metacarp_env --python=python3.10;
. metacarp_env/bin/activate;
pip install metacarp;
USAGE
usage: METACARP [--ilmn-in ILMN_IN] [--ont-in ONT_IN] [--dir-working DIR_WORKING] --output OUTPUT [--output-html OUTPUT_HTML]
--kraken-db KRAKEN_DB [--usual-suspects USUAL_SUSPECTS] [--allergens ALLERGENS]
[--cutoff-allergens CUTOFF_ALLERGENS] [--cutoff-unclassified CUTOFF_UNCLASSIFIED]
[--cutoff-prok-fungi CUTOFF_PROK_FUNGI] [--cutoff-euk CUTOFF_EUK]
[--cutoff-confidence CUTOFF_CONFIDENCE] [--threads THREADS] [--version]
options:
--ilmn-in ILMN_IN Directory with Illumina input FASTQ files
--ont-in ONT_IN Directory with ONT input FASTQ files
--dir-working DIR_WORKING
Working directory
--output OUTPUT Output directory
--output-html OUTPUT_HTML
Output report name
--kraken-db KRAKEN_DB Directory with Kraken DB
--usual-suspects USUAL_SUSPECTS
TSV file with species, taxid and group (Bacteria, Fungi, Plants, or Animals) of usual suspects
(default: resources/usual_suspects.tsv)
--allergens ALLERGENS
TSV file with taxids of allergens (default: resources/allergens.tsv)
--cutoff-allergens CUTOFF_ALLERGENS
Minimal relative abundance (in %) of allergens detected with Kraken2 to report (default: 1)
--cutoff-unclassified CUTOFF_UNCLASSIFIED
Warning threshold for relative abundance (in %) of unclassified reads detected with Kraken2 (default: 5)
--cutoff-prok-fungi CUTOFF_PROK_FUNGI
Minimal relative abundance (in %) of prokaryotic or fungal species detected with Kraken2 to select for read mapping (default: 0.1)
--cutoff-euk CUTOFF_EUK
Minimal relative abundance (in %) of eukaryotic (non-fungal) species detected with Kraken2 to select for read mapping (default: 1)
--cutoff-confidence CUTOFF_CONFIDENCE
JSON configuration file with cutoffs for high and low confidence detection (default: resources/confidence_cutoffs.json)
--threads THREADS
--version Print version and exit
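To illustrate how the abundance cutoffs above interact, here is a minimal sketch under stated assumptions. It is not MetaCARP's implementation: the `Taxon` class and both functions are hypothetical, the group labels are borrowed from the usual-suspects description, "minimal" is read as "greater than or equal to", and the unclassified warning is assumed to trigger only above the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Taxon:
    name: str
    group: str        # assumed labels: "Bacteria", "Fungi", "Plants", "Animals"
    abundance: float  # relative abundance in %


def select_for_mapping(
    taxa: Iterable[Taxon],
    cutoff_prok_fungi: float = 0.1,
    cutoff_euk: float = 1.0,
) -> List[str]:
    """Return names of taxa whose abundance reaches the cutoff for their group."""
    selected = []
    for taxon in taxa:
        # Prokaryotes and fungi use the lower cutoff; other eukaryotes the higher one.
        if taxon.group in {"Bacteria", "Fungi"}:
            cutoff = cutoff_prok_fungi
        else:
            cutoff = cutoff_euk
        if taxon.abundance >= cutoff:
            selected.append(taxon.name)
    return selected


def unclassified_warning(unclassified_pct: float, cutoff: float = 5.0) -> bool:
    """Return True when the share of unclassified reads exceeds the threshold."""
    return unclassified_pct > cutoff
```

With the default values, a bacterium at 0.5 % would be selected for read mapping, whereas a plant at 0.5 % would not, because non-fungal eukaryotes need at least 1 %.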
Basic usage example
The MetaCARP workflow processes Illumina and/or ONT FASTQ files as input.
Illumina data can be provided using the --ilmn-in option; ONT data can be provided using the --ont-in option.
All input files should be gzipped, and Illumina file names should be formatted as {samplename}_S*_R1_*.fastq.gz and {samplename}_S*_R2_*.fastq.gz, or {samplename}_1.fastq.gz and {samplename}_2.fastq.gz.
METACARP \
--ilmn-in in/ilmn/ \
--ont-in in/ont/ \
--kraken-db path/to/kraken_db/ \
--output output/ \
--dir-working work/ \
--threads 8
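Because the Illumina file names must follow one of the two patterns described above, a quick pre-flight check of the input directory can catch naming problems before a run. The following is a minimal sketch, not MetaCARP's own parsing code: the function `parse_illumina_name` and the regular expressions are hypothetical translations of the two naming patterns.

```python
import re
from typing import Optional, Tuple

# Hypothetical regex equivalents of the two documented naming patterns:
#   {samplename}_S*_R1_*.fastq.gz / {samplename}_S*_R2_*.fastq.gz
#   {samplename}_1.fastq.gz       / {samplename}_2.fastq.gz
_PATTERNS = [
    re.compile(r"^(?P<sample>.+)_S.*_R(?P<read>[12])_.*\.fastq\.gz$"),
    re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$"),
]


def parse_illumina_name(filename: str) -> Optional[Tuple[str, int]]:
    """Return (sample name, read number) or None if the name does not match."""
    for pattern in _PATTERNS:
        match = pattern.match(filename)
        if match:
            return match.group("sample"), int(match.group("read"))
    return None


if __name__ == "__main__":
    for name in ["sampleA_S1_L001_R1_001.fastq.gz", "sampleB_2.fastq.gz", "notes.txt"]:
        print(name, "->", parse_illumina_name(name))
```

Files for which the function returns None would need to be renamed (and gzipped, if they are not already) before being placed in the --ilmn-in directory.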
GMM screening workflow
An additional script is available to screen shotgun metagenomic samples for the presence of known genetically modified microorganisms (GMM) based on a set of 'junction' sequences and marker genes. This workflow employs kma for read-based gene detection and relies on the 'Camel' code base developed by the Bioinformatics Platform of Sciensano.
usage: GMM_screening [--ilmn-in ILMN_IN] [--ont-in ONT_IN] [--dir-working DIR_WORKING] --output OUTPUT
[--output-tsv OUTPUT_TSV] [--db DB] [--min-identity MIN_IDENTITY] [--min-coverage MIN_COVERAGE] [--threads THREADS]
options:
--ilmn-in ILMN_IN Directory with Illumina input FASTQ files
--ont-in ONT_IN Directory with ONT input FASTQ files
--dir-working DIR_WORKING
Working directory
--output OUTPUT Output directory
--output-tsv OUTPUT_TSV
Output report name
--db DB Directory with GMM detection database - either the database 'junctions' (resources/DB_GMM/V6/junctions) or 'genes-vectors' with complete genes and vectors (resources/DB_GMM/V6/genes-vectors) can be chosen
--min-identity MIN_IDENTITY Minimal % identity with template to report kma hit (default: 90)
--min-coverage MIN_COVERAGE Minimum % coverage of template to report kma hit (default: 90)
--threads THREADS
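The effect of --min-identity and --min-coverage can be pictured as a filter over the hits that kma reports. The sketch below is an illustration under assumptions, not the GMM_screening code: `filter_kma_hits` is a hypothetical helper, it assumes a tab-separated kma `.res` table with `#Template`, `Template_Identity` and `Template_Coverage` columns, and it assumes both thresholds are inclusive and applied to the template-based columns.

```python
import csv
import io
from typing import List


def filter_kma_hits(
    res_text: str,
    min_identity: float = 90.0,
    min_coverage: float = 90.0,
) -> List[str]:
    """Return template names whose identity and coverage reach both thresholds."""
    reader = csv.DictReader(io.StringIO(res_text), delimiter="\t")
    kept = []
    for row in reader:
        # float() tolerates the space padding that may surround the values.
        identity = float(row["Template_Identity"])
        coverage = float(row["Template_Coverage"])
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(row["#Template"].strip())
    return kept


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    example = (
        "#Template\tTemplate_Identity\tTemplate_Coverage\n"
        "junction_A\t 99.50\t100.00\n"
        "junction_B\t 85.00\t 95.00\n"
    )
    print(filter_kma_hits(example))
```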
CONTACT
Create an issue to report bugs, propose new functions or ask for help.
CITATION
If you use this tool, please consider citing our publication.
Copyright - 2026 Jolien D'aes jolien.daes@sciensano.be