CroCoDeEL: CROss-sample COntamination DEtection and Estimation of its Levels 🐊

Introduction

CroCoDeEL is a tool that detects cross-sample (aka well-to-well) contamination in shotgun metagenomic data.
It accurately identifies contaminated samples, pinpoints contamination sources, and estimates contamination rates.
CroCoDeEL relies only on species abundance tables and does not need negative controls.

Installation

CroCoDeEL is available on bioconda:

conda create --name crocodeel_env -c conda-forge -c bioconda crocodeel
conda activate crocodeel_env

Alternatively, you can use pip:

pip install crocodeel

Finally, you can test that CroCoDeEL is correctly installed with the following command:

crocodeel test_install

Quick start

Input

CroCoDeEL takes as input a species abundance table in TSV format.
The first column should correspond to species names. The other columns correspond to the abundance of species in each sample.

species_name  sample1  sample2  sample3  ...
species 1     0        0.05     0.07     ...
species 2     0.1      0.01     0        ...
...           ...      ...      ...      ...

CroCoDeEL works with relative abundances. The table is automatically normalized so that the abundances in each column sum to 1.
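To make the input format and the normalization concrete, here is a minimal sketch using only the Python standard library. It is not CroCoDeEL's own code: the function names `read_abundance_table` and `normalize_columns` and the toy values are assumptions made for illustration.

```python
import csv
import io


def read_abundance_table(handle):
    """Read a TSV abundance table into (sample_names, {species: [abundances]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds species names
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = [float(value) for value in row[1:]]
    return samples, table


def normalize_columns(samples, table):
    """Rescale each sample column so that its abundances sum to 1."""
    totals = [0.0] * len(samples)
    for values in table.values():
        for i, value in enumerate(values):
            totals[i] += value
    normalized = {}
    for species, values in table.items():
        # A sample with no detected species keeps zeros instead of dividing by zero.
        normalized[species] = [
            v / t if t > 0 else 0.0 for v, t in zip(values, totals)
        ]
    return normalized


# Toy table (illustrative values only), tab-separated as CroCoDeEL expects.
example_tsv = (
    "species_name\tsample1\tsample2\n"
    "species 1\t2\t1\n"
    "species 2\t6\t3\n"
)
samples, table = read_abundance_table(io.StringIO(example_tsv))
normalized = normalize_columns(samples, table)
```

Since CroCoDeEL performs this normalization itself, the sketch is only useful for checking that a table parses as expected before running the tool.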

Important: CroCoDeEL requires the abundance of subdominant species to be accurately estimated.
We strongly recommend using the Meteor software suite to generate the species abundance table.
Alternatively, you can use sylph, although low-level contamination may go undetected.
We advise against using other taxonomic profilers (e.g. MetaPhlAn4 or mOTUs), which do not meet this requirement according to our benchmarks.

Search contamination

Run the following command to search for cross-sample contamination:

crocodeel search_conta -s species_abundance.tsv -c contamination_events.tsv

CroCoDeEL will report all detected contamination events in the contamination_events.tsv output file.
For each event, this TSV file reports the contamination source, the contaminated sample (target), and the estimated contamination rate.
It also gives the score (probability) computed by the Random Forest model and the species specifically introduced into the target by contamination.
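The output table can be post-processed with a few lines of Python, for example to keep only high-scoring events. The sketch below is illustrative: the column names `source`, `target`, `rate` and `probability`, the 0.9 threshold, and the example rows are all assumptions, so check the header of your own contamination_events.tsv before reusing it.

```python
import csv
import io


def filter_events(handle, min_probability, probability_column="probability"):
    """Return the events whose score is at least min_probability, best first.

    The column name is an assumption; pass the one found in your file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = [
        row for row in reader
        if float(row[probability_column]) >= min_probability
    ]
    kept.sort(key=lambda row: float(row[probability_column]), reverse=True)
    return kept


# Made-up rows with hypothetical column names, not real CroCoDeEL output.
example_events = (
    "source\ttarget\trate\tprobability\n"
    "sampleA\tsampleB\t0.02\t0.97\n"
    "sampleC\tsampleD\t0.001\t0.55\n"
)
confident = filter_events(io.StringIO(example_events), min_probability=0.9)
```

A score threshold only ranks events; it does not replace the visual inspection described under "Results interpretation".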

Visualization of the results

Contamination events can be visually inspected by generating a PDF file of scatterplots:

crocodeel plot_conta -s species_abundance.tsv -c contamination_events.tsv -r contamination_events.pdf

Each scatterplot compares, on a log scale, the species abundance profiles of a contaminated sample (x-axis) and its contamination source (y-axis).
The contamination line (in red) highlights species specifically introduced by contamination.
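As a rough illustration of what each scatterplot shows, the following sketch computes log-scale coordinates for the species detected in both samples of a pair. It is an assumption-laden example, not the plotting code used by CroCoDeEL: the function name `log_scale_points` is hypothetical, and species with zero abundance in either sample are simply dropped here, whereas the source does not say how the tool displays them.

```python
import math


def log_scale_points(target_profile, source_profile):
    """Map each shared species to (log10 target abundance, log10 source abundance).

    The target (contaminated sample) goes on the x-axis and the source on the y-axis.
    """
    points = {}
    for species, target_abundance in target_profile.items():
        source_abundance = source_profile.get(species, 0.0)
        # log10 is undefined at zero, so this sketch skips species absent from either sample.
        if target_abundance > 0 and source_abundance > 0:
            points[species] = (
                math.log10(target_abundance),
                math.log10(source_abundance),
            )
    return points


# Illustrative relative abundances only.
target = {"species 1": 0.001, "species 2": 0.1, "species 3": 0.0}
source = {"species 1": 0.01, "species 2": 0.1}
points = log_scale_points(target, source)
```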

Easy workflow

Alternatively, you can search for cross-sample contamination and create the PDF report in a single command:

crocodeel easy_wf -s species_abundance.tsv -c contamination_events.tsv -r contamination_events.pdf
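If you drive CroCoDeEL from a Python script, the same command can be assembled and launched with the standard library. This is a minimal sketch: the helper names `easy_wf_command` and `run_if_available` are assumptions, and only the subcommand and the `-s`, `-c` and `-r` options shown above are used.

```python
import shutil
import subprocess


def easy_wf_command(abundance_table, events_tsv, report_pdf):
    """Build the argument list for the easy_wf subcommand shown above."""
    return [
        "crocodeel", "easy_wf",
        "-s", abundance_table,
        "-c", events_tsv,
        "-r", report_pdf,
    ]


def run_if_available(command):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    # check=True raises CalledProcessError if the command exits with an error.
    return subprocess.run(command, check=True)


command = easy_wf_command(
    "species_abundance.tsv",
    "contamination_events.tsv",
    "contamination_events.pdf",
)
```

Calling `run_if_available(command)` then runs the workflow, provided the environment where CroCoDeEL is installed is active.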

Results interpretation

CroCoDeEL is likely to report false contamination events for samples with similar species abundance profiles (e.g. longitudinal data, animals raised together).
For unrelated samples, CroCoDeEL may occasionally generate false positives, which a human expert can filter out.
We therefore strongly recommend inspecting the scatterplot of each contamination event to discard potential false positives.
Guidance on how to do this will be added soon.
