decOM: K-mer method for aOral metagenome decontamination

Project description

decOM: Microbial source tracking for contamination assessment of ancient oral samples using k-mer-based methods

decOM is a high-accuracy microbial source tracking method suited to contamination quantification in paleogenomics, namely the analysis of collections of possibly contaminated ancient oral metagenomic data sets. Put simply, if you want to know how contaminated your ancient oral metagenomic sample is, this tool will help :)

[Figure: overview of the decOM pipeline]

System requirements

decOM has been developed and tested in a Linux environment. It requires several packages and tools to be installed before it can be used, including kmtricks, on which it relies.

Installation

Install decOM through conda:

conda install -c camiladuitama decom

To make the decOM command available, we advise adding the absolute path of decOM to your PATH environment variable by appending the following line to your ~/.bashrc file:

export PATH=/absolute/path/to/decOM:${PATH}
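If you prefer to script this step, the sketch below appends the line only when it is not already present, so running it twice does not duplicate it. The function name add_decom_to_path is hypothetical, and /absolute/path/to/decOM is the same placeholder as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the decOM PATH export to a startup file, only once.
add_decom_to_path() {
  local decom_dir="$1"                  # absolute path to the decOM directory
  local rc_file="${2:-$HOME/.bashrc}"   # startup file to edit (default: ~/.bashrc)
  # \${PATH} keeps the literal text ${PATH}, matching the line shown above
  local line="export PATH=${decom_dir}:\${PATH}"
  # -q quiet, -x whole-line match, -F fixed string: skip if the line already exists
  if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}

# Usage (replace the placeholder path first):
# add_decom_to_path /absolute/path/to/decOM
```

Open a new shell (or run source ~/.bashrc) for the change to take effect.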

Before running decOM

Users of decOM represent their own metagenomic sample (the sink) as a presence/absence vector of k-mers using kmtricks, and compare this sink against the collection of sources we have assembled. Therefore, before running decOM you must first download the archive decOM_sources.tar.gz and decompress it:

wget https://zenodo.org/record/6513520/files/decOM_sources.tar.gz
tar -xf decOM_sources.tar.gz

Test

You can check that decOM works by running it on one of the ancient oral (aOral) samples in the test/sample/ folder, e.g. SRR13355787:

decOM -s SRR13355787 -p_sources decOM_sources/ -k SRR13355787_key.fof -mem 10GB -t 5 -o decOM_output/

Note: The total memory allocated for each run of decOM is the value you pass to -mem multiplied by the number of threads/cores passed to -t. In the previous run we used 10GB * 5 = 50GB.
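The memory rule in the note can be checked before launching a run. This is a minimal sketch, assuming -mem is written as an integer followed by a unit suffix (as in 10GB or 20GiB); total_decom_memory is a hypothetical helper, not part of decOM.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total memory = (value passed to -mem) x (threads passed to -t).
# Assumes the -mem value is an integer followed by a unit, e.g. 10GB or 20GiB.
total_decom_memory() {
  local mem="$1"                    # e.g. "10GB"
  local threads="$2"                # e.g. 5
  local amount="${mem%%[!0-9]*}"    # strip everything from the first non-digit: "10"
  local unit="${mem#"$amount"}"     # what remains after the digits: "GB"
  echo "$(( amount * threads ))${unit}"
}

total_decom_memory 10GB 5   # prints 50GB
```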

Output files

decOM writes one .csv file with the k-mer counts and proportions, a folder containing the vector that represents the sink, and, if requested with -p, a bar plot:

decOM_output/
├──{sink}_OM_output.csv
├──result_plot_{sink}.pdf
└──{sink}_vector/
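After a run, you can confirm that the expected files were written with a check like the one below. It is a sketch based only on the file names listed above; check_decom_output is a hypothetical helper, and the plot is treated as optional because it is only produced when requested with -p.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check a decOM output folder for the files listed above.
# Third argument "true" also requires the plot (only written when run with -p True).
check_decom_output() {
  local outdir="$1" sink="$2" want_plot="${3:-false}"
  local missing=0
  [ -f "${outdir}/${sink}_OM_output.csv" ] || { echo "missing: ${sink}_OM_output.csv"; missing=1; }
  [ -d "${outdir}/${sink}_vector" ]        || { echo "missing: ${sink}_vector/"; missing=1; }
  if [ "$want_plot" = "true" ]; then
    [ -f "${outdir}/result_plot_${sink}.pdf" ] || { echo "missing: result_plot_${sink}.pdf"; missing=1; }
  fi
  return "$missing"   # 0 if everything expected is present, 1 otherwise
}

# Usage: check_decom_output decOM_output/ SRR13355787 true
```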

Example from an input fastq/fasta file

You can use as input a fastq/fasta file from your own experiment, or download an ancient oral sample of interest from the AncientMetagenomeDir or the SRA.

Once you have downloaded the archive with the matrix of sources (decOM_sources.tar.gz) and your fastq file(s) of interest (from now on called the sink), create one key.fof file per sink. The key.fof file contains a single line whose format depends on your type of data:

-Single-end: s : path/to/file/s_1.fastq.gz

-Paired-end: s : path/to/file/s_1.fastq.gz; path/to/file/s_2.fastq.gz

Note: Because decOM relies on kmtricks, your input can be in FASTA or FASTQ format, gzipped or not; adjust the file paths and extensions in the key.fof file accordingly.
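The key.fof line can also be generated with a small helper. This sketch writes the single-end and paired-end formats listed above; write_key_fof is hypothetical, and naming the file {sink}_key.fof and using the sink name as the sample identifier are assumptions that mirror the test example (SRR13355787 with SRR13355787_key.fof).

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a one-line key.fof in the format listed above.
#   single-end:  s : path/to/file/s_1.fastq.gz
#   paired-end:  s : path/to/file/s_1.fastq.gz; path/to/file/s_2.fastq.gz
# Assumption: the file is named {sink}_key.fof, as in the test example.
write_key_fof() {
  local sink="$1" r1="$2" r2="${3:-}"
  local out="${sink}_key.fof"
  if [ -n "$r2" ]; then
    printf '%s : %s; %s\n' "$sink" "$r1" "$r2" > "$out"   # paired-end: two files
  else
    printf '%s : %s\n' "$sink" "$r1" > "$out"             # single-end: one file
  fi
  echo "$out"   # print the name of the file that was written
}

# Usage: write_key_fof s path/to/file/s_1.fastq.gz path/to/file/s_2.fastq.gz
```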

Once you have the fasta/fastq file(s) of your sink, the folder with the matrix of sources, and the key file, run decOM as follows:

decOM -s {SINK} -p_sources decOM_sources/ -k {KEY.FOF} -mem {MEMORY} -t {THREADS} -o {OUTPUT}
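Because each decOM run takes a single sink, several sinks can be processed with a loop. This is a minimal sketch, not an official batch mode: run_decom_batch is hypothetical, and it assumes each key file is named {sink}_key.fof and that each sink gets its own output folder decOM_output_{sink}/. With DRY_RUN=1 it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run decOM once per sink, using only the documented options.
# Assumptions: key files are named {sink}_key.fof; outputs go to decOM_output_{sink}/.
run_decom_batch() {
  local sources="$1" mem="$2" threads="$3"; shift 3
  local sink
  for sink in "$@"; do
    local cmd=(decOM -s "$sink" -p_sources "$sources" -k "${sink}_key.fof"
               -mem "$mem" -t "$threads" -o "decOM_output_${sink}/")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"                                    # print instead of running
    else
      "${cmd[@]}" || echo "decOM failed for ${sink}" >&2  # keep going on failure
    fi
  done
}

# Usage: DRY_RUN=1 run_decom_batch decOM_sources/ 10GB 5 SRR13355787
```

Remember that each run still requests -mem times -t of memory, so the loop runs sinks one after another rather than in parallel.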

Command line options

usage: modules [-h] -s SINK -p_sources PATH_SOURCES -k KEY -mem MEMORY -t THREADS -o OUTPUT [-p PLOT] [-V]

Microbial source tracking for contamination assessment of ancient oral samples using k-mer-based methods

optional arguments:
  -h, --help            show this help message and exit
  -s SINK, --sink SINK  Write down the name of your sink
  -p_sources PATH_SOURCES, --path_sources PATH_SOURCES
                        path to folder downloaded from https://zenodo.org/record/6385193#.Ym-wTy8RphA
  -k KEY, --key KEY     filtering key (a kmtricks fof with only one sample).
  -mem MEMORY, --memory MEMORY
                        Write down how much memory you want to use for this process. Ex: 20GiB
  -t THREADS, --threads THREADS
                        Number of threads to use. Ex: 5
  -o OUTPUT, --output OUTPUT
                        Path to output folder, where you want decOM to write the results
  -p PLOT, --plot PLOT  True if you want a plot with the source proportions of the sink, else False
  -V, --version         Show version number and exit


Download files


Source Distribution

decOM-0.0.16.tar.gz (33.3 kB)

Built Distribution

decOM-0.0.16-py3-none-any.whl (31.4 kB)

File details

Details for the file decOM-0.0.16.tar.gz.

File metadata

  • Download URL: decOM-0.0.16.tar.gz
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.10

File hashes

Hashes for decOM-0.0.16.tar.gz
Algorithm Hash digest
SHA256 c90df4f35038f699a528f0d6a58d144e42ff1df87b8cb2f6a6760ca9def60f76
MD5 5e3d0807409b587c84210b72aca69f1f
BLAKE2b-256 a0c3dfbf2134b94c6f732543c84fd923ddb7f2f631f483a208ae254e48faa109


File details

Details for the file decOM-0.0.16-py3-none-any.whl.

File metadata

  • Download URL: decOM-0.0.16-py3-none-any.whl
  • Size: 31.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.10

File hashes

Hashes for decOM-0.0.16-py3-none-any.whl
Algorithm Hash digest
SHA256 12ce4892041e15ec2a033790d6effb80d9c7e8e258a1e6b77a9365539bb253b1
MD5 b00478b26efeb4397354306798bc7428
BLAKE2b-256 0b5c0a99b3dc6d682518d0e8d973d7788214271ca43c38b02d8883865ee8b711
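To check a downloaded distribution file against the SHA256 digests above, you can compare it with the output of sha256sum from GNU coreutils (on macOS, shasum -a 256 prints the same digest). verify_sha256 is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA256 digest with a published value.
verify_sha256() {
  local file="$1" expected="$2"
  local actual
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"   # first field is the hex digest
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Usage:
# verify_sha256 decOM-0.0.16.tar.gz c90df4f35038f699a528f0d6a58d144e42ff1df87b8cb2f6a6760ca9def60f76
```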
