decOM: K-mer method for aOral metagenome decontamination

decOM: Microbial source tracking for contamination assessment of ancient oral samples using k-mer-based methods

decOM is a high-accuracy microbial source tracking method for quantifying contamination in paleogenomics, specifically in collections of possibly contaminated ancient oral metagenomic data sets. Put simply, if you want to know how contaminated your ancient oral metagenomic sample is, this tool will help :)

[Figure: decOM pipeline]

System requirements

decOM has been developed and tested under a Linux environment. It relies on kmtricks and is installed through conda, as described under Installation.

Installation

Install decOM through conda:

git clone https://github.com/CamilaDuitama/decOM.git
cd decOM
conda env create -n decOM --file environment.yml

To make the decOM command available, it is advised to include the absolute path of decOM in your PATH environment variable by adding the following line to your ~/.bashrc file:

export PATH=/absolute/path/to/decOM:${PATH}
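After editing ~/.bashrc it can be worth confirming that the shell really resolves the command. The sketch below is one possible check, not part of decOM itself: DECOM_DIR is a placeholder for wherever you cloned the repository, and path_contains is a helper name made up for this example.

```shell
# Placeholder install location; replace with the directory you cloned into.
DECOM_DIR="/absolute/path/to/decOM"

# Succeeds if the given directory is already one of the PATH entries.
path_contains() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prepend the directory only when it is missing, so PATH does not pick up duplicates.
if ! path_contains "$DECOM_DIR"; then
    export PATH="${DECOM_DIR}:${PATH}"
fi

# Report whether the shell can now find the command.
if command -v decOM >/dev/null 2>&1; then
    echo "decOM found at: $(command -v decOM)"
else
    echo "decOM not found on PATH; check DECOM_DIR" >&2
fi
```

Guarding the export with path_contains keeps PATH from growing every time the file is sourced.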

Before running decOM

Before running decOM you must first download the archive decOM_sources.tar.gz and decompress it:

wget https://zenodo.org/record/6513520/files/decOM_sources.tar.gz
tar -xf decOM_sources.tar.gz

Test

You can test whether decOM is working by using one of the aOral samples present in the test/sample/ folder, e.g. SRR13355787:

decOM -s SRR13355787 -p_sources decOM_sources/ -k SRR13355787_key.fof -mem 10GB -t 5 -o decOM_output/

Note: The total memory allocated for a run of decOM is the value you pass to -mem multiplied by the number of cores requested with -t. The previous run therefore used 10 GB * 5 = 50 GB.
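To avoid over-committing a machine, the total can be computed before launching a run. This is a small sketch of that arithmetic only; total_mem_gb is a helper name invented for this example, and it assumes the memory value is an integer followed by "GB", as in the examples on this page.

```shell
# Total memory for a run: the value given to -mem times the value given to -t.
# Assumes the memory string is an integer followed by "GB" (e.g. "10GB").
total_mem_gb() {
    mem="$1"                 # e.g. "10GB"
    threads="$2"             # e.g. 5
    per_thread="${mem%GB}"   # strip the "GB" suffix
    echo $(( per_thread * threads ))
}

# The test run: -mem 10GB -t 5
echo "Total: $(total_mem_gb 10GB 5) GB"   # prints "Total: 50 GB"
```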

Output files

decOM will output one .csv file with the k-mer counts and proportions, a folder with the vector representing the sink(s), and a barplot if requested by the user. In the listing below, {s} stands for the sink name.

decOM_output/
├──{s}_OM_output.csv
├──result_plot_{s}.pdf
└──{s}_vector/
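Once a run has finished, a quick check that the documented files are present can catch an incomplete run early. The sketch below only tests for the file names listed above; check_outputs is a helper name made up for this example and is not shipped with decOM.

```shell
# check_outputs DIR SINK: report which of the documented outputs are present.
# Returns non-zero if the .csv file or the vector folder is missing.
check_outputs() {
    dir="$1"; s="$2"; missing=0
    [ -f "$dir/${s}_OM_output.csv" ] || { echo "missing: ${s}_OM_output.csv"; missing=1; }
    [ -d "$dir/${s}_vector" ]        || { echo "missing: ${s}_vector/"; missing=1; }
    # The plot is only produced on request, so its absence is reported but is not an error.
    [ -f "$dir/result_plot_${s}.pdf" ] || echo "note: no plot (result_plot_${s}.pdf)"
    return "$missing"
}

# Example with the sink and output folder from the test run.
check_outputs decOM_output SRR13355787 || echo "run looks incomplete"
```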

Example

As input you can use a fastq/fasta file from your own experiment, or you can download an ancient oral sample of interest from the AncientMetagenomeDir or from the SRA. decOM users can represent their own metagenomic sample as a presence/absence vector of k-mers using kmtricks. This sample of interest (from now on called the sink) can then be compared against the collection of sources we have put together.

Once you have downloaded the folder with the matrix of sources and the fastq file(s) of your sink(s), you have to create one key.fof file per sink. The key.fof file contains a single line of text whose form depends on your type of data (here s is the sink name):

-Single-end: s : path/to/file/s_1.fastq.gz

-Paired-end: s : path/to/file/s_1.fastq.gz; path/to/file/s_2.fastq.gz

Note: Because decOM relies on kmtricks, your input can be in FASTA or FASTQ format, gzipped or not. Adjust the file paths in the key.fof file accordingly.
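Writing the key file by hand is easy to get wrong, so a small helper can generate it. The sketch below writes the one-line format shown above, with one path or with several paths separated by "; ". write_key_fof and the file names are made up for this example, and the paths are the placeholder paths from this page, not real files.

```shell
# write_key_fof OUT SINK FILE [FILE...]
# Writes a single line "SINK : FILE; FILE..." to OUT.
write_key_fof() {
    out="$1"; sink="$2"; shift 2
    line="$sink : $1"; shift
    for f in "$@"; do
        line="$line; $f"
    done
    printf '%s\n' "$line" > "$out"
}

# Example: write both variants into a scratch directory and show them.
workdir="$(mktemp -d)"
write_key_fof "$workdir/one_file_key.fof" s path/to/file/s_1.fastq.gz
write_key_fof "$workdir/two_file_key.fof" s path/to/file/s_1.fastq.gz path/to/file/s_2.fastq.gz
cat "$workdir/one_file_key.fof" "$workdir/two_file_key.fof"
```

For uncompressed or FASTA input, only the paths passed to the helper change; the line layout stays the same.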

Since you now have the fasta/fastq file of your sink, the folder with the matrix of sources and the key file, simply run decOM as follows:

decOM -s {SINK} -p_sources decOM_sources/ -k {KEY.FOF} -mem {MEMORY} -t {THREADS} -o {OUTPUT}
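With several sinks, the same command can be repeated in a loop. The sketch below uses only the options shown above and prints each command instead of running it unless DRY_RUN=0 is set. It makes two assumptions that are not from the decOM documentation: each sink has a key file named {SINK}_key.fof in the current directory, and each sink gets its own output folder. run_decom and SINK_B are names made up for this example.

```shell
# Print (DRY_RUN=1, the default here) or execute the decOM command for one sink.
# Assumes a key file named <sink>_key.fof and one output folder per sink.
run_decom() {
    sink="$1"
    set -- decOM -s "$sink" -p_sources decOM_sources/ -k "${sink}_key.fof" \
        -mem "$MEMORY" -t "$THREADS" -o "decOM_output_${sink}/"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

MEMORY="10GB"
THREADS=5
for sink in SRR13355787 SINK_B; do   # SINK_B is a placeholder sink name
    run_decom "$sink"
done
```

Reviewing the printed commands first, then rerunning with DRY_RUN=0, guards against launching long jobs with a mistyped sink name or key file.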

Command line options

usage: modules [-h] -s SINK -p_sources PATH_SOURCES -k KEY -mem MEMORY -t THREADS -o OUTPUT [-p PLOT] [-V]

Microbial source tracking for contamination assessment of ancient oral samples using k-mer-based methods

optional arguments:
  -h, --help            show this help message and exit
  -s SINK, --sink SINK  Write down the name of your sink
  -p_sources PATH_SOURCES, --path_sources PATH_SOURCES
                        path to folder downloaded from https://zenodo.org/record/6513520/files/decOM_sources.tar.gz
  -k KEY, --key KEY     filtering key (a kmtricks fof with only one sample).
  -mem MEMORY, --memory MEMORY
                        Write down how much memory you want to use for this process. Ex: 20GB
  -t THREADS, --threads THREADS
                        Number of threads to use. Ex: 5
  -o OUTPUT, --output OUTPUT
                        Path to output folder, where you want decOM to write the results
  -p PLOT, --plot PLOT  True if you want a plot with the source proportions of the sink, else False
  -V, --version         Show version number and exit

