
decOM: Similarity-based microbial source tracking for contamination assessment of ancient oral samples using k-mer-based methods


decOM is a high-accuracy microbial source tracking method suited to quantifying contamination in paleogenomics, namely in collections of possibly contaminated ancient oral metagenomic datasets. Put simply, if you want to know how contaminated your ancient oral metagenomic sample is, this tool will help 🧹🦷

[Figure: decOM pipeline]

System requirements

decOM was developed and tested on Linux and only works on Linux-like systems. It requires certain packages/tools to be installed and used:

Installation

Install decOM through conda:

git clone https://github.com/CamilaDuitama/decOM.git  
cd decOM  
conda env create -n decOM --file environment.yml  
conda deactivate
conda activate decOM  

To make the decOM command available from any directory, add the absolute path of the decOM folder to your PATH environment variable by appending the following line to your ~/.bashrc file:

export PATH=/absolute/path/to/decOM:${PATH}  

Before running decOM

BEFORE running decOM, you must download the archive decOM_sources.tar.gz and decompress it. You can either follow the link or use wget (which must already be installed on your computer):

wget https://zenodo.org/record/6513520/files/decOM_sources.tar.gz  
tar -xf decOM_sources.tar.gz

If you followed the link instead of using wget to download the matrix of sources, make sure you know the path to the decompressed folder and use it in the following commands wherever -p_sources is required.

Test

One sink

You can test whether decOM is working with the aOral sample in the tests/sample/ folder (SRR13355807):

decOM -s SRR13355807 -p_sources decOM_sources/ -k tests/sample/SRR13355807.fof -mem 10GB -t 5 

Note: The total memory allocated for each decOM run is the value given to -mem multiplied by the number of threads (-t). In the previous run we used 10GB * 5 = 50GB. We recommend running decOM with at least 10GB of memory and 1 thread.
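The memory arithmetic in this note is easy to get wrong when scaling up -t. A minimal Python sketch that computes the total from the two arguments; the helper name total_memory_gb is hypothetical, and it only accepts values written in GB, like the examples on this page:

```python
import re


def total_memory_gb(mem: str, threads: int) -> float:
    """Total memory (GB) of a decOM run, assuming total = -mem x -t as in the note.

    Hypothetical helper: only parses values such as "10GB" (case-insensitive).
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*GB\s*", mem, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"expected a value like '10GB', got {mem!r}")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return float(match.group(1)) * threads


# The single-sink test run above: -mem 10GB -t 5
print(total_memory_gb("10GB", 5))  # 50.0
```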

Several sinks

You can test decOM with several sinks by using the files inside tests/several_samples/ as follows:

decOM -p_sources decOM_sources/ -p_sinks tests/several_samples/sinks.txt -p_keys tests/several_samples/ -mem 10GB -t 5 

decOM relies on Dask for parallelization. Once decOM is running and the Dask client has been set up, you can open the diagnostic dashboard to follow the progress and better tune parameters such as -mem and -t. Make sure you can connect to your localhost, then visit: http://127.0.0.1:8787/status

Output files

decOM outputs one .csv file with the k-mer counts and proportions, one folder per sample of interest (from now on called the sink, s) containing the vector that represents it, and a barplot if requested by the user.

decOM_output/  
├──decOM_output.csv 
├──result_plot_sinks.pdf 
├──result_plot_sinks.html 
├──{s}_vector/  

The decOM_output.csv file is a dataframe with one row per sink. Its columns contain the raw number of k-mers per source environment, the running time per sink, the sink name, and the proportion (in percent) of each source environment. The result for the single-sink test above should look like this:

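| Sediment/Soil | Skin | aOral | mOral | Unknown | Running time (s) | Sink | p_Sediment/Soil | p_Skin | p_aOral | p_mOral | p_Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | 281 | 197859 | 37023 | 334 | 196.7268 | SRR13355807 | 0.0772 | 0.1192 | 83.9527 | 15.7091 | 0.1417 |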
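In the example row, each p_ column matches the raw count of that environment divided by the total of all five counts (235679), times 100; for instance 197859 / 235679 ≈ 83.9527 %. This relationship is inferred from the numbers above, not from decOM's source. A minimal sketch reproducing it (source_proportions is a hypothetical helper):

```python
def source_proportions(counts: dict) -> dict:
    """Percentage of the total k-mer count per source, rounded to 4 decimals.

    Assumes p_X = 100 * count_X / sum(all counts), as suggested by the example row.
    """
    total = sum(counts.values())
    return {f"p_{name}": round(100 * n / total, 4) for name, n in counts.items()}


# Raw counts from the SRR13355807 row above
counts = {"Sediment/Soil": 182, "Skin": 281, "aOral": 197859, "mOral": 37023, "Unknown": 334}
proportions = source_proportions(counts)
print(proportions)  # should match the p_ columns of the example row
```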

result_plot_sinks.pdf and result_plot_sinks.html are static and interactive plots, respectively, of the proportions of source environments per sink. The {s}_vector/ folder is the output of kmtricks filter followed by kmtricks aggregate.

Example

As input, you can use a FASTQ/FASTA file from your own experiment, or download an ancient oral sample of interest from AncientMetagenomeDir or the SRA.
Your metagenomic sample is represented as a presence/absence vector of k-mers using kmtricks, and this sink is then compared against the collection of sources we have put together.
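To make the idea of a presence/absence k-mer representation concrete, here is a toy Python sketch: a sequence becomes the set of k-mers it contains, and a sink can be compared with each source by counting shared k-mers. It only illustrates the representation. kmtricks works at scale on real read files with its own counting and filtering, and the rule decOM uses to assign sink k-mers to source environments is not reproduced here. The function names and sequences are hypothetical.

```python
from typing import Dict, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Presence/absence view of seq: the set of its k-mers.

    Toy version: no reverse complements, no abundance counts or filtering.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_with_sources(sink: Set[str], sources: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of sink k-mers that are also present in each source."""
    return {name: len(sink & kmers) for name, kmers in sources.items()}


sink = kmer_set("ACGTACGT", 3)
sources = {"aOral": kmer_set("CGTAC", 3), "Skin": kmer_set("TTTACG", 3)}
print(shared_with_sources(sink, sources))  # {'aOral': 3, 'Skin': 2}
```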

Once you have downloaded the folder with the matrix of sources and the FASTQ file(s) of your sink(s), you must create one key.fof file per sink.
The key.fof file contains a single line whose format depends on your type of data:

- Single-end:
s : path/to/file/s_1.fastq.gz

- Paired-end:
s : path/to/file/s_1.fastq.gz; path/to/file/s_2.fastq.gz

Note: Because decOM relies on kmtricks, your input files can be in FASTA or FASTQ format, gzipped or not; adjust the paths in key.fof accordingly.
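The one-line key.fof format above can be generated rather than typed by hand. A minimal Python sketch that follows the single-end and paired-end examples exactly (including the "; " separator between paired files); key_fof_line and write_key_fof are hypothetical helper names:

```python
from typing import Optional


def key_fof_line(sink: str, read1: str, read2: Optional[str] = None) -> str:
    """Build the single line of a key.fof file in the format shown above."""
    # Paired-end files are separated by "; " as in the paired-end example.
    files = read1 if read2 is None else f"{read1}; {read2}"
    return f"{sink} : {files}"


def write_key_fof(path: str, sink: str, read1: str, read2: Optional[str] = None) -> None:
    """Write the key.fof line, followed by a newline, to path."""
    with open(path, "w") as fh:
        fh.write(key_fof_line(sink, read1, read2) + "\n")


print(key_fof_line("s", "path/to/file/s_1.fastq.gz", "path/to/file/s_2.fastq.gz"))
# s : path/to/file/s_1.fastq.gz; path/to/file/s_2.fastq.gz
```

The sink name written here must be the same one you later pass to -s.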

Once you have the FASTA/FASTQ file of your sink, the folder with the matrix of sources, and the key file, run decOM as follows:

Single sink

decOM -s {SINK} -p_sources decOM_sources/ -k {KEY.FOF} -mem {MEMORY} -t {THREADS}

Several sinks

If you want to assess the contamination of several sinks, you need one key.fof file per sink, and all of them must be inside the folder given to -p_keys. The sink names are listed, one per line, in the .txt file given to -p_sinks.

decOM -p_sinks {PATH_SINKS} -p_sources decOM_sources/ -p_keys {PATH_KEYS} -mem {MEMORY} -t {THREADS}
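Preparing inputs for many sinks by hand is repetitive. The sketch below builds the contents of sinks.txt and one .fof file per sink, assuming each key file is named <sink>.fof (as with tests/sample/SRR13355807.fof) and that sinks.txt sits in the same folder as the keys (as in tests/several_samples/). Both conventions and all helper names are assumptions, not documented decOM requirements.

```python
from pathlib import Path
from typing import Dict, Sequence


def several_sinks_files(samples: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Map file name -> contents for sinks.txt and one <sink>.fof per sink.

    samples maps each sink name to one (single-end) or two (paired-end) read files.
    """
    files = {}
    for sink, reads in samples.items():
        if len(reads) not in (1, 2):
            raise ValueError(f"{sink}: expected 1 or 2 read files, got {len(reads)}")
        # Same one-line key.fof format as for a single sink.
        files[f"{sink}.fof"] = f"{sink} : {'; '.join(reads)}\n"
    # One sink name per line, as expected by -p_sinks.
    files["sinks.txt"] = "\n".join(samples) + "\n"
    return files


def write_files(files: Dict[str, str], keys_dir: str) -> None:
    """Write every generated file into keys_dir (created if missing)."""
    out = Path(keys_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in files.items():
        (out / name).write_text(text)


# Hypothetical sinks: one single-end, one paired-end
samples = {
    "sinkA": ["data/sinkA_1.fastq.gz"],
    "sinkB": ["data/sinkB_1.fastq.gz", "data/sinkB_2.fastq.gz"],
}
files = several_sinks_files(samples)
# write_files(files, "my_keys/"), then:
# decOM -p_sinks my_keys/sinks.txt -p_sources decOM_sources/ -p_keys my_keys/ -mem 10GB -t 5
```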

Command line options

usage: decOM [-h] (-s SINK | -p_sinks PATH_SINKS) -p_sources PATH_SOURCES (-k KEY | -p_keys PATH_KEYS) -mem MEMORY -t THREADS [-o OUTPUT] [-p PLOT] [-V] [-v]

Microbial source tracking for contamination assessment of ancient oral samples using k-mer-based methods


Arguments:

-h, --help  
show this help message and exit

-s SINK, --sink SINK  
Name of your sink. It must match the first element of key.fof (the name before the colon). When this argument is set, -k/--key must be defined too.

-p_sinks PATH_SINKS, --path_sinks PATH_SINKS 
Path to a .txt file listing the sinks, one per line (separated by \n). When this argument is set, -p_keys/--path_keys must be defined too.

-p_sources PATH_SOURCES, --path_sources PATH_SOURCES
Path to folder downloaded from https://zenodo.org/record/6513520/files/decOM_sources.tar.gz

-k KEY, --key KEY 
Filtering key (a kmtricks fof with only one sample). When this argument is set, -s/--sink must be defined too.

-p_keys PATH_KEYS, --path_keys PATH_KEYS
Path to the folder with the filtering keys (kmtricks fof files with only one sample each). You should have as many .fof files as sinks. When this argument is set, -p_sinks/--path_sinks must be defined too.

-mem MEMORY, --memory MEMORY
Amount of memory to use for this process, e.g. 10GB. The total memory allocated is this value multiplied by the number of threads (-t).

-t THREADS, --threads THREADS
Number of threads to use. Ex: 5

-o OUTPUT, --output OUTPUT
Path to the output folder where decOM will write the results. The folder must not already exist; decOM will not overwrite it.

-p PLOT, --plot PLOT  True if you want a plot (in pdf and html format) with the source proportions of the sink, else False

-V, --version Show version number and exit

-v, --verbose Verbose output
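The pairing rules above (-s with -k, -p_sinks with -p_keys) can be checked before launching a long run. The sketch below is an illustrative re-creation of the documented interface with Python's argparse, for example to validate command lines in your own wrapper scripts; it is not decOM's actual parser, and it omits -V and the help texts.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Re-create the documented decOM options (illustrative, not decOM's own code)."""
    p = argparse.ArgumentParser(prog="decOM")
    sinks = p.add_mutually_exclusive_group(required=True)  # (-s | -p_sinks)
    sinks.add_argument("-s", "--sink")
    sinks.add_argument("-p_sinks", "--path_sinks")
    p.add_argument("-p_sources", "--path_sources", required=True)
    keys = p.add_mutually_exclusive_group(required=True)  # (-k | -p_keys)
    keys.add_argument("-k", "--key")
    keys.add_argument("-p_keys", "--path_keys")
    p.add_argument("-mem", "--memory", required=True)
    p.add_argument("-t", "--threads", type=int, required=True)
    p.add_argument("-o", "--output")
    p.add_argument("-p", "--plot")
    p.add_argument("-v", "--verbose", action="store_true")
    return p


def parse(argv: List[str]) -> argparse.Namespace:
    """Parse argv and enforce the pairing rules: -s needs -k, -p_sinks needs -p_keys."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if (args.sink is None) != (args.key is None):
        parser.error("-s/--sink requires -k/--key, and -p_sinks/--path_sinks requires -p_keys/--path_keys")
    return args


args = parse(["-s", "SRR13355807", "-p_sources", "decOM_sources/",
              "-k", "tests/sample/SRR13355807.fof", "-mem", "10GB", "-t", "5"])
```

Mixing the two modes, e.g. -s together with -p_keys, makes parse exit with an error message instead of starting a run.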



Download files


Source Distribution

decOM-0.0.22.tar.gz (41.6 kB)

Uploaded Source

Built Distributions

decOM-0.0.22-py3-none-any.whl (42.9 kB)

Uploaded Python 3

decOM-0.0.22-py2.py3-none-any.whl (42.9 kB)

Uploaded Python 2 Python 3

File details

Details for the file decOM-0.0.22.tar.gz.

File metadata

  • Download URL: decOM-0.0.22.tar.gz
  • Upload date:
  • Size: 41.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.64.0 importlib-metadata/4.11.4 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.12

File hashes

Hashes for decOM-0.0.22.tar.gz
Algorithm Hash digest
SHA256 e627a388c0391f0068c81f1b33d6f24d3a5fc07358acea3bdd0086f09215e09c
MD5 051033e4d6b86c5bd28bd70e20814a66
BLAKE2b-256 a050e59fccd231e62fa3019d12c827a8b94747dc2a8fa4c4a388bae3c0919f6b
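To confirm that a downloaded distribution matches the SHA256 digest listed here, a minimal sketch using Python's standard hashlib module (the helper names are hypothetical):

```python
import hashlib
from typing import Iterable


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hex SHA256 digest of a stream of byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    with open(path, "rb") as fh:
        return sha256_hex(iter(lambda: fh.read(chunk_size), b""))


# Digest listed above for decOM-0.0.22.tar.gz
EXPECTED_SDIST = "e627a388c0391f0068c81f1b33d6f24d3a5fc07358acea3bdd0086f09215e09c"
# sha256_of_file("decOM-0.0.22.tar.gz") == EXPECTED_SDIST is True only if the download is intact
```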


File details

Details for the file decOM-0.0.22-py3-none-any.whl.

File metadata

  • Download URL: decOM-0.0.22-py3-none-any.whl
  • Upload date:
  • Size: 42.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.64.0 importlib-metadata/4.11.4 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.12

File hashes

Hashes for decOM-0.0.22-py3-none-any.whl
Algorithm Hash digest
SHA256 634754dc68b8d598f9a930e579bc93de8f2951bb29c959befe2640663eab7e84
MD5 acde4edae572487dd4be1c8732794d68
BLAKE2b-256 50851eee220fbda1850e63388f376af08798266e420d45de69b952bc87776984


File details

Details for the file decOM-0.0.22-py2.py3-none-any.whl.

File metadata

  • Download URL: decOM-0.0.22-py2.py3-none-any.whl
  • Upload date:
  • Size: 42.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.64.0 importlib-metadata/4.11.4 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.12

File hashes

Hashes for decOM-0.0.22-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 0fc0869dbf301c51e25899997019521e75a92d6f77d49211d9967e4d36b961a8
MD5 d2df5a3fb1bcb76d92d815ab973dd680
BLAKE2b-256 1a00984728136954efad8121f73c3e0a5e0fbca598c7d86a8b6c81851a8eedf5

