ESKRIM: EStimate with K-mers the RIchness in a Microbiome
ESKRIM is a reference-free tool that compares microbial richness in shotgun metagenomic samples by counting k-mers.
Installation via pip
pip install eskrim
Usage
Basic usage
In this example, k-mer richness is computed for a sample (sample1) consisting of two paired-end runs (run1 and run2).
Only the forward FASTQ files are given as input. Results are saved in the file sample1.eskrim_stats.tsv.
eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv
Quality control (adapter removal, read trimming) and contaminant removal (reads from the host genome) should be performed before using ESKRIM.
Run ESKRIM similarly for each sample to be compared. All TSV output files can be merged manually.
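Running one command per sample and merging the outputs can be scripted. The following is a minimal sketch, not part of ESKRIM: the helper names build_eskrim_command and merge_stats are hypothetical, and the merge assumes that every output file starts with the same header row.

```python
import csv


def build_eskrim_command(sample_name, forward_fastqs, stats_path):
    """Build the basic command line for one sample (hypothetical helper).

    Only the -i, -n and -s parameters documented above are used.
    """
    return ["eskrim", "-i", *map(str, forward_fastqs),
            "-n", sample_name, "-s", str(stats_path)]


def merge_stats(stats_paths, merged_path):
    """Concatenate per-sample TSV files, keeping a single header row.

    Assumption: each file has one header line followed by data rows,
    and all files share the same columns.
    Returns the number of data rows written.
    """
    if not stats_paths:
        raise ValueError("no input files given")
    header = None
    rows = []
    for path in stats_paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            file_header = next(reader)
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"unexpected header in {path}")
            # Skip empty lines, keep everything else untouched.
            rows.extend(row for row in reader if row)
    with open(merged_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

The list returned by build_eskrim_command can be passed to subprocess.run(cmd, check=True) once ESKRIM is installed.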
Advanced usage
Adjusting target read count for subsampling
Depending on the sequencing depth, the target number of reads to randomly draw from each sample (default = 10M) can be adjusted with the -r parameter.
eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv -r 5000000
Adjusting read length
All reads are trimmed to a given length (default = 80) because read length can vary between samples.
This length can be changed with the -l parameter.
eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv -l 100
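To illustrate what trimming to a fixed length implies, here is a rough sketch in Python. It is not ESKRIM's code: the function name trim_reads is hypothetical, and the order of the checks, the choice to keep the first bases of each read, and the choice to look for undetermined bases only in the trimmed part are all assumptions.

```python
def trim_reads(reads, read_length=80):
    """Filter and trim reads to a common length (illustrative only).

    Returns (kept_reads, num_with_n, num_too_short).
    Assumptions: reads shorter than read_length are discarded, the
    remaining reads keep their first read_length bases, and trimmed
    reads containing an undetermined base (N) are discarded.
    """
    kept = []
    num_with_n = 0
    num_too_short = 0
    for read in reads:
        if len(read) < read_length:
            num_too_short += 1
            continue
        trimmed = read[:read_length]
        if "N" in trimmed.upper():
            num_with_n += 1
            continue
        kept.append(trimmed)
    return kept, num_with_n, num_too_short
```

Under these assumptions, raising -l discards more short reads, which is one reason the number of usable reads can fall below the subsampling target.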
Reproducibility
ESKRIM results are reproducible when the same random number generator seed is used (default = 0).
To make read subsampling vary across executions, set a different seed with the --seed parameter.
eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv --seed 1234
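The effect of a fixed seed on random subsampling can be sketched with Python's standard library. This is only an illustration of the principle: subsample_reads is a hypothetical function, and ESKRIM's actual sampling procedure may differ.

```python
import random


def subsample_reads(reads, target_num_reads, seed=0):
    """Draw target_num_reads reads at random, reproducibly for a given seed.

    If fewer reads are available than requested, all reads are returned,
    so the result can be shorter than the target.
    """
    reads = list(reads)
    if len(reads) <= target_num_reads:
        return reads
    rng = random.Random(seed)  # private generator, unaffected by global state
    return rng.sample(reads, target_num_reads)
```

Calling the function twice with the same reads and the same seed returns the same selection, while a different seed will usually return a different one.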
Interpreting the output file
ESKRIM saves the results in a TSV file (-s parameter) with the following columns:
- sample_name : sample name specified with -n parameter.
- total_num_reads : number of reads in the sample before subsampling.
- num_Ns_reads_ignored : number of reads with undetermined bases that were discarded.
- num_too_short_reads_ignored : number of reads shorter than the trimming length (-l parameter) that were discarded.
- target_num_reads : target number of reads to draw during the subsampling step.
- num_selected_reads : number of reads actually drawn after subsampling.
- read_length : length at which reads were trimmed (-l parameter).
- kmer_length : length of counted k-mers (-k parameter).
- num_distinct_kmers : number of distinct k-mers in subsampled reads.
- num_solid_kmers : number of k-mers seen at least twice.
- num_mercy_kmers : number of non-solid k-mers occurring in a read where all k-mers are not solid.
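The notions of distinct and solid k-mers can be illustrated with a small counting sketch. It is a simplification and not ESKRIM's implementation: count_kmers is a hypothetical function, k-mers are counted exactly as they appear in the reads (no special handling of reverse complements is assumed), and mercy k-mers are not computed.

```python
from collections import Counter


def count_kmers(reads, k):
    """Return (num_distinct, num_solid) for the k-mers found in reads.

    A k-mer is counted as solid when it is seen at least twice
    across all reads.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    num_distinct = len(counts)
    num_solid = sum(1 for count in counts.values() if count >= 2)
    return num_distinct, num_solid
```

For example, the reads ACGTA and CGTAC with k = 3 contain four distinct 3-mers (ACG, CGT, GTA, TAC), of which two (CGT and GTA) are seen twice and are therefore solid.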
In our experience, the sum 'num_solid_kmers + num_mercy_kmers' is an accurate proxy for comparing microbial richness between samples.
WARNING: Do not consider results when num_selected_reads is strictly lower than target_num_reads.
In this case, ignore the samples concerned or decrease the number of reads to be drawn randomly (-r parameter).
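The richness proxy and the warning above can be applied to an output file with a few lines of Python. This is a sketch under one assumption: the TSV file starts with a header row using the column names listed above. The function name richness_proxy is hypothetical.

```python
import csv


def richness_proxy(stats_path):
    """Map each sample name to num_solid_kmers + num_mercy_kmers.

    Samples with num_selected_reads < target_num_reads are mapped to
    None, since their results should not be considered.
    Assumption: the file has a header row with the documented column names.
    """
    results = {}
    with open(stats_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            selected = int(row["num_selected_reads"])
            target = int(row["target_num_reads"])
            if selected < target:
                results[row["sample_name"]] = None
            else:
                results[row["sample_name"]] = (
                    int(row["num_solid_kmers"]) + int(row["num_mercy_kmers"])
                )
    return results
```

Samples mapped to None are those to ignore or to rerun with a lower -r value.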
Authors
- Florian Plaza Oñate: florian.plaza-onate@inrae.fr
- Emmanuelle Le Chatelier: emmanuelle.le-chatelier@inrae.fr