ESKRIM: EStimate with K-mers the RIchness in a Microbiome

ESKRIM is a reference-free tool that compares microbial richness in shotgun metagenomic samples by counting k-mers.

Installation via pip

pip install eskrim

Usage

Basic usage

In this example, k-mer richness is computed for a sample (sample1) consisting of two paired-end runs (run1 and run2).
The forward FASTQ files are taken as input. Results are saved in the file sample1.eskrim_stats.tsv.

eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv

Quality control (adapter removal, read trimming) and contaminant removal (reads from the host genome) should be performed before using ESKRIM.

Run ESKRIM similarly for each sample to be compared. All TSV output files can be merged manually.
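Running ESKRIM on many samples and merging the results can be scripted. The following is a minimal sketch, not part of ESKRIM itself: the function names, the `samples` dictionary layout and the merged file name are hypothetical, and it assumes that each output TSV starts with the same header line. Only the -i, -n and -s flags shown above are used.

```python
import csv
import subprocess
from pathlib import Path


def eskrim_command(sample_name, fastq_files, stats_file):
    """Build the eskrim command line for one sample (flags -i, -n, -s)."""
    return ["eskrim", "-i", *map(str, fastq_files),
            "-n", sample_name, "-s", str(stats_file)]


def merge_stats(stats_files, merged_file):
    """Concatenate per-sample TSV files, writing the header line only once.

    Assumption: every file starts with an identical header line.
    """
    header = None
    with open(merged_file, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in stats_files:
            with open(path, newline="") as handle:
                rows = list(csv.reader(handle, delimiter="\t"))
            if not rows:
                continue  # skip empty files
            if header is None:
                header = rows[0]
                writer.writerow(header)
            elif rows[0] != header:
                raise ValueError(f"unexpected header in {path}")
            writer.writerows(rows[1:])


def run_all(samples, out_dir):
    """Run eskrim on each sample, then merge all per-sample TSV files.

    samples: dict mapping a sample name to its list of forward FASTQ files
    (hypothetical layout chosen for this sketch).
    """
    out_dir = Path(out_dir)
    stats_files = []
    for name, fastqs in samples.items():
        stats = out_dir / f"{name}.eskrim_stats.tsv"
        subprocess.run(eskrim_command(name, fastqs, stats), check=True)
        stats_files.append(stats)
    merge_stats(stats_files, out_dir / "all_samples.eskrim_stats.tsv")
```

For example, `run_all({"sample1": ["sample1.run1_1.fastq.gz", "sample1.run2_1.fastq.gz"]}, ".")` would run the command from the basic usage example and write a merged file next to the per-sample one.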

Advanced usage

Adjusting target read count for subsampling

Depending on the sequencing depth, the target number of reads to randomly draw from each sample (default = 10M) can be adjusted with the -r parameter.

eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv -r 5000000

Adjusting read length

All reads are trimmed to a given length (default = 80) because read length can vary between samples.
This length can be changed with the -l parameter.

eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv -l 100
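To illustrate why a common read length matters, here is a toy sketch of length-based filtering and trimming. It is not ESKRIM's code: the function name is hypothetical, and the order of the checks (length first, then undetermined bases in the trimmed part) is an assumption made for the illustration.

```python
def filter_and_trim(reads, read_length=80):
    """Trim reads to a common length and count the reads that are dropped.

    Assumptions of this sketch: reads shorter than read_length are dropped,
    and a read is dropped for undetermined bases only if an N remains
    after trimming.
    """
    kept = []
    num_with_n = 0
    num_too_short = 0
    for seq in reads:
        if len(seq) < read_length:
            num_too_short += 1
            continue
        trimmed = seq[:read_length]
        if "N" in trimmed.upper():
            num_with_n += 1
            continue
        kept.append(trimmed)
    return kept, num_with_n, num_too_short
```

After this step every kept read contributes the same number of k-mers, so samples sequenced with different read lengths are compared on equal terms.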

Reproducibility

ESKRIM gives reproducible results when the same random number generator seed is used (default = 0).
To make read subsampling vary across executions, use the --seed parameter.

eskrim -i sample1.run1_1.fastq.gz sample1.run2_1.fastq.gz -n sample1 -s sample1.eskrim_stats.tsv --seed 1234
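The effect of the seed can be illustrated with a small sketch of seeded random subsampling. This is a generic illustration using Python's standard `random` module, not ESKRIM's actual implementation, and the function name is hypothetical.

```python
import random


def subsample(reads, target_num_reads, seed=0):
    """Draw target_num_reads reads at random, reproducibly for a given seed.

    If fewer reads are available than requested, all of them are returned,
    so the number of selected reads is lower than the target.
    """
    rng = random.Random(seed)  # private generator: same seed, same draw
    if len(reads) <= target_num_reads:
        return list(reads)
    return rng.sample(reads, target_num_reads)
```

Calling `subsample` twice with the same list and the same seed returns the same reads; changing the seed changes which reads are drawn.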

Interpreting the output file

ESKRIM saves the results in a TSV file (-s parameter) with the following columns.

  • sample_name : sample name specified with -n parameter.
  • total_num_reads : number of reads in the sample before subsampling.
  • num_Ns_reads_ignored : number of reads with undetermined bases that were discarded.
  • num_too_short_reads_ignored : number of reads that were too short and were discarded.
  • target_num_reads : target number of reads to draw during the subsampling step.
  • num_selected_reads : number of reads actually drawn after subsampling.
  • read_length : length at which reads were trimmed (-l parameter).
  • kmer_length : length of counted k-mers (-k parameter).
  • num_distinct_kmers : number of distinct k-mers in subsampled reads.
  • num_solid_kmers : number of k-mers seen at least twice.
  • num_mercy_kmers : number of non-solid k-mers occurring in a read where all k-mers are not solid.
    In our experience, the sum 'num_solid_kmers + num_mercy_kmers' is an accurate proxy for comparing microbial richness between samples.
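The difference between distinct and solid k-mers can be shown on a toy example. The sketch below is a simplified illustration, not ESKRIM's implementation: the function name is hypothetical, k-mers are counted on the forward strand only (no reverse-complement handling), and mercy k-mers are not computed.

```python
from collections import Counter


def kmer_stats(reads, k):
    """Return (number of distinct k-mers, number of k-mers seen at least twice)."""
    counts = Counter()
    for seq in reads:
        # Slide a window of length k along the read
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    num_distinct = len(counts)
    num_solid = sum(1 for c in counts.values() if c >= 2)
    return num_distinct, num_solid


# Reads ACGTA and CGTAC share the 3-mers CGT and GTA
print(kmer_stats(["ACGTA", "CGTAC"], 3))  # → (4, 2)
```

Here the four distinct 3-mers are ACG, CGT, GTA and TAC, of which only CGT and GTA occur twice and are therefore solid.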

WARNING: Do not consider results when num_selected_reads is strictly lower than target_num_reads.
In this case, ignore the samples concerned or decrease the number of reads to be drawn randomly (-r parameter).
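When several samples have been merged into one TSV file, this check and the richness proxy can be computed together. The sketch below is only an illustration: the function name is hypothetical, and it assumes the file has a header line with the column names listed above and integer values in the count columns.

```python
import csv


def richness_proxy(handle):
    """Compute num_solid_kmers + num_mercy_kmers for each usable sample.

    handle: an open text file (or file-like object) containing the TSV.
    Returns (proxy_by_sample, skipped_samples), where skipped samples are
    those with num_selected_reads strictly lower than target_num_reads.
    """
    proxy_by_sample = {}
    skipped_samples = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["num_selected_reads"]) < int(row["target_num_reads"]):
            skipped_samples.append(row["sample_name"])
            continue
        proxy_by_sample[row["sample_name"]] = (
            int(row["num_solid_kmers"]) + int(row["num_mercy_kmers"])
        )
    return proxy_by_sample, skipped_samples
```

A typical call would be `with open("all_samples.eskrim_stats.tsv") as fh: proxy, skipped = richness_proxy(fh)`, where the file name is a placeholder for the merged file. Samples listed in `skipped` should either be ignored or rerun with a lower -r value, together with all the samples they are compared to.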
