Utility for computing k-mer-based statistics

kmer2stats

kmer2stats computes k-mer-based statistics for microbial community analysis, including a wide range of alpha diversity metrics (e.g. Shannon, Chao1, inverse Simpson) and descriptive count statistics (e.g. total_count, unique_kmers, percent_singletons). The output can be used for downstream plotting or comparative analyses.

Installation

All dependencies (pandas>=2.2.2, numpy>=1.26.4, scikit-bio>=0.6.3, argparse>=1.4.0) are automatically installed by both methods below.

With pip

pip install kmer2stats

Or directly from source:

git clone https://github.com/SantaMcCloud/kmer2stats.git
cd kmer2stats
pip install -r requirements.txt

With conda

conda install bioconda::kmer2stats

Usage

kmer2stats count_file

kmer2stats is also available as a Galaxy tool on the usegalaxy.eu server — search for kmer2stats in the tool panel.

Input

The input is a space-separated k-mer count file with two columns (k-mer sequence and count) and no header. This is the column format produced by jellyfish dump -c:

AAAAAC 6870
AAAAAG 6312
AAAAAT 7966
AAAACA 5133
AAAACC 5600
AAAACG 5870
AAAACT 3911
AAAAGA 4173
AAAAGC 5078
AAAAGG 3047
AAAAGT 3067
AAAATA 5726
AAAATC 6167
AAAATG 5731
AAAATT 4987
AAACAA 3719
AAACAC 2817
AAACAG 5565
AAACAT 3342
AAACCA 5011
AAACCC 2469

A test file is provided in test_files/test_file.txt.
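
As a rough illustration of the input format described above, the following sketch parses two-column k-mer count lines using only the Python standard library. The function name read_kmer_counts is hypothetical and is not part of kmer2stats; the sketch only assumes the whitespace-separated, header-free layout shown in the example.

```python
import io


def read_kmer_counts(handle):
    """Parse 'KMER COUNT' lines into a dict mapping k-mer -> integer count.

    Hypothetical helper for illustration; not part of kmer2stats.
    """
    counts = {}
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(fields)}"
            )
        kmer, count = fields
        counts[kmer] = int(count)
    return counts


# First three lines of the example above, read from an in-memory file.
example = io.StringIO("AAAAAC 6870\nAAAAAG 6312\nAAAAAT 7966\n")
kmer_counts = read_kmer_counts(example)
print(len(kmer_counts), sum(kmer_counts.values()))  # 3 21148
```

The same function accepts any open text file, for example the handle returned by open("test_files/test_file.txt").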

Output

The output is a tab-separated (TSV) file named compute_diversity.csv (despite the .csv extension, the fields are separated by tabs) with two columns: the metric name and its computed value. The following metrics are reported:

Richness estimators

| Metric | Description |
|---|---|
| observed_features | Number of distinct k-mers observed |
| singles | Number of k-mers observed exactly once (singletons) |
| doubles | Number of k-mers observed exactly twice (doubletons) |
| chao1 | Chao1 non-parametric richness estimator |
| ace | Abundance-based coverage estimator (ACE) |
| margalef | Margalef's richness index |
| menhinick | Menhinick's richness index |
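
To make a few of the richness metrics concrete, here is a minimal sketch using their textbook definitions: bias-corrected Chao1, S + F1(F1 - 1) / (2(F2 + 1)); Margalef, (S - 1) / ln N; and Menhinick, S / sqrt(N). The function name richness_summary is hypothetical, and whether kmer2stats uses exactly these variants (for example the bias-corrected form of Chao1) is an assumption not confirmed here.

```python
import math


def richness_summary(counts):
    """Return basic richness figures for a list of k-mer counts.

    Hypothetical helper; textbook formulas, assumes a total count above 1.
    """
    present = [c for c in counts if c > 0]
    total = sum(present)                         # N: sum of all counts
    observed = len(present)                      # S: distinct k-mers
    singles = sum(1 for c in present if c == 1)  # F1
    doubles = sum(1 for c in present if c == 2)  # F2
    # Bias-corrected Chao1 (assumed variant).
    chao1 = observed + singles * (singles - 1) / (2 * (doubles + 1))
    return {
        "observed_features": observed,
        "singles": singles,
        "doubles": doubles,
        "chao1": chao1,
        "margalef": (observed - 1) / math.log(total),
        "menhinick": observed / math.sqrt(total),
    }


summary = richness_summary([1, 1, 1, 2, 2, 5, 9])
print(summary["chao1"])  # 8.0
```

With three singletons and two doubletons among seven distinct k-mers, the estimator adds one unseen k-mer to the observed seven.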

Diversity indices

| Metric | Description |
|---|---|
| shannon | Shannon entropy (H'), a measure of overall diversity |
| brillouin_d | Brillouin index, a diversity measure for fully censused communities |
| fisher_alpha | Fisher's alpha diversity index |
| hill | Hill number (order 1), the effective number of species |
| renyi | Rényi entropy |
| tsallis | Tsallis entropy |
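
The relationship between Shannon entropy and the Hill number of order 1 can be sketched as follows: with natural logarithms, the Hill number of order 1 is exp(H'). The function names are hypothetical, and the logarithm base used by kmer2stats is not stated above, so the natural log here is an assumption.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy H' with natural logarithms (assumed base)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def hill_number_order1(counts):
    """Hill number of order 1: the exponential of Shannon entropy."""
    return math.exp(shannon_entropy(counts))


even = [10, 10, 10, 10]    # four equally abundant k-mers
skewed = [97, 1, 1, 1]     # one k-mer dominates

print(round(hill_number_order1(even), 6))  # 4.0
print(hill_number_order1(skewed) < hill_number_order1(even))  # True
```

For a perfectly even sample the effective number equals the number of distinct k-mers; any imbalance pulls it below that.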

Evenness metrics

| Metric | Description |
|---|---|
| pielou_e | Pielou's evenness (J'), also called Shannon evenness |
| heip_e | Heip's evenness index |
| mcintosh_e | McIntosh evenness index |
| simpson_e | Simpson's evenness index |
| enspie | ENS_PIE (effective number of species via PIE) |
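
Two of the evenness metrics can be illustrated from their standard definitions: Pielou's J' = H' / ln S and Heip's index (exp(H') - 1) / (S - 1), where S is the number of distinct k-mers. This is a sketch under the assumption of natural logarithms; the function names are hypothetical and do not come from kmer2stats.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy H' with natural logarithms (assumed base)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S); needs at least two distinct k-mers."""
    observed = sum(1 for c in counts if c > 0)
    return shannon_entropy(counts) / math.log(observed)


def heip_evenness(counts):
    """Heip's index = (exp(H') - 1) / (S - 1); needs S >= 2."""
    observed = sum(1 for c in counts if c > 0)
    return (math.exp(shannon_entropy(counts)) - 1) / (observed - 1)


even = [5, 5, 5, 5]
skewed = [8, 1, 1]

print(round(pielou_evenness(even), 6))    # 1.0
print(round(pielou_evenness(skewed), 3))  # 0.582
print(round(heip_evenness(skewed), 3))    # 0.447
```

Both indices reach 1 for a perfectly even sample and fall toward 0 as a few k-mers dominate.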

Dominance and concentration metrics

| Metric | Description |
|---|---|
| simpson_d | Simpson's dominance index (D) |
| inv_simpson | Inverse Simpson index (1/D) |
| berger_parker_d | Berger-Parker dominance index |
| dominance | Dominance index |
| gini_index | Gini coefficient of inequality |
| mcintosh_d | McIntosh dominance index |
| strong | Strong's dominance index |
| kempton_taylor_q | Kempton-Taylor Q statistic |
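
As a worked example of the first three dominance metrics, the sketch below uses the common definitions D = sum of squared proportions, inverse Simpson = 1/D, and Berger-Parker = largest count divided by the total. The function name dominance_summary is hypothetical, and the exact variants computed by kmer2stats (for example finite-sample corrections) are not confirmed here.

```python
def dominance_summary(counts):
    """Simpson's D, inverse Simpson and Berger-Parker for a list of counts.

    Hypothetical helper using the uncorrected textbook definitions.
    """
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    simpson_d = sum(p * p for p in proportions)
    return {
        "simpson_d": simpson_d,
        "inv_simpson": 1 / simpson_d,
        "berger_parker_d": max(counts) / total,
    }


summary = dominance_summary([50, 30, 20])
print(round(summary["simpson_d"], 2))    # 0.38
print(round(summary["inv_simpson"], 2))  # 2.63
print(summary["berger_parker_d"])        # 0.5
```

Here proportions of 0.5, 0.3 and 0.2 give D = 0.25 + 0.09 + 0.04 = 0.38, so the sample behaves like roughly 2.6 equally abundant k-mers.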

Coverage and completeness

| Metric | Description |
|---|---|
| goods_coverage | Good's coverage estimator |
| robbins | Robbins' estimator of the probability of unseen species |
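
Good's coverage is commonly defined as 1 - F1 / N, where F1 is the number of singletons and N is the sum of all counts. A minimal sketch under that definition follows; the function name is hypothetical and not taken from kmer2stats.

```python
def goods_coverage_estimate(counts):
    """Good's coverage: 1 - (number of singletons / total count)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total


# Three singletons out of 21 total observations.
print(round(goods_coverage_estimate([1, 1, 1, 2, 2, 5, 9]), 4))  # 0.8571
```

A value near 1 suggests that few k-mers remain unobserved; a sample made up only of singletons yields 0.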

Descriptive statistics

| Metric | Description |
|---|---|
| total_count | Sum of all k-mer counts |
| unique_kmers | Number of distinct k-mers |
| mean_count | Mean count per k-mer |
| median_count | Median count per k-mer |
| max_count | Maximum k-mer count |
| min_count | Minimum k-mer count |
| std_count | Standard deviation of counts |
| count_range | Range of counts (max − min) |
| num_singletons | Number of k-mers with count = 1 |
| num_doubletons | Number of k-mers with count = 2 |
| percent_singletons | Percentage of unique k-mers that are singletons |
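
The descriptive statistics listed above can be reproduced approximately with the Python standard library. This is an illustrative sketch, not the tool's implementation: the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the description does not say which form kmer2stats reports.

```python
import statistics


def descriptive_summary(counts):
    """Descriptive statistics for a list of k-mer counts (hypothetical helper)."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return {
        "total_count": sum(counts),
        "unique_kmers": len(counts),
        "mean_count": statistics.mean(counts),
        "median_count": statistics.median(counts),
        "max_count": max(counts),
        "min_count": min(counts),
        # Assumption: sample standard deviation (n - 1 denominator).
        "std_count": statistics.stdev(counts),
        "count_range": max(counts) - min(counts),
        "num_singletons": singletons,
        "num_doubletons": doubletons,
        "percent_singletons": 100 * singletons / len(counts),
    }


stats = descriptive_summary([1, 1, 2, 4, 12])
print(stats["total_count"], stats["unique_kmers"])  # 20 5
print(stats["percent_singletons"])                  # 40.0
```

Note that percent_singletons is relative to the number of distinct k-mers, not to the total count.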

Citation

If you use kmer2stats, please cite:

Faack, S., & Zierep, P. (2026). kmer2stats (v1.0.3). Zenodo. https://doi.org/10.5281/zenodo.19828576

License

GPL-3.0
