Utility for computing k-mer-based statistics
kmer2stats
kmer2stats computes k-mer-based statistics for microbial community analysis, including a wide range of alpha diversity metrics (e.g. Shannon, Chao1, inverse Simpson) and descriptive count statistics (e.g. total_count, unique_kmers, percent_singletons). The output can be used for downstream plotting or comparative analyses.
Installation
All dependencies (pandas>=2.2.2, numpy>=1.26.4, scikit-bio>=0.6.3, argparse>=1.4.0) are automatically installed by both methods below.
With pip
pip install kmer2stats
Or directly from source:
git clone https://github.com/SantaMcCloud/kmer2stats.git
cd kmer2stats
pip install -r requirements.txt
With conda
conda install bioconda::kmer2stats
Usage
kmer2stats count_file
kmer2stats is also available as a Galaxy tool on the usegalaxy.eu server — search for kmer2stats in the tool panel.
Input
The input is a space-separated k-mer count file with two columns (k-mer sequence and count) and no header. This is the column format produced by jellyfish dump -c:
AAAAAC 6870
AAAAAG 6312
AAAAAT 7966
AAAACA 5133
AAAACC 5600
AAAACG 5870
AAAACT 3911
AAAAGA 4173
AAAAGC 5078
AAAAGG 3047
AAAAGT 3067
AAAATA 5726
AAAATC 6167
AAAATG 5731
AAAATT 4987
AAACAA 3719
AAACAC 2817
AAACAG 5565
AAACAT 3342
AAACCA 5011
AAACCC 2469
A test file is provided in test_files/test_file.txt.
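To inspect a count file before running the tool, the two-column format described above can be parsed with the Python standard library alone. The following is a minimal sketch; the function name `parse_kmer_counts` and the in-memory sample are illustrative and not part of kmer2stats.

```python
def parse_kmer_counts(lines):
    """Parse 'KMER COUNT' lines into a dict mapping k-mer -> integer count."""
    counts = {}
    for line in lines:
        fields = line.split()  # splits on any whitespace (spaces or tabs)
        if not fields:
            continue  # ignore blank lines
        kmer, count = fields  # exactly two columns expected; ValueError otherwise
        counts[kmer] = int(count)
    return counts


# First three rows of the example above, used as in-memory input.
sample_lines = ["AAAAAC 6870", "AAAAAG 6312", "AAAAAT 7966"]
sample_counts = parse_kmer_counts(sample_lines)
print(sample_counts)
print(sum(sample_counts.values()))  # total count over the three k-mers
```

To parse a real file, pass an open file handle instead of a list, for example `with open("test_files/test_file.txt") as handle: counts = parse_kmer_counts(handle)`.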
Output
The output is a tab-separated file named compute_diversity.csv (tab-delimited despite the .csv extension) with two columns: the metric name and its computed value. The reported metrics are grouped below:
Richness estimators
| Metric | Description |
|---|---|
| observed_features | Number of distinct k-mers observed |
| singles | Number of k-mers observed exactly once (singletons) |
| doubles | Number of k-mers observed exactly twice (doubletons) |
| chao1 | Chao1 non-parametric richness estimator |
| ace | Abundance-based coverage estimator (ACE) |
| margalef | Margalef's richness index |
| menhinick | Menhinick's richness index |
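Several of the richness estimators above can be illustrated with a short, dependency-free sketch. It uses common textbook definitions: bias-corrected Chao1, Margalef as (S - 1) / ln N, and Menhinick as S / sqrt(N). Whether kmer2stats uses exactly these variants is an assumption, and the function name `richness_summary` and the toy counts are hypothetical.

```python
import math


def richness_summary(counts):
    """Textbook richness estimators from a list of per-k-mer counts."""
    counts = [c for c in counts if c > 0]  # only observed k-mers contribute
    s_obs = len(counts)                    # observed richness S
    n = sum(counts)                        # total count N
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    # Bias-corrected Chao1: S + F1 * (F1 - 1) / (2 * (F2 + 1))
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return {
        "observed_features": s_obs,
        "singles": f1,
        "doubles": f2,
        "chao1": chao1,
        "margalef": (s_obs - 1) / math.log(n),  # undefined when N == 1
        "menhinick": s_obs / math.sqrt(n),
    }


toy_counts = [1, 1, 1, 2, 2, 5, 10]
richness = richness_summary(toy_counts)
for name, value in richness.items():
    print(name, value)
```

The bias-corrected Chao1 form is used here because it stays defined when no doubletons are present.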
Diversity indices
| Metric | Description |
|---|---|
| shannon | Shannon entropy (H'), a measure of overall diversity |
| brillouin_d | Brillouin index, a diversity measure for fully censused communities |
| fisher_alpha | Fisher's alpha diversity index |
| hill | Hill number (order 1), the effective number of species |
| renyi | Rényi entropy |
| tsallis | Tsallis entropy |
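The entropy-based indices are closely related, which a small sketch can make concrete. The code below assumes natural logarithms and an entropy order of q = 2 for the Rényi and Tsallis examples; the log base and order actually used by kmer2stats are not stated here, so treat both as assumptions. All function names are illustrative.

```python
import math


def proportions(counts):
    """Convert positive counts to relative abundances p_i."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return [c / total for c in counts]


def shannon(counts):
    """Shannon entropy H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in proportions(counts))


def hill_order1(counts):
    """Hill number of order 1: exp(H'), the effective number of types."""
    return math.exp(shannon(counts))


def renyi(counts, q=2):
    """Renyi entropy of order q (q != 1): ln(sum(p_i^q)) / (1 - q)."""
    return math.log(sum(p ** q for p in proportions(counts))) / (1 - q)


def tsallis(counts, q=2):
    """Tsallis entropy of order q (q != 1): (1 - sum(p_i^q)) / (q - 1)."""
    return (1 - sum(p ** q for p in proportions(counts))) / (q - 1)


even_counts = [5, 5, 5, 5]  # four equally abundant k-mers
print(shannon(even_counts), hill_order1(even_counts))
print(renyi(even_counts), tsallis(even_counts))
```

For a perfectly even community of four k-mers, the Shannon and Rényi entropies both equal ln 4 and the order-1 Hill number equals 4, which is a quick sanity check for any implementation.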
Evenness metrics
| Metric | Description |
|---|---|
| pielou_e | Pielou's evenness (J'), also called Shannon evenness |
| heip_e | Heip's evenness index |
| mcintosh_e | McIntosh evenness index |
| simpson_e | Simpson's evenness index |
| enspie | ENS_PIE (effective number of species via PIE) |
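Evenness indices rescale a diversity value by the observed richness S. The sketch below shows four of them under standard definitions with natural logarithms: Pielou's J' = H' / ln S, Heip's (e^H' - 1) / (S - 1), ENS_PIE = 1 / sum(p_i^2), and Simpson's evenness = ENS_PIE / S. These are textbook forms and may differ in detail from what kmer2stats reports; the function name `evenness_summary` is hypothetical.

```python
import math


def evenness_summary(counts):
    """Textbook evenness indices; requires at least two distinct k-mers."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    s_obs = len(counts)
    p = [c / total for c in counts]
    h = -sum(pi * math.log(pi) for pi in p)  # Shannon entropy (natural log)
    enspie = 1 / sum(pi * pi for pi in p)    # numerically the inverse Simpson index
    return {
        "pielou_e": h / math.log(s_obs),
        "heip_e": (math.exp(h) - 1) / (s_obs - 1),
        "enspie": enspie,
        "simpson_e": enspie / s_obs,
    }


even = evenness_summary([5, 5, 5, 5])   # perfectly even community
skewed = evenness_summary([97, 1, 1, 1])  # one dominant k-mer
print(even)
print(skewed)
```

A perfectly even input yields 1 for all three evenness indices, while the skewed input yields values well below 1.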
Dominance and concentration metrics
| Metric | Description |
|---|---|
| simpson_d | Simpson's dominance index (D) |
| inv_simpson | Inverse Simpson index (1/D) |
| berger_parker_d | Berger-Parker dominance index |
| dominance | Dominance index |
| gini_index | Gini coefficient of inequality |
| mcintosh_d | McIntosh dominance index |
| strong | Strong's dominance index |
| kempton_taylor_q | Kempton-Taylor Q statistic |
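As a worked illustration of the dominance measures, the sketch below computes Simpson's dominance D = sum(p_i^2), its inverse 1/D, and the Berger-Parker index (the relative abundance of the most common k-mer). These are standard definitions; the exact conventions used by kmer2stats (for example, whether a finite-sample correction is applied) are not confirmed here, and the function name `dominance_summary` is hypothetical.

```python
def dominance_summary(counts):
    """Simpson dominance, inverse Simpson, and Berger-Parker from counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    p = [c / total for c in counts]
    dominance = sum(pi * pi for pi in p)  # D = sum of squared proportions
    return {
        "dominance": dominance,
        "inv_simpson": 1 / dominance,
        "berger_parker_d": max(counts) / total,  # share of the top k-mer
    }


# Proportions are 0.6, 0.3, 0.1, so D = 0.36 + 0.09 + 0.01 = 0.46.
dominance_result = dominance_summary([6, 3, 1])
print(dominance_result)
```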
Coverage and completeness
| Metric | Description |
|---|---|
| goods_coverage | Good's coverage estimator |
| robbins | Robbins' estimator of the probability of unseen species |
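Good's coverage has a particularly simple form, C = 1 - F1 / N, where F1 is the number of singletons and N the total count. The sketch below applies that standard definition to toy data; it is an illustration rather than the tool's own code, and the function name `goods_coverage` is simply borrowed from the metric name above.

```python
def goods_coverage(counts):
    """Good's coverage: 1 - (number of singletons / total count)."""
    counts = [c for c in counts if c > 0]
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / sum(counts)


# Three singletons out of a total count of 22.
coverage = goods_coverage([1, 1, 1, 2, 2, 5, 10])
print(coverage)
```

A value close to 1 indicates that few k-mers were seen only once, suggesting that most of the k-mer spectrum has been sampled.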
Descriptive statistics
| Metric | Description |
|---|---|
| total_count | Sum of all k-mer counts |
| unique_kmers | Number of distinct k-mers |
| mean_count | Mean count per k-mer |
| median_count | Median count per k-mer |
| max_count | Maximum k-mer count |
| min_count | Minimum k-mer count |
| std_count | Standard deviation of counts |
| count_range | Range of counts (max − min) |
| num_singletons | Number of k-mers with count = 1 |
| num_doubletons | Number of k-mers with count = 2 |
| percent_singletons | Percentage of unique k-mers that are singletons |
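The descriptive statistics follow directly from the count column and can be reproduced with the Python standard library. In this sketch the standard deviation is the sample standard deviation (n - 1 denominator); which convention kmer2stats uses is an assumption here. The function name `describe_counts` and the toy data are hypothetical.

```python
import statistics


def describe_counts(counts):
    """Descriptive statistics over a list of per-k-mer counts."""
    unique = len(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return {
        "total_count": sum(counts),
        "unique_kmers": unique,
        "mean_count": statistics.mean(counts),
        "median_count": statistics.median(counts),
        "max_count": max(counts),
        "min_count": min(counts),
        "std_count": statistics.stdev(counts),  # sample std (assumed convention)
        "count_range": max(counts) - min(counts),
        "num_singletons": singletons,
        "num_doubletons": doubletons,
        "percent_singletons": 100 * singletons / unique,
    }


description = describe_counts([1, 1, 1, 2, 2, 5, 10])
for name, value in description.items():
    print(name, value)
```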
Citation
If you use kmer2stats, please cite:
Faack, S., & Zierep, P. (2026). kmer2stats (v1.0.3). Zenodo. https://doi.org/10.5281/zenodo.19828576
License