sourmash plugin for repeat-robust mutation rate estimation (r_pp, r_pc, r_cc).
sourmash-plugin-repeat-robust-mutation-rate-estimators
sourmash is a tool for biological sequence analysis and comparisons.
This plugin implements repeat-robust substitution rate estimators r_pp, r_pc, and r_cc based on FracMinHash sketches, as described in:
Wu, H. and Medvedev, P. (2026). Repeat-robust estimation of substitution rates from k-mer sketches. bioRxiv. https://www.biorxiv.org/content/10.64898/2026.04.01.715966v1
Installation
Install sourmash, then install this plugin:
# Option 1: sourmash from conda (conda-forge + bioconda), plugin from PyPI
conda install -c conda-forge -c bioconda sourmash
pip install sourmash-plugin-repeat-robust-mutation-rate-estimators
# Option 2: sourmash and plugin both from PyPI
pip install sourmash
pip install sourmash-plugin-repeat-robust-mutation-rate-estimators
Verify the plugin is recognized:
sourmash scripts
You should see sketch and mutation_rate listed under available plugin commands.
Usage
Background
The three estimators treat the two input sequences asymmetrically: they assume that string t was obtained by mutating string s.
If unsure which is s and which is t, use the longer sequence as s.
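The "longer sequence as s" rule can be applied mechanically before sketching. This is a minimal sketch that assumes plain FASTA input and treats "longer" as the total number of bases summed over all records. This README does not say how the plugin itself measures length. `fasta_length` and `assign_s_t` are hypothetical helpers and are not part of the plugin.

```python
def fasta_length(path):
    """Total number of sequence characters in a FASTA file (all records summed)."""
    total = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip header lines and blank lines; count everything else as sequence.
            if line and not line.startswith(">"):
                total += len(line)
    return total


def assign_s_t(path_a, path_b):
    """Return (s_path, t_path), using the longer sequence as s (ties keep input order)."""
    if fasta_length(path_a) >= fasta_length(path_b):
        return path_a, path_b
    return path_b, path_a
```

For example, `s_path, t_path = assign_s_t("a.fa", "b.fa")` gives the file names to pass as s.fa and t.fa in the commands below.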
Each estimator requires a specific sketch mode:
| Estimator | s sketch mode | t sketch mode |
|---|---|---|
| r_pp | standard | standard |
| r_pc | standard | multiplicity |
| r_cc | extended | multiplicity |
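In general, estimators that use more information achieve higher accuracy. r_pp uses only the distinct k-mers of both sequences, r_pc adds k-mer multiplicities for t, and r_cc additionally uses the extended sketch of s.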
Step 1: Sketch your sequences
# For r_pp
sourmash scripts sketch s.fa --sketch-mode standard -o s.sig -k 21 --scaled 1000
sourmash scripts sketch t.fa --sketch-mode standard -o t.sig -k 21 --scaled 1000
# For r_pc
sourmash scripts sketch s.fa --sketch-mode standard -o s.sig -k 21 --scaled 1000
sourmash scripts sketch t.fa --sketch-mode multiplicity -o t.sig -k 21 --scaled 1000
# For r_cc
sourmash scripts sketch s.fa --sketch-mode extended -o s.sig -k 21 --scaled 1000
sourmash scripts sketch t.fa --sketch-mode multiplicity -o t.sig -k 21 --scaled 1000
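You can generate the sketch commands from each estimator's requirements so the s and t modes are never mismatched. This is a hypothetical Python helper. `SKETCH_MODES` and `sketch_commands` are illustrative names and are not part of the plugin API. It only uses the flags that appear in the commands above and reproduces them exactly.

```python
import shlex

# (s sketch mode, t sketch mode) required by each estimator, per the table in Background.
SKETCH_MODES = {
    "r_pp": ("standard", "standard"),
    "r_pc": ("standard", "multiplicity"),
    "r_cc": ("extended", "multiplicity"),
}


def sketch_commands(estimator, s_fa, t_fa, k=21, scaled=1000,
                    s_sig="s.sig", t_sig="t.sig"):
    """Build the two `sourmash scripts sketch` argument lists for one estimator."""
    if estimator not in SKETCH_MODES:
        raise ValueError(f"unknown estimator: {estimator!r}")
    s_mode, t_mode = SKETCH_MODES[estimator]
    commands = []
    for fasta, mode, sig in ((s_fa, s_mode, s_sig), (t_fa, t_mode, t_sig)):
        commands.append([
            "sourmash", "scripts", "sketch", fasta,
            "--sketch-mode", mode,
            "-o", sig,
            "-k", str(k),
            "--scaled", str(scaled),
        ])
    return commands


if __name__ == "__main__":
    for argv in sketch_commands("r_cc", "s.fa", "t.fa"):
        # Print the shell form; pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

For `r_cc`, the two printed lines match the r_cc commands above. Use the same `k` and `scaled` for s and t.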
Sketch modes:
- standard: stores the distinct k-mer hashes and L, where L = |x| - k + 1 is the total number of k-mers in string x. Use as s and t for r_pp, and as s for r_pc.
- multiplicity: stores k-mer hashes with per-hash counts, plus L. Use as t for r_pc and r_cc.
- extended: stores the distinct k-mer hashes, L, and a precomputed correction constant, sum_occ_h1. Use as s for r_cc. Note: computing sum_occ_h1 requires reading the full sequence and may take longer for large genomes.
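The toy FracMinHash sketch of a single string below makes the standard and multiplicity modes concrete. It is an illustration under stated assumptions and is not the plugin's implementation. It uses BLAKE2b as a stand-in 64-bit hash, whereas sourmash uses MurmurHash3. It hashes k-mers on the forward strand only, with no canonicalization. It keeps a hash when the hash falls below 2^64 / scaled. The extended mode's sum_occ_h1 is left out because its definition is not given here.

```python
import hashlib
from collections import Counter


def toy_hash(kmer):
    """64-bit hash of a k-mer. Stand-in only: sourmash uses MurmurHash3, not BLAKE2b."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def fracminhash(seq, k=21, scaled=1000):
    """Toy FracMinHash pass over one string.

    Returns (L, counts): L = |x| - k + 1 is the total number of k-mers, and
    counts maps each retained hash to how often its k-mer occurs in seq.
    Roughly a 1/scaled fraction of distinct k-mers is retained.
    """
    max_hash = 2**64 // scaled
    L = max(len(seq) - k + 1, 0)
    counts = Counter()
    for i in range(L):
        h = toy_hash(seq[i:i + k])
        if h < max_hash:
            counts[h] += 1
    return L, counts


def standard_sketch(seq, k=21, scaled=1000):
    """'standard'-style sketch: distinct retained hashes plus L."""
    L, counts = fracminhash(seq, k, scaled)
    return L, set(counts)


def multiplicity_sketch(seq, k=21, scaled=1000):
    """'multiplicity'-style sketch: retained hashes with per-hash counts plus L."""
    return fracminhash(seq, k, scaled)
```

For example, `standard_sketch("ACGTACGT", k=4, scaled=1)` keeps every hash (scaled=1) and returns L = 5 with 4 distinct hashes, because the k-mer ACGT occurs twice. `multiplicity_sketch` with the same arguments records a count of 2 for that hash.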
Step 2: Estimate mutation rate
sourmash scripts mutation_rate --estimator r_pp --s-sig s.sig --t-sig t.sig
sourmash scripts mutation_rate --estimator r_pc --s-sig s.sig --t-sig t.sig
sourmash scripts mutation_rate --estimator r_cc --s-sig s.sig --t-sig t.sig
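Step 2 can also be scripted from Python. This minimal sketch uses the standard library's `subprocess` module. It assumes sourmash and this plugin are installed and that the report is written to standard output, which this README does not state. `mutation_rate_command` and `run_mutation_rate` are hypothetical helper names.

```python
import subprocess

ESTIMATORS = ("r_pp", "r_pc", "r_cc")


def mutation_rate_command(estimator, s_sig="s.sig", t_sig="t.sig"):
    """Build the `sourmash scripts mutation_rate` argument list."""
    if estimator not in ESTIMATORS:
        raise ValueError(f"unknown estimator: {estimator!r}")
    return ["sourmash", "scripts", "mutation_rate",
            "--estimator", estimator, "--s-sig", s_sig, "--t-sig", t_sig]


def run_mutation_rate(estimator, s_sig="s.sig", t_sig="t.sig"):
    """Run the command and return its standard output (raises if it exits non-zero)."""
    result = subprocess.run(
        mutation_rate_command(estimator, s_sig, t_sig),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The signatures must have been sketched in the modes that the chosen estimator requires. For example, `run_mutation_rate("r_cc", "s.sig", "t.sig")` only makes sense after the r_cc sketch commands in Step 1.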
Example output:
Estimator : r_cc
k : 21
scaled : 1000
L_s : 4800000
Estimated mutation rate : 0.012345
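When you run many comparisons, you can parse this report into a dictionary. This minimal sketch assumes every result line has the `key : value` layout shown in the example output and skips lines without ` : `. `parse_mutation_rate_output` is a hypothetical helper.

```python
def parse_mutation_rate_output(text):
    """Parse 'key : value' lines, as in the example output, into a dict.

    Integer-looking values become int, other numeric values become float,
    and everything else stays a string.
    """
    result = {}
    for line in text.splitlines():
        if " : " not in line:
            continue
        key, value = (part.strip() for part in line.split(" : ", 1))
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                pass
        result[key] = value
    return result
```

Applied to the example output above, this returns `{"Estimator": "r_cc", "k": 21, "scaled": 1000, "L_s": 4800000, "Estimated mutation rate": 0.012345}`.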
Support
Please file issues at https://github.com/Wu-Haonan/sourmash-plugin-repeat-robust-mutation-rate-estimators/issues
Dev docs
sourmash-plugin-repeat-robust-mutation-rate-estimators is developed at https://github.com/Wu-Haonan/sourmash-plugin-repeat-robust-mutation-rate-estimators.
Citation
If you use this plugin, please cite:
Wu, H. and Medvedev, P. (2026). Repeat-robust estimation of substitution rates
from k-mer sketches. bioRxiv.
https://www.biorxiv.org/content/10.64898/2026.04.01.715966v1
License
MIT License. See LICENSE for details.
Download files
File details
Details for the file sourmash_plugin_repeat_robust_mutation_rate_estimators-0.1.0.tar.gz.
File metadata
- Download URL: sourmash_plugin_repeat_robust_mutation_rate_estimators-0.1.0.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7f0f127d89f5c6e803ff4e475e780af95edcb2b78a58cb586cf2eae93be98c3f |
| MD5 | c966dc749bac86b595722ca8343c0495 |
| BLAKE2b-256 | d3233848cd2acc306a9c5ad50116d6e8a98174593457940c5424d117f420a77f |
File details
Details for the file sourmash_plugin_repeat_robust_mutation_rate_estimators-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sourmash_plugin_repeat_robust_mutation_rate_estimators-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16a31cc4262a619b8506b6a22d08a617f447f6a54db3c187927aabeb089cdcd3 |
| MD5 | 1d63c7210a369f931e4c89574c4eadb2 |
| BLAKE2b-256 | 007eb2c612560ba5255a8aba4f08f36236b308ea8012ba13713ce5c2ea05c5ca |