CountMut
Ultra-fast strand-aware mutation counter for bisulfite sequencing analysis
CountMut is a high-performance tool for counting mutations from bisulfite sequencing BAM files (BS-seq, CAM-seq, GLORI-seq, eTAM-seq). It features parallel processing, quality-based mate overlap deduplication, and optimized file I/O for maximum speed.
Features
- 🚀 Ultra-Fast: Counts mutations without building read pileups
- 🧬 Bisulfite Support: NS, Zf, Yf tag filtering for conversion analysis
- 🎯 Accurate: Quality-based mate overlap deduplication prevents double-counting
- ⚡ Parallel: Multi-threaded genomic window processing
- 🔧 Flexible: Configurable filtering, strand-specific processing, auto-indexing
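The quality-based mate overlap deduplication listed above can be pictured with a small sketch. This is not CountMut's actual implementation: the `Obs` type, the `merge_mates` function, and the tie-breaking rule (keep read 1) are assumptions made for illustration. The idea is that when both mates of a fragment cover the same position, only the higher-quality base call is kept, so the fragment is counted once.

```python
from typing import NamedTuple


class Obs(NamedTuple):
    base: str  # base call observed at a reference position
    qual: int  # Phred base quality of that call


def merge_mates(read1: dict[int, Obs], read2: dict[int, Obs]) -> dict[int, Obs]:
    """Merge per-position observations from the two mates of one fragment.

    Where both mates cover a position, keep only the higher-quality call.
    Ties keep the read1 call (an assumption, not documented behavior).
    """
    merged = dict(read1)
    for pos, obs in read2.items():
        prev = merged.get(pos)
        if prev is None or obs.qual > prev.qual:
            merged[pos] = obs
    return merged


# Hypothetical fragment: the mates overlap at position 101 only.
r1 = {100: Obs("A", 30), 101: Obs("G", 12)}
r2 = {101: Obs("A", 35), 102: Obs("A", 28)}
merged = merge_mates(r1, r2)
```

Here position 101 contributes a single observation taken from the second mate, because its base quality is higher.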
Installation
pip install countmut
Quick Start
# Basic usage - auto-creates indices if needed
countmut -i input.bam -r reference.fa -o mutations.tsv
# Count T→C mutations (common in bisulfite sequencing)
countmut -i input.bam -r reference.fa -o mutations.tsv --ref-base T --mut-base C
# With custom threads and filtering
countmut -i input.bam -r reference.fa -o mutations.tsv -t 8 --max-unc 5 --min-con 2
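When CountMut is driven from a pipeline script, the commands above can be assembled programmatically. The sketch below is a hypothetical helper (`build_countmut_cmd` is not part of the package); it only uses flags documented in this README and assumes `countmut` is on `PATH`.

```python
import shlex


def build_countmut_cmd(bam, fasta, out, ref_base="A", mut_base="G",
                       threads=None, max_unc=None, min_con=None):
    """Build an argument list for the countmut CLI from documented flags."""
    cmd = ["countmut", "-i", bam, "-r", fasta, "-o", out,
           "--ref-base", ref_base, "--mut-base", mut_base]
    # Optional flags are appended only when a value is given,
    # so the tool's own defaults apply otherwise.
    if threads is not None:
        cmd += ["-t", str(threads)]
    if max_unc is not None:
        cmd += ["--max-unc", str(max_unc)]
    if min_con is not None:
        cmd += ["--min-con", str(min_con)]
    return cmd


cmd = build_countmut_cmd("input.bam", "reference.fa", "mutations.tsv",
                         ref_base="T", mut_base="C", threads=8)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with unusual file names.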
Options
Input/Output
-i, --input PATH Input BAM file (coordinate-sorted) [required]
-r, --reference PATH Reference FASTA file [required]
-o, --output PATH Output TSV file (default: stdout)
-f, --force Overwrite output without prompting
Mutation Analysis
--ref-base TEXT Reference base to count from [default: A]
--mut-base TEXT Mutation base to count [default: G]
--strand TEXT Strand: both/forward/reverse [default: both]
--region TEXT Genomic region (e.g., 'chr1:1000000-2000000')
Performance
-t, --threads INTEGER Number of parallel threads [default: auto]
-b, --bin-size INTEGER Genomic bin size in bp [default: 10000]
Alternative Mutation Tagging
--ref-base2 TEXT Alternative reference base for tagging (e.g., 'C')
--mut-base2 TEXT Alternative mutation base for tagging (e.g., 'T')
--output-bam PATH Output BAM with alternative tags (Yc, Zc)
Quality Filters
--min-baseq INTEGER Min base quality (Phred score) [default: 20]
--min-mapq INTEGER Min mapping quality (MAPQ) [default: 0]
--max-sub INTEGER Max substitutions (NS tag) [default: 1]
--trim-start INTEGER Trim N bases from read 5' end (fragment orientation) [default: 2]
--trim-end INTEGER Trim N bases from read 3' end (fragment orientation) [default: 2]
--max-unc INTEGER Max unconverted (Zf tag) [default: 3]
--min-con INTEGER Min converted (Yf tag) [default: 1]
Output Records
-p, --pad INTEGER Motif window half-size [default: 15]
-s, --save-rest Include other bases (o0, o1, o2 columns)
Note: BAM files must have NS, Zf, and Yf tags (essential for bisulfite analysis). Indices (.bai, .fai) are created automatically if missing.
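The `-b/--bin-size` and `-t/--threads` options describe window-based parallelism. As a rough illustration of the idea, and not the tool's actual code, a region can be split into fixed-size bins that are then handed to worker threads. The `make_bins` name and the half-open coordinate convention are assumptions of this sketch.

```python
def make_bins(start: int, end: int, bin_size: int = 10_000) -> list[tuple[int, int]]:
    """Split the half-open interval [start, end) into consecutive bins.

    The last bin is truncated at `end` when the length is not a multiple
    of `bin_size`.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [(s, min(s + bin_size, end)) for s in range(start, end, bin_size)]


# A 25 kb region with the documented default bin size of 10000 bp.
bins = make_bins(1_000_000, 1_025_000)
for b in bins:
    print(b)
```

Smaller bins give finer-grained work units at the cost of more per-bin overhead; the best value depends on coverage and thread count.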
Output Format
TSV file with the following columns:
| Column | Description |
|---|---|
| chrom | Chromosome name |
| pos | Genomic position (1-based) |
| strand | Strand (+ or -) |
| motif | Sequence context (2×pad+1 bp window) |
| u0, u1, u2 | Unconverted (reference base) counts |
| m0, m1, m2 | Mutation (mutation base only) counts |
| o0, o1, o2 | Other-base counts (with --save-rest) |
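The `motif` column is a window of 2×pad+1 bases centered on the site. A minimal sketch of that arithmetic follows; `motif_window` is a hypothetical helper, and both the `N` padding at sequence ends and the reference-orientation output are assumptions, since this README does not state how edges or minus-strand sites are handled.

```python
def motif_window(seq: str, pos: int, pad: int = 15) -> str:
    """Return the 2*pad+1 window centered on 1-based position `pos`.

    Positions falling outside the sequence are filled with 'N'
    (an assumption of this sketch).
    """
    center = pos - 1  # convert 1-based position to 0-based index
    return "".join(
        seq[i] if 0 <= i < len(seq) else "N"
        for i in range(center - pad, center + pad + 1)
    )


seq = "CCAAGTT"
print(motif_window(seq, 4, pad=1))  # window around the second A
print(motif_window(seq, 1, pad=2))  # window that runs off the left edge
```

With the default `--pad 15`, the window is 31 bp long.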
Count categories (x0, x1, x2):
- x0 (low quality): Bases failing quality filters (trim region, max-sub, min-mapq, min-baseq)
- x1 (insufficient conversion): Bases from reads with insufficient conversion efficiency (high Zf or low Yf)
- x2 (high conversion): Bases from reads with high conversion efficiency (low Zf and high Yf)
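The three categories can be read as a small decision procedure. The sketch below restates them using the documented default thresholds; the `Filters` and `classify` names are hypothetical, and the exact comparison operators (for example, whether a read with Zf equal to `--max-unc` passes) are assumptions rather than confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filters:
    # Defaults copied from the option list in this README.
    min_baseq: int = 20
    min_mapq: int = 0
    max_sub: int = 1
    max_unc: int = 3
    min_con: int = 1


def classify(baseq: int, mapq: int, ns: int, zf: int, yf: int,
             in_trim: bool, f: Filters = Filters()) -> int:
    """Return 0, 1, or 2 for the x0, x1, x2 count categories."""
    # x0: the base fails a quality filter (trim region, NS, MAPQ, base quality).
    if in_trim or baseq < f.min_baseq or mapq < f.min_mapq or ns > f.max_sub:
        return 0
    # x1: the read shows insufficient conversion (high Zf or low Yf).
    if zf > f.max_unc or yf < f.min_con:
        return 1
    # x2: the read passes all filters and is well converted.
    return 2


print(classify(baseq=30, mapq=60, ns=0, zf=0, yf=5, in_trim=False))
print(classify(baseq=30, mapq=60, ns=0, zf=5, yf=5, in_trim=False))
print(classify(baseq=10, mapq=60, ns=0, zf=0, yf=5, in_trim=False))
```

A base counted in `u1`, for instance, would be a reference-base observation from a read that passed the quality filters but failed the conversion filters.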
Example Output
Without --save-rest:
| chrom | pos | strand | motif | u0 | u1 | u2 | m0 | m1 | m2 |
|---|---|---|---|---|---|---|---|---|---|
| chr1 | 10000 | + | AAG | 10 | 5 | 2 | 0 | 0 | 0 |
| chr1 | 10001 | - | TTC | 10 | 5 | 2 | 0 | 0 | 0 |
With --save-rest:
| chrom | pos | strand | motif | u0 | u1 | u2 | m0 | m1 | m2 | o0 | o1 | o2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 10000 | + | AAG | 10 | 5 | 2 | 0 | 0 | 0 | 1 | 2 | 3 |
| chr1 | 10001 | - | TTC | 10 | 5 | 2 | 0 | 0 | 0 | 1 | 2 | 3 |
Copyright © 2025-present Chang Y