Demultiplex multi-reference BAM files into per-reference buckets and call consensus
midsplit
Demultiplex a multi-reference BAM file into per-reference buckets and call a consensus sequence for each non-empty bucket.
What it does
When reads are aligned to multiple reference sequences in a single BAM file, midsplit assigns each read (or read pair) to the reference(s) it matches best, then produces a separate BAM, consensus FASTA, and per-site statistics file for each reference that received at least one read.
The classification uses the NM (edit-distance) tag to compute a percent
identity for each alignment. A read is assigned to a reference if its percent
identity is at least --threshold times the best percent identity that read
achieves across all references (default 0.95). Paired-end reads are treated as
a unit using an overlap-aware combined percent identity, so both mates always
land in the same bucket(s).
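The single-read assignment rule described above can be sketched as follows. This is a minimal illustration, not midsplit's actual code: the function names are hypothetical, and the percent-identity formula (1 minus NM divided by aligned length) is an assumption about how the NM tag is turned into an identity. The overlap-aware pairing logic is not shown.

```python
def percent_identity(nm: int, aligned_length: int) -> float:
    """Estimate identity from the NM tag (assumed formula: 1 - NM / length)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 1.0 - nm / aligned_length


def assign_references(pids: dict[str, float], threshold: float = 0.95) -> set[str]:
    """Return every reference whose identity is at least threshold * best."""
    if not pids:
        return set()
    best = max(pids.values())
    return {ref for ref, pid in pids.items() if pid >= threshold * best}


# One read aligned to three hypothetical references.
pids = {
    "refA": percent_identity(nm=2, aligned_length=100),
    "refB": percent_identity(nm=5, aligned_length=100),
    "refC": percent_identity(nm=20, aligned_length=100),
}
assigned = assign_references(pids)
print(sorted(assigned))
```

Because the cutoff is relative to the read's own best hit, a read can land in more than one bucket when two references are nearly equally good matches.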
Output
For each reference that receives reads, midsplit writes:
| File | Contents |
|---|---|
| {ID}.bam / {ID}.bam.bai | Sorted, indexed per-reference BAM |
| {ID}-consensus.fasta | Consensus sequence called by ivar |
| {ID}-per-site.tsv | Per-position depth, A/C/G/T counts, ref base, and consensus base |
| summary.txt | Run-level statistics and consensus-vs-reference comparison |
The per-site TSV has columns: site, ref_base, consensus_base, depth,
A, C, G, T. When --align is used, the consensus base is mapped back
to the correct reference position even when ivar has inserted or deleted bases
relative to the reference.
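To illustrate what mapping the consensus back to reference positions involves, here is a small sketch that walks a gapped pairwise alignment. It is not taken from midsplit: the function name is hypothetical, and both the 1-based site numbering and the use of "-" for a deleted site are assumptions.

```python
def consensus_by_ref_site(aligned_ref: str, aligned_cons: str) -> dict[int, str]:
    """Map each reference position (assumed 1-based) to its consensus base."""
    if len(aligned_ref) != len(aligned_cons):
        raise ValueError("aligned sequences must have equal length")
    site = 0
    result: dict[int, str] = {}
    for ref_char, cons_char in zip(aligned_ref, aligned_cons):
        if ref_char == "-":
            # Insertion in the consensus: no reference position to report.
            continue
        site += 1
        # A "-" here means the consensus lacks a base at this reference site.
        result[site] = cons_char
    return result


# The consensus has one inserted base (T) and one deleted base (ref T).
mapping = consensus_by_ref_site("AC-GTA", "ACTG-A")
print(mapping)
```

Without such a step, every site downstream of an insertion or deletion would be compared against the wrong reference base.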
Usage
midsplit [options] INPUT_BAM
Options
| Option | Default | Description |
|---|---|---|
| --output-dir DIR | . | Directory for all output files (created if absent) |
| --threshold FLOAT | 0.95 | Minimum fraction of best PID to assign a read |
| --reference FASTA | (none) | Multi-reference FASTA; enables ref_base column and consensus comparison |
| --align | off | Align consensus to reference before comparison (recommended when lengths differ) |
| --aligner | mafft | Aligner for --align: mafft, needle, or edlib |
| --aligner-options OPTIONS | (none) | Extra options forwarded to the aligner (implies --align) |
| --consensus-quality INT | 20 | Minimum base quality passed to ivar (-q) |
| --consensus-frequency-threshold FLOAT | 0.0 | Minimum frequency for ivar to call a base (-t) |
| --consensus-low-coverage INT | 0 | Depth below which ivar masks with N (-m) |
| --consensus-id TEMPLATE | (none) | ID for the consensus sequence; use {ID} to embed the reference name |
Example
midsplit \
--reference references/multi.fasta \
--output-dir results/ \
--align \
--threshold 0.95 \
alignments/reads-vs-multi.bam
Requirements
- Python 3.11+
- samtools and ivar on PATH
- Python dependencies are managed with uv; run uv sync to install them
Notes
- Only primary and secondary alignments are used; supplementary alignments (chimeric/split reads) are skipped.
- When Bowtie2 is run with --all or -k N, it emits non-best hits as secondary alignments; midsplit includes these in classification so that all alignment evidence is used.
- For circular genomes (e.g. HBV) mapped against a linearised reference, local alignment (bowtie2 --very-sensitive-local) is strongly recommended over end-to-end alignment. End-to-end mode cannot soft-clip reads that span the linearisation junction, which introduces artefactual bases near position 1 of the reference and can corrupt the consensus at those positions.
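The first note can be expressed in terms of SAM flag bits, as in this sketch. The bit values come from the SAM specification; the function name is hypothetical, and also skipping unmapped records is an assumption rather than documented midsplit behaviour.

```python
UNMAPPED = 0x4         # read has no alignment
SECONDARY = 0x100      # non-best hit, e.g. from bowtie2 --all or -k N
SUPPLEMENTARY = 0x800  # part of a chimeric/split alignment


def is_used_for_classification(flag: int) -> bool:
    """Keep primary and secondary alignments; drop supplementary ones."""
    if flag & UNMAPPED:
        return False  # assumption: unmapped records carry no evidence
    return not (flag & SUPPLEMENTARY)


# Flags: primary, secondary, supplementary, paired primary, unmapped.
flags = [0, 256, 2048, 99, 4]
kept = [f for f in flags if is_used_for_classification(f)]
print(kept)
```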