Demultiplex multi-reference BAM files into per-reference buckets and call consensus

Project description

midsplit

Demultiplex a multi-reference BAM file into per-reference buckets and call a consensus sequence for each non-empty bucket.

What it does

When reads are aligned to multiple reference sequences in a single BAM file, midsplit assigns each read (or read pair) to the reference(s) it matches best, then produces a separate BAM, consensus FASTA, and per-site statistics file for each reference that received at least one read.

Classification uses the NM (edit-distance) tag to compute a percent identity (PID) for each alignment. A read is assigned to every reference whose PID is at least --threshold (default 0.95) times the best PID that read achieves across all references, so a read can land in more than one bucket. Paired-end reads are treated as a unit using an overlap-aware combined percent identity, so both mates always land in the same bucket(s).
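The threshold rule can be illustrated with a minimal sketch. The exact percent-identity formula is not stated here, so `percent_identity` below assumes 1 - NM / aligned length; both function names are hypothetical, and the overlap-aware handling of read pairs is not modelled.

```python
from typing import Dict, Set


def percent_identity(nm: int, aligned_length: int) -> float:
    """Assumed definition: fraction of aligned bases that are not edits."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 1.0 - nm / aligned_length


def assign_references(pid_by_reference: Dict[str, float],
                      threshold: float = 0.95) -> Set[str]:
    """Return every reference whose PID is at least threshold * best PID."""
    if not pid_by_reference:
        return set()
    best = max(pid_by_reference.values())
    cutoff = threshold * best
    return {ref for ref, pid in pid_by_reference.items() if pid >= cutoff}


# Hypothetical read with 100 aligned bases against three references.
pids = {
    "refA": percent_identity(nm=2, aligned_length=100),
    "refB": percent_identity(nm=5, aligned_length=100),
    "refC": percent_identity(nm=20, aligned_length=100),
}
buckets = assign_references(pids, threshold=0.95)
```

With these made-up values the best PID is 0.98, so the cutoff is 0.931: refA and refB pass and refC does not. Raising the threshold to 1.0 keeps only the best-matching reference(s).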

Output

For each reference that receives reads, midsplit writes:

File	Contents
`{ID}.bam` / `{ID}.bam.bai`	Sorted, indexed per-reference BAM
`{ID}-consensus.fasta`	Consensus sequence called by ivar
`{ID}-per-site.tsv`	Per-position depth, A/C/G/T counts, ref base, and consensus base
`{ID}-coverage.html`	Interactive coverage plot
`{ID}-coverage.pdf`	Static coverage plot
`summary.txt`	Run-level statistics and consensus-vs-reference comparison

The per-site TSV has columns: site, ref_base, consensus_base, depth, A, C, G, T. When --align is used, the consensus base is mapped back to the correct reference position even when ivar has inserted or deleted bases relative to the reference.
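As a sketch of how the per-site file might be consumed, the snippet below lists sites where the consensus base differs from the reference base. It assumes the file starts with a header row carrying the column names listed above and that ref_base is populated (the options list ties that column to --reference); the sample rows are invented for illustration, and `differing_sites` is a hypothetical helper.

```python
import csv
import io

# Invented rows for illustration only; real files are written by midsplit.
sample = (
    "site\tref_base\tconsensus_base\tdepth\tA\tC\tG\tT\n"
    "1\tA\tA\t12\t12\t0\t0\t0\n"
    "2\tC\tT\t10\t0\t1\t0\t9\n"
    "3\tG\tN\t0\t0\t0\t0\t0\n"
)


def differing_sites(handle, min_depth=1):
    """Return (site, ref_base, consensus_base) for covered sites that differ."""
    reader = csv.DictReader(handle, delimiter="\t")
    out = []
    for row in reader:
        # Skip sites with too little coverage to trust the consensus call.
        if int(row["depth"]) < min_depth:
            continue
        if row["ref_base"].upper() != row["consensus_base"].upper():
            out.append((int(row["site"]), row["ref_base"], row["consensus_base"]))
    return out


changes = differing_sites(io.StringIO(sample))
```

For a real run, pass an open file handle for `{ID}-per-site.tsv` instead of the in-memory sample.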

Usage

midsplit [options] INPUT_BAM

Options

Option	Default	Description
`--output-dir DIR`	`.`	Directory for all output files (created if absent)
`--threshold FLOAT`	`0.95`	Minimum fraction of a read's best percent identity needed to assign it to a reference
`--reference FASTA`	—	Multi-reference FASTA; enables `ref_base` column and consensus comparison
`--align`	off	Align consensus to reference before comparison (recommended when lengths differ)
`--aligner`	`mafft`	Aligner for `--align`: `mafft`, `needle`, or `edlib`
`--aligner-options OPTIONS`	—	Extra options forwarded to the aligner (implies `--align`)
`--consensus-caller`	`ivar`	Consensus caller to use
`--consensus-quality INT`	`20`	Minimum base quality passed to ivar (`-q`)
`--consensus-frequency-threshold FLOAT`	`0.0`	Minimum frequency for ivar to call a base (`-t`)
`--consensus-low-coverage INT`	`0`	Depth below which ivar masks with N (`-m`)
`--consensus-id TEMPLATE`	—	ID for the consensus sequence; use `{reference}` to embed the reference name
`--title TEMPLATE`	—	Plot title; supports `{mean}`, `{median}`, `{sd}`, `{min}`, `{max}`, `{reference}`
`--xTitle TEXT`	`"Genome position"`	X-axis title for coverage plots
`--yTitle TEXT`	`"Coverage depth"`	Y-axis title for coverage plots
`--titleFontSize INT`	`16`	Font size for plot title
`--axisFontSize INT`	`14`	Font size for axis labels
`--places INT`	`2`	Decimal places for floating-point statistics in plot titles
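To show how the `--title` placeholders and `--places` could interact, here is a minimal sketch. It assumes the template is filled with Python's `str.format` and that `{sd}` is a population standard deviation; neither detail is confirmed by this page, and `render_title` is a hypothetical name.

```python
import statistics


def render_title(template, depths, reference, places=2):
    """Fill a title template with coverage statistics (illustrative only)."""
    floats = {
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "sd": statistics.pstdev(depths),  # assumption: population SD
    }
    # Floating-point statistics are rounded to the requested places.
    fields = {name: f"{value:.{places}f}" for name, value in floats.items()}
    # Minimum and maximum depth are whole numbers, so they are left as-is.
    fields["min"] = str(min(depths))
    fields["max"] = str(max(depths))
    fields["reference"] = reference
    return template.format(**fields)


title = render_title(
    "{reference}: mean {mean}, sd {sd}, range {min}-{max}",
    depths=[10, 12, 14],
    reference="refA",
)
```

The same substitution idea would apply to `--consensus-id`, where `{reference}` is the only placeholder named above.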

Example

midsplit \
  --reference references/multi.fasta \
  --output-dir results/ \
  --align \
  --threshold 0.95 \
  alignments/reads-vs-multi.bam
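Because only references that received reads produce files, a downstream script usually has to discover which buckets exist. The sketch below does that from the `{ID}-consensus.fasta` naming shown under Output; `consensus_by_reference` is a hypothetical helper, not part of midsplit.

```python
from pathlib import Path


def consensus_by_reference(output_dir):
    """Map each reference ID to its consensus FASTA path in output_dir."""
    suffix = "-consensus.fasta"
    return {
        # Strip the fixed suffix to recover the reference ID.
        path.name[: -len(suffix)]: path
        for path in sorted(Path(output_dir).glob(f"*{suffix}"))
    }
```

After the run above, `consensus_by_reference("results/")` would return one entry per non-empty bucket.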

Requirements

Python 3.10+
samtools and ivar on PATH
Python dependencies are managed with uv; run uv sync to install them
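A quick pre-flight check for the external tools can be sketched with the standard library. `missing_tools` is a hypothetical helper; it only confirms that an executable with that name is on PATH, not that its version is suitable.

```python
import shutil


def missing_tools(tools=("samtools", "ivar")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```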

Notes

Only primary and secondary alignments are used; supplementary alignments (chimeric/split reads) are skipped.
When run with --all or -k N, Bowtie2 reports non-best hits as secondary alignments; midsplit includes these in classification so that all alignment evidence is used.
For circular genomes (e.g. HBV) mapped against a linearised reference, local alignment (bowtie2 --very-sensitive-local) is strongly recommended over end-to-end alignment. End-to-end mode cannot soft-clip reads that span the linearisation junction, which introduces artefactual bases near position 1 of the reference and can corrupt the consensus at those positions.
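The alignment filter in the first note can be expressed with the standard SAM FLAG bits. This is a sketch of the rule as described, not midsplit's own code; treating unmapped records as unusable is an added assumption, and `is_used_for_classification` is a hypothetical name.

```python
# FLAG bits defined by the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_used_for_classification(flag: int) -> bool:
    """Keep primary and secondary alignments; skip supplementary ones."""
    if flag & UNMAPPED:
        # Assumption: a record with no alignment carries no evidence.
        return False
    return not (flag & SUPPLEMENTARY)


def is_secondary(flag: int) -> bool:
    """True for the non-best hits reported by Bowtie2 --all or -k N."""
    return bool(flag & SECONDARY)
```

For example, a flag of 256 (secondary) is kept, while 2048 (supplementary) is skipped.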

Project details

Release history

0.3.1 (this version): Apr 26, 2026
0.3.0: Apr 26, 2026
0.1.2: Apr 25, 2026
0.1.1: Apr 24, 2026
0.1.0: Apr 24, 2026

Download files

Source Distribution

midsplit-0.3.1.tar.gz (417.6 kB view details)

Uploaded Apr 26, 2026 Source

Built Distribution

midsplit-0.3.1-py3-none-any.whl (19.6 kB view details)

Uploaded Apr 26, 2026 Python 3

File details

Details for the file midsplit-0.3.1.tar.gz.

File metadata

Download URL: midsplit-0.3.1.tar.gz
Upload date: Apr 26, 2026
Size: 417.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.6

File hashes

Hashes for midsplit-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`cdcbc55e8a9bae7fc179e932fcefde4b51f21c4d06c31e4d2b7e42b493784512`
MD5	`84ec7f888fe8af594f02550728d871d4`
BLAKE2b-256	`cd43261ebab1e7f5c26b7380a1a0e69ee2ea7b5eef2af6bef0b0c876a23edf90`

File details

Details for the file midsplit-0.3.1-py3-none-any.whl.

File metadata

Download URL: midsplit-0.3.1-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 19.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.6

File hashes

Hashes for midsplit-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`931d155f0fdaeca16d230f55251e1db34e667bfc0bb2ef40d4983b34b09ac424`
MD5	`edb4e291bb58b5cdda9ba4a5f8adacff`
BLAKE2b-256	`60b75a361e5184e60199bbd830397b24a1995bb2d66facfae73d4e8917374751`
