Indel-aware consensus from aligned BAMs

Kindel: indel-aware consensus from aligned BAM

Kindel reconciles substitutions and CIGAR-described indels to produce a majority consensus from an aligned BAM/SAM file. With the --realign option, Kindel also closes unaligned gaps using soft-clipped sequence information, a kind of local reassembly. Intended for use with small alignments of genes or virus genomes, Kindel is tested with BAMs created by aligners such as Minimap2 and BWA. No reference sequence is required, but the input BAM must contain @SQ header lines. If you encounter problems, please open an issue. Please also cite the JOSS article if you find this useful.

Core functionality

[Figure: clip-dominant region]

Reassembly of clip-dominant regions (CDRs) with --realign

[Figure: clip-dominant region]
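
To make the idea of gap closure concrete, here is a minimal sketch of joining two sequences that extend into a gap from opposite sides once they share a long enough exact overlap. It is an illustration only, not Kindel's implementation: the function name close_gap and the example sequences are hypothetical, and only the default overlap of 7 mirrors the documented --min-overlap default.

```python
def close_gap(left_extension, right_extension, min_overlap=7):
    """Join two sequences if a suffix of the first matches a prefix of the second.

    Hypothetical helper for illustration; not part of the kindel package.
    Returns the merged sequence, or None if no exact overlap of at least
    min_overlap bases exists.
    """
    longest = min(len(left_extension), len(right_extension))
    # Try the longest candidate overlap first, shrinking towards min_overlap.
    for size in range(longest, min_overlap - 1, -1):
        if left_extension.endswith(right_extension[:size]):
            return left_extension + right_extension[size:]
    return None


# Made-up sequences sharing the 8-base overlap "GCAGGCTA".
left = "ACGTTGCAGGCTA"
right = "GCAGGCTATTCCG"
merged = close_gap(left, right)          # "ACGTTGCAGGCTATTCCG"
too_strict = close_gap(left, right, 9)   # None: the overlap is only 8 bases
```

Requiring a minimum overlap guards against joining two extensions on the strength of a short match that could occur by chance.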

Features

  • Consensus of aligned substitutions, small insertions and deletions

  • Optional consensus reassembly around large unaligned 'clip-dominant' gaps (using --realign)

  • Support for short, paired and long reads mapped with e.g. Minimap2, BWA-MEM, and Segemehl

  • Support for BAMs with multiple reference contigs or chromosomes

  • Visualisation of aligned and clipped sequence depth by site alongside insertions and deletions (kindel plot)
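
As a rough illustration of what a majority consensus with a depth cutoff means, the sketch below picks the most frequent symbol at each site and writes N where coverage is too low. It is a toy model under stated assumptions, not Kindel's code: the function majority_consensus and the per-site Counter input are hypothetical, "-" stands for a deletion, and insertions are left out for brevity.

```python
from collections import Counter


def majority_consensus(columns, min_depth=1):
    """Build a consensus string from per-site symbol counts.

    Hypothetical helper for illustration; not part of the kindel package.
    columns: list of Counter objects mapping a base (or "-" for a deletion)
    to the number of reads supporting it at that site.
    """
    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")  # insufficient coverage at this site
            continue
        # Most frequent symbol; ties go to the first symbol encountered.
        symbol, _ = max(counts.items(), key=lambda item: item[1])
        if symbol != "-":
            consensus.append(symbol)  # a majority deletion emits nothing
    return "".join(consensus)


columns = [
    Counter({"A": 9, "G": 1}),  # substitution minority: A wins
    Counter({"-": 7, "C": 3}),  # deletion majority: site is skipped
    Counter({"T": 10}),
    Counter(),                  # no coverage
    Counter({"G": 1}),          # depth 1
]
strict = majority_consensus(columns, min_depth=2)   # "ATNN"
lenient = majority_consensus(columns, min_depth=1)  # "ATNG"
```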

Limitations

  • While Kindel has been tested with bacterial genomes, expect slow performance with megabase genomes.
  • SAM/BAM files must contain an @SQ header line with reference sequence length(s).
  • Realignment mode (--realign) can close gaps of up to twice the read length.
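
One way to check the @SQ requirement before running Kindel is to inspect the header text, which samtools view -H alignment.bam prints for a BAM file. The sketch below parses such text and returns the reference lengths it finds. The function reference_lengths and the example header are hypothetical and are not part of Kindel.

```python
def reference_lengths(sam_header_text):
    """Return {reference name: length} parsed from @SQ header lines.

    Hypothetical helper for illustration; not part of the kindel package.
    An empty result means the header has no usable @SQ lines.
    """
    lengths = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


# Made-up header text in the shape produced by `samtools view -H`.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:ref1\tLN:9000\n@PG\tID:minimap2\n"
lengths = reference_lengths(header)  # {"ref1": 9000}
```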

Installation

Install inside an existing Python environment:

# Requires Python 3.9+ and Samtools
pip install kindel

Complete installation using a conda-compatible package manager:

conda create -y -n kindel python=3.13 samtools
conda activate kindel
pip install kindel

Development install:

git clone https://github.com/bede/kindel.git
cd kindel
pip install --editable '.[dev]'

Usage (kindel consensus)

Also see usage.ipynb

Command line

Generate a consensus sequence from an aligned BAM, saving the consensus sequence to cns.fa:

$ kindel consensus alignment.bam > cns.fa

Generate a consensus sequence from an aligned BAM with realignment mode enabled, allowing closure of gaps in the consensus sequence:

$ kindel consensus --realign alignment.bam > cns.fa

Built-in help:

$ kindel -h
usage: kindel [-h] {consensus,weights,features,variants,plot} ...

positional arguments:
  {consensus,weights,features,variants,plot,version}
    consensus           Infer consensus sequence(s) from alignment in SAM/BAM
                        format
    weights             Returns table of per-site nucleotide frequencies and
                        coverage
    features            Returns table of per-site nucleotide frequencies and
                        coverage including indels
    variants            Output variants exceeding specified absolute and
                        relative frequency thresholds
    plot                Plot sitewise soft clipping frequency across reference
                        and genome
    version             Show version

optional arguments:
  -h, --help            show this help message and exit

$ kindel consensus -h
usage: kindel consensus [-h] [-r] [--min-depth MIN_DEPTH]
                        [--min-overlap MIN_OVERLAP] [-c CLIP_DECAY_THRESHOLD]
                        [--mask-ends MASK_ENDS] [-t] [-u]
                        bam_path

Infer consensus sequence(s) from alignment in SAM/BAM format

positional arguments:
  bam_path              path to SAM/BAM file

optional arguments:
  -h, --help            show this help message and exit
  -r, --realign         attempt to reconstruct reference around soft-clip
                        boundaries (default: False)
  --min-depth MIN_DEPTH
                        substitute Ns at coverage depths beneath this value
                        (default: 1)
  --min-overlap MIN_OVERLAP
                        match length required to close soft-clipped gaps
                        (default: 7)
  -c CLIP_DECAY_THRESHOLD, --clip-decay-threshold CLIP_DECAY_THRESHOLD
                        read depth fraction at which to cease clip extension
                        (default: 0.1)
  --mask-ends MASK_ENDS
                        ignore clip dominant positions within n positions of
                        termini (default: 50)
  -t, --trim-ends       trim ambiguous nucleotides (Ns) from sequence ends
                        (default: False)
  -u, --uppercase       close gaps using uppercase alphabet (default: False)
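
When scripting many alignments, the command line can be driven from Python with the standard library. The following is a minimal sketch assuming the kindel executable is on PATH; the helper names build_consensus_command and run_consensus are hypothetical, while the flags and defaults are taken from the help text above.

```python
import subprocess


def build_consensus_command(bam_path, realign=False, min_depth=1, min_overlap=7,
                            clip_decay_threshold=0.1, mask_ends=50,
                            trim_ends=False, uppercase=False):
    """Assemble an argument list for `kindel consensus`.

    Hypothetical helper for illustration; not part of the kindel package.
    """
    command = [
        "kindel", "consensus",
        "--min-depth", str(min_depth),
        "--min-overlap", str(min_overlap),
        "--clip-decay-threshold", str(clip_decay_threshold),
        "--mask-ends", str(mask_ends),
    ]
    if realign:
        command.append("--realign")
    if trim_ends:
        command.append("--trim-ends")
    if uppercase:
        command.append("--uppercase")
    command.append(bam_path)
    return command


def run_consensus(bam_path, fasta_path, **options):
    """Run kindel and write its standard output (FASTA) to fasta_path."""
    with open(fasta_path, "w") as handle:
        subprocess.run(build_consensus_command(bam_path, **options),
                       stdout=handle, check=True)


command = build_consensus_command("alignment.bam", realign=True, trim_ends=True)
```

Calling run_consensus("alignment.bam", "cns.fa", realign=True) would then be equivalent to the second shell example above, with check=True raising an exception if kindel exits with an error.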

Python API

from kindel import kindel

kindel.bam_to_consensus(bam_path, realign=False, min_depth=2, min_overlap=7,
                        clip_decay_threshold=0.1, trim_ends=False, uppercase=False)

Issues

If you encounter problems please open a GitHub issue, preferably including a BAM that allows the problem to be reproduced, or else reach out via email or social media.

Visualising alignments (kindel plot)

It can be useful to visualise rates of insertion, deletion and alignment clipping across an alignment. kindel plot generates an interactive HTML plot showing relevant alignment information.

To plot aligned depth alongside insertion, deletion and soft clipping frequency:

kindel plot tests/data_minimap2/2.issue23.debug.bam

[Figure: plot of the original alignment]

[Figure: plot after alignment to the Kindel consensus sequence]
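
The quantities shown in these plots all derive from the CIGAR strings of the aligned reads. As a small sketch of that relationship, the code below totals the bases per CIGAR operation for a single read and reports the soft-clipped fraction of the read. It works per read rather than per site, so it is not how kindel plot computes its output; the function names are hypothetical.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_op_totals(cigar):
    """Return total bases per CIGAR operation, e.g. {"M": 78, "S": 20, "I": 2}.

    Hypothetical helper for illustration; not part of the kindel package.
    """
    totals = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] += int(length)
    return totals


def clipped_fraction(cigar):
    """Fraction of the read's bases that are soft-clipped (operation S)."""
    totals = cigar_op_totals(cigar)
    # Operations that consume read sequence: M, I, S, = and X.
    read_length = sum(totals[op] for op in "MIS=X")
    return totals["S"] / read_length if read_length else 0.0


totals = cigar_op_totals("20S70M2I8M")     # M: 78, S: 20, I: 2
fraction = clipped_fraction("20S70M2I8M")  # 0.2
```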

Contributing

If you would like to contribute to this project, please open an issue or contact the author directly using the details above. Please note that this project is released with a Contributor Code of Conduct, and by participating in this project you agree to abide by its terms.

Before opening a pull request, please:

  • Ensure tests pass in a local development build (see installation instructions) by executing pytest inside the package directory.
  • Increment the version number inside __init__.py according to SemVer.
  • Update documentation and/or tests if possible.
