Fast detection of recombinant reads in BAMs

readcomb - fast detection of recombinant reads in BAMs

readcomb is a collection of command line and Python tools for fast detection of recombination events in pooled high-throughput sequencing data. readcomb searches for changes in parental haplotype phase across individual reads and classifies recombination events based on various properties of the observed recombinant haplotypes.

readcomb was designed for use with the model alga Chlamydomonas reinhardtii and currently supports only haploids. The gene conversion detection is tailored to C. reinhardtii, but everything else in readcomb generalizes to detecting recombination events in any haploid species.

Installation

pip install readcomb

Dependencies

  • cyvcf2 - Fast parsing and filtering of VCF files, with VCF objects implemented in C
  • pysam - Interface for reading SAM and BAM files; provides SAM and BAM objects
  • pandas - Support for data tables
  • tqdm - Provides updating progress bars for command line programs
  • samtools - Used for preprocessing of VCF and BAM files

Usage:

bamprep

Command line preprocessing script for BAM files. bamprep creates an index file, filters out unusable reads, and outputs a BAM sorted by read name. readcomb requires BAMs sorted by read name for fast parsing and filtering.

readcomb-bamprep --bam [bam_filepath] --out [outdir]

Optional parameters:

  • --samtools - Path to samtools binary
  • --threads [int] - Number of threads samtools should use (default 1)
  • --index_csi - Create CSI index instead of BAI
  • --no_progress - Skip index creation; this speeds up bamprep, but filtering will then run without progress bars
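
When bamprep is called from a Python pipeline, the flags above can be assembled programmatically. The following is a minimal sketch, not part of readcomb: the helper name `bamprep_command` and the file paths are hypothetical, and it assumes `--samtools` takes the binary path as its value.

```python
import shlex
import subprocess


def bamprep_command(bam, outdir, threads=1, samtools=None,
                    index_csi=False, no_progress=False):
    """Build the argument list for readcomb-bamprep (hypothetical helper)."""
    cmd = ["readcomb-bamprep", "--bam", str(bam), "--out", str(outdir)]
    if samtools is not None:
        # Assumption: --samtools is followed by the path to the binary
        cmd += ["--samtools", str(samtools)]
    if threads != 1:
        cmd += ["--threads", str(threads)]
    if index_csi:
        cmd.append("--index_csi")
    if no_progress:
        cmd.append("--no_progress")
    return cmd


# Hypothetical input and output paths
cmd = bamprep_command("data/sample.bam", "prepped", threads=4, index_csi=True)
print(shlex.join(cmd))

# To run the command (requires readcomb and samtools to be installed):
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.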

vcfprep

Command line preprocessing script for VCF files

readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]

Optional arguments

  • --snps_only - Keep only SNPs
  • --indels_only - Keep only indels
  • --no_hets - Remove heterozygote calls
  • --min_GQ [int] - Minimum genotype quality at both sites (default 30)

filter

Command line multiprocessing script that identifies BAM reads containing phase changes

readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]

Optional arguments:

  • -p, --processes [processes] - Number of processes available for filter (default 4)
  • -m, --mode [phase_change|no_match] - Filtering mode (default phase_change)
  • -l, --log [log_filepath] - Filename for log metric output
  • -o, --out [output_filepath] - File to write filtered output to (default recomb_diagnosis)
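
To illustrate what a phase change means here, the sketch below checks whether the parental haplotype switches between consecutive variants along a read. It is a toy illustration under stated assumptions, not readcomb's actual filter: the function `has_phase_change` is hypothetical, each variant is assumed to be already labeled with the parent it matches, and `None` is assumed to mark a variant matching neither parent.

```python
def has_phase_change(parent_calls):
    """Return True if the parental haplotype switches along a read.

    parent_calls: parent labels in read order, one per variant the read
    overlaps. None (an assumption of this sketch) marks a variant that
    matches neither parent and is skipped.
    """
    informative = [p for p in parent_calls if p is not None]
    # A phase change is any adjacent pair of informative variants
    # assigned to different parents
    return any(a != b for a, b in zip(informative, informative[1:]))


# Parent names taken from the classification example in this README
print(has_phase_change(["CC2936", "CC2936", "CC2935"]))  # True
print(has_phase_change(["CC2936", None, "CC2936"]))      # False
```

A read with fewer than two informative variants can never show a phase change under this definition.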

classification

Python module for detailed classification of sequences containing phase changes

>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF

>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath)     # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath)                          # cyvcf2 file object

>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz

>>> pairs[0].classify(cyvcf_object)                           # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False 
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]] 
Call: gene_conversion 
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]] 
Call Masked: gene_conversion 
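
The Condensed output above can be summarized with a few lines of plain Python. This is a sketch that assumes each entry has the form [parent, start, end], as the printed output suggests; the list is copied from the example rather than read from a Pair attribute, and `summarize` is a hypothetical helper.

```python
# Copied from the "Condensed" line of the example output above
condensed = [['CC2936', 499417, 499626],
             ['CC2935', 499626, 499736],
             ['CC2936', 499736, 500016]]


def summarize(segments):
    """Return per-segment lengths and the number of phase changes.

    Assumes each segment is [parent, start, end].
    """
    lengths = [(parent, end - start) for parent, start, end in segments]
    # Count adjacent segments assigned to different parents
    changes = sum(1 for a, b in zip(segments, segments[1:]) if a[0] != b[0])
    return lengths, changes


lengths, changes = summarize(condensed)
print(lengths)   # [('CC2936', 209), ('CC2935', 110), ('CC2936', 280)]
print(changes)   # 2
```

Here a short CC2935 tract of 110 bp is flanked on both sides by CC2936 sequence, which fits the gene_conversion call shown above.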

License

GNU General Public License v3 (GPLv3+)

Development

Currently in alpha

Source code

Development repo
