readcomb - fast detection of recombinant reads in BAMs

readcomb is a collection of command line and Python tools for fast detection of recombination events in pooled high-throughput sequencing data. readcomb searches for changes in parental haplotype phase across individual reads and classifies recombination events based on various properties of the observed recombinant haplotypes.

readcomb was designed for use with the model alga Chlamydomonas reinhardtii and currently supports only haploids. The gene conversion detection is tailored to C. reinhardtii, but everything else in readcomb generalizes to the detection of recombination events in any haploid species.
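
The idea of searching a read for changes in parental haplotype phase can be illustrated with a toy sketch. This is not readcomb's implementation: the function names, the `(parent, position)` tuple layout, and the labels `A` and `B` are all assumptions made for illustration. Each variant on a read is assumed to have already been assigned to one parent, and consecutive variants from the same parent are collapsed into runs.

```python
from itertools import groupby


def condense(calls):
    """Collapse per-variant (parent, position) calls into [parent, first_pos, last_pos] runs.

    `calls` must already be ordered by position along the read.
    """
    runs = []
    for parent, group in groupby(calls, key=lambda call: call[0]):
        positions = [pos for _, pos in group]
        runs.append([parent, positions[0], positions[-1]])
    return runs


def count_phase_changes(calls):
    """A phase change is a boundary between two adjacent runs from different parents."""
    return max(len(condense(calls)) - 1, 0)


# Hypothetical read carrying five variants: A, A, B, B, A
calls = [("A", 100), ("A", 150), ("B", 200), ("B", 230), ("A", 300)]
print(condense(calls))
print(count_phase_changes(calls))
```

A read with zero phase changes matches a single parent throughout, while one or more phase changes mark it as a candidate recombinant read.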

Installation

pip install readcomb

Dependencies

  • cyvcf2 - Fast retrieval and filtering of VCF files, with VCF objects written in C
  • pysam - Interface for SAM and BAM files; provides SAM and BAM objects
  • pandas - Support for data tables
  • tqdm - Provides updating progress bars for command line programs
  • samtools - Used for preprocessing of VCF and BAM files
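
A quick way to confirm that the dependencies listed above are available is sketched below. The helper name `missing_dependencies` is hypothetical and not part of readcomb. It only checks that the Python modules can be located and that a `samtools` binary is on `PATH`; it does not check versions.

```python
import importlib.util
import shutil

# Python packages and external binaries named in the dependency list
PYTHON_DEPS = ["cyvcf2", "pysam", "pandas", "tqdm"]
BINARY_DEPS = ["samtools"]


def missing_dependencies(modules=PYTHON_DEPS, binaries=BINARY_DEPS):
    """Return the names of modules that cannot be found and binaries not on PATH."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


print(missing_dependencies() or "all dependencies found")
```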

Usage:

bamprep

Command line preprocessing script for BAM files. bamprep will prepare an index file, filter out unusable reads, and output a BAM sorted by read name. readcomb requires BAMs sorted by read name for fast parsing and filtering.

readcomb-bamprep --bam [bam_filepath] --out [outdir]

Optional arguments:

  • --samtools - Path to samtools binary
  • --threads [int] - Number of threads samtools should use (default 1)
  • --index_csi - Create CSI index instead of BAI
  • --no_progress - Skip index creation; this speeds up bamprep but means no progress bars when filtering
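
When bamprep is driven from a Python script, the command line can be assembled from the options listed above. The sketch below is a hypothetical wrapper, not part of readcomb: the function name and the output directory `prepped` are assumptions, and the code only builds and prints the command. The `subprocess.run` call is left commented out so the example can be read without readcomb installed.

```python
import shlex


def bamprep_command(bam, outdir, samtools=None, threads=None,
                    index_csi=False, no_progress=False):
    """Build the argument list for readcomb-bamprep from the documented options."""
    cmd = ["readcomb-bamprep", "--bam", bam, "--out", outdir]
    if samtools is not None:
        cmd += ["--samtools", samtools]      # path to the samtools binary
    if threads is not None:
        cmd += ["--threads", str(threads)]   # threads samtools should use
    if index_csi:
        cmd.append("--index_csi")            # CSI index instead of BAI
    if no_progress:
        cmd.append("--no_progress")          # skip index creation
    return cmd


cmd = bamprep_command("data/example_sequences.bam", "prepped", threads=4, index_csi=True)
print(shlex.join(cmd))
# To actually run it:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.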

vcfprep

Command line preprocessing script for VCF files

readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]

Optional arguments:

  • --snps_only - Keep only SNPs
  • --indels_only - Keep only indels
  • --no_hets - Remove heterozygote calls
  • --min_GQ [int] - Minimum genotype quality at both sites (default 30)

filter

Command line multiprocessing script for identification of BAM sequences with phase changes

readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]

Optional arguments:

  • -p, --processes [processes] - Number of processes available for filter (default 4)
  • -m, --mode [phase_change|no_match] - Filtering mode (default phase_change)
  • -l, --log [log_filepath] - Filename for log metric output
  • -o, --out [output_filepath] - File to write filtered output to (default recomb_diagnosis)
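
The vcfprep and filter steps described above can be chained from Python in the same spirit. The following is a sketch under stated assumptions: the helper names and the file names `raw_variants.vcf.gz` and `filtered_output` are hypothetical, and treating `--snps_only` together with `--indels_only` as an error is a choice made here, not documented readcomb behaviour. The commands are only printed; nothing is executed.

```python
import shlex

FILTER_MODES = {"phase_change", "no_match"}  # modes listed for readcomb-filter


def vcfprep_command(vcf, out, snps_only=False, indels_only=False,
                    no_hets=False, min_gq=None):
    """Build the argument list for readcomb-vcfprep."""
    if snps_only and indels_only:
        # Assumption: asking for only SNPs and only indels is treated as a mistake
        raise ValueError("choose at most one of snps_only and indels_only")
    cmd = ["readcomb-vcfprep", "--vcf", vcf, "--out", out]
    if snps_only:
        cmd.append("--snps_only")
    if indels_only:
        cmd.append("--indels_only")
    if no_hets:
        cmd.append("--no_hets")
    if min_gq is not None:
        cmd += ["--min_GQ", str(min_gq)]
    return cmd


def filter_command(bam, vcf, processes=None, mode=None, log=None, out=None):
    """Build the argument list for readcomb-filter."""
    if mode is not None and mode not in FILTER_MODES:
        raise ValueError(f"mode must be one of {sorted(FILTER_MODES)}")
    cmd = ["readcomb-filter", "--bam", bam, "--vcf", vcf]
    if processes is not None:
        cmd += ["--processes", str(processes)]
    if mode is not None:
        cmd += ["--mode", mode]
    if log is not None:
        cmd += ["--log", log]
    if out is not None:
        cmd += ["--out", out]
    return cmd


# Hypothetical file names; substitute the real inputs and outputs
steps = [
    vcfprep_command("raw_variants.vcf.gz", "data/example_variants.vcf.gz",
                    snps_only=True, no_hets=True),
    filter_command("data/example_sequences.bam", "data/example_variants.vcf.gz",
                   processes=8, mode="phase_change", out="filtered_output"),
]
for step in steps:
    print(shlex.join(step))
    # import subprocess; subprocess.run(step, check=True)
```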

classification

Python module for detailed classification of sequences containing phase changes

>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF

>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath)     # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath)                          # cyvcf2 file object

>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz

>>> pairs[0].classify(cyvcf_object)                           # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False 
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]] 
Call: gene_conversion 
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]] 
Call Masked: gene_conversion 
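
To make the printed summary easier to interpret, the sketch below parses text in the format shown above and computes the length of each segment in the `Condensed` list. It works on the printed string only and does not use any readcomb attributes. The helper `parse_summary` is hypothetical, and reading each segment as `[parent, start, end]` is an assumption based on the example output.

```python
import ast

# Summary text copied from the example output above
SUMMARY = """Record name: chromosome_1-199370
Read1: chromosome_1:499417-499667
Read2: chromosome_1:499766-500016
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]]
Call: gene_conversion
"""


def parse_summary(text):
    """Split each 'key: value' line of the printed summary into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


fields = parse_summary(SUMMARY)
segments = ast.literal_eval(fields["Condensed"])  # safe parsing of the list literal
lengths = [(parent, end - start) for parent, start, end in segments]

print(fields["Call"])
print(lengths)
```

In this example the middle segment, flanked on both sides by the other parent, spans 110 bp, which is consistent with the `gene_conversion` call.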

License

GNU General Public License v3 (GPLv3+)

Development

Currently in alpha

Source code

Development repo
