Connor

A command-line tool to deduplicate bam files based on custom, inline barcoding.

The official repository is at: https://github.com/umich-brcf-bioinf/Connor

Overview

When analyzing deep-sequence NGS data, it is sometimes difficult to distinguish sequencing and PCR errors from rare variants; as a result, some variants may be missed and others may be reported with an inaccurate variant frequency. To address this, researchers can attach random barcode sequences during sample preparation. Upon sequencing, the barcodes act as a signature that traces each set of PCR-amplified molecules back to the original biological molecule of interest, thereby differentiating rare variants in the original molecule from errors introduced downstream.

Connor accepts a barcoded, paired alignment file (BAM), groups those input alignments into families, combines each family into a consensus alignment, and emits the set of deduplicated, consensus alignments (BAM).

Connor workflow:

Sequencing [FASTQ 1/2] -> Aligner [BAM] -> Connor [BAM] -> Variant Detection [VCF]

Connor first groups original alignments into alignment families based on their alignment position and Universal Molecular Tag (UMT) barcode. (Connor assumes the incoming aligned sequences begin with the UMT barcode.) Each family of alignments is then combined into a single consensus alignment; discrepancies in base-calls and qualities are resolved by majority vote across family members. By default, smaller families (fewer than 3 alignment pairs) are excluded.
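
The grouping and consensus steps described above can be illustrated with a small, self-contained sketch. This is not Connor's implementation: it works on plain (position, sequence) tuples rather than paired BAM alignments, ignores base qualities, assumes all reads in a family have the same length, and uses a simple greedy match against the first UMT seen at each position. The function names (hamming, group_families, consensus, deduplicate) are hypothetical; the default values mirror the defaults listed in the Connor help text.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMTs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_families(reads, umt_length=6, max_distance=1):
    """Group reads by position, then by UMT within max_distance (greedy).

    `reads` is an iterable of (position, sequence) tuples; the UMT is taken
    to be the first `umt_length` bases of each sequence (an assumption that
    follows the "sequences begin with the UMT barcode" convention).
    """
    by_position = defaultdict(list)
    for position, sequence in reads:
        by_position[position].append(sequence)

    families = []
    for position, sequences in sorted(by_position.items()):
        seeds = []  # (seed_umt, member_sequences) per family at this position
        for sequence in sequences:
            umt = sequence[:umt_length]
            for seed_umt, members in seeds:
                if hamming(umt, seed_umt) <= max_distance:
                    members.append(sequence)
                    break
            else:
                # No existing family is close enough: start a new one.
                seeds.append((umt, [sequence]))
        families.extend((position, members) for _, members in seeds)
    return families


def consensus(sequences, freq_threshold=0.6):
    """Majority base per column; 'N' if the majority fraction is too low."""
    result = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= freq_threshold else "N")
    return "".join(result)


def deduplicate(reads, min_family_size=3, **grouping_options):
    """Return one (position, consensus) per family that is large enough."""
    return [
        (position, consensus(members))
        for position, members in group_families(reads, **grouping_options)
        if len(members) >= min_family_size
    ]


reads = [
    (100, "AAAAAACGT"),
    (100, "AAAAAACGT"),
    (100, "AAAAATCGA"),  # UMT differs by one base; joins the first family
    (100, "GGGGGGCGT"),  # different UMT; family of one, filtered out
    (200, "CCCCCCTTT"),  # family of one, filtered out
]
print(deduplicate(reads))  # [(100, 'AAAAAACGT')]
```

In this toy input only the first family reaches the minimum size of 3, and the two columns where one member disagrees are resolved to the majority base because 2/3 exceeds the 0.6 threshold.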

For more information see:

QUICKSTART : get started deduplicating barcoded BAMs.
INSTALL : alternative ways to install.
METHODS : details on UMT barcode structure, suggestions on alignment parameters, details on family grouping, and examples.

Connor help

$ connor --help
 usage: connor input_bam output_bam

 positional arguments:
   input_bam             path to input BAM
   output_bam            path to deduplicated output BAM

 optional arguments:
   -h, --help            show this help message and exit
   -V, --version         show program's version number and exit
   -v, --verbose         print all log messages to console
   --log_file LOG_FILE   ={output_filename}.log. Path to verbose log file
   --annotated_output_bam ANNOTATED_OUTPUT_BAM
                         path to output BAM containing all original aligns annotated with BAM tags
   -f CONSENSUS_FREQ_THRESHOLD, --consensus_freq_threshold CONSENSUS_FREQ_THRESHOLD
                         =0.6 (0..1.0): Ambiguous base calls at a specific position in a family are
                          transformed to either majority base call, or N if the majority percentage
                          is below this threshold. (Higher threshold results in more Ns in
                          consensus.)
   -s MIN_FAMILY_SIZE_THRESHOLD, --min_family_size_threshold MIN_FAMILY_SIZE_THRESHOLD
                         =3 (>=0): families with count of original reads < threshold are excluded
                          from the deduplicated output. (Higher threshold is more
                          stringent.)
   -d UMT_DISTANCE_THRESHOLD, --umt_distance_threshold UMT_DISTANCE_THRESHOLD
                         =1 (>=0); UMTs equal to or closer than this Hamming distance will be
                          combined into a single family. Lower threshold make more families with more
                          consistent UMTs; 0 implies UMT must match
                          exactly.
   --filter_order {count,name}
                        =count; determines how filters will be ordered in the log
                        results
   --umt_length UMT_LENGTH
                        =6 (>=1); length of UMT
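
For scripted use, the options above can be assembled into a command line from Python. The sketch below is only a convenience wrapper and is not part of Connor: the helper names build_connor_command and run_connor and the example file names are hypothetical, while the flags are the ones listed in the help text. Options left as None are omitted so that Connor's own defaults apply.

```python
import subprocess


def build_connor_command(input_bam, output_bam,
                         consensus_freq_threshold=None,
                         min_family_size_threshold=None,
                         umt_distance_threshold=None,
                         umt_length=None,
                         annotated_output_bam=None,
                         log_file=None):
    """Return the argument list for a connor call; None keeps the default."""
    command = ["connor", input_bam, output_bam]
    options = [
        ("--consensus_freq_threshold", consensus_freq_threshold),
        ("--min_family_size_threshold", min_family_size_threshold),
        ("--umt_distance_threshold", umt_distance_threshold),
        ("--umt_length", umt_length),
        ("--annotated_output_bam", annotated_output_bam),
        ("--log_file", log_file),
    ]
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    return command


def run_connor(*args, **kwargs):
    """Run connor (must be on PATH); raises CalledProcessError on failure."""
    subprocess.check_call(build_connor_command(*args, **kwargs))


# Hypothetical file names; stricter family size and exact UMT matching.
command = build_connor_command("sample.bam", "sample.dedup.bam",
                               min_family_size_threshold=5,
                               umt_distance_threshold=0)
print(" ".join(command))
# connor sample.bam sample.dedup.bam --min_family_size_threshold 5 --umt_distance_threshold 0
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.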

Email bfx-connor@umich.edu for support and questions.

UM BRCF Bioinformatics Core

Changelog

0.6 (4/11/2018)

Extended to support pysam v0.13, v0.14
Added optional command line arg to specify length of unique molecular tag (UMT)
Added optional command line arg to sort filters results by name instead of count
Added validation to check for properly paired alignments
Added validation to check for presence of secondary alignments
Adjusted to show a warning instead of an error when no families are found
Substantial refactors to clarify implementation

0.5.1 (9/8/2017)

Extended supported python and pysam versions
Adjusted to avoid performance problem when processing extremely deep pileups
Adjusted to show a warning instead of an error message when no families pass filters (thanks to ccario83 for upvoting this fix)

0.5 (9/13/2016)

Filters now exclude supplementary alignments
Added BAM tags to show pair positions and CIGAR values
Reduced required memory and improved performance

0.4 (8/26/2016)

Added input/command validations
Added annotated bam option
Revised QUICKSTART, METHODS
Added PG line in BAM header
Improved logging of filtered aligns and progress
Removed some logged stats to focus logging results
Removed dependency on pandas/numpy
Moderate performance (speed) improvements in calculating consensus sequence
Switched consensus quality to be the max mapping quality

0.3 (8/8/2016)

Added filters to exclude low quality, unmapped, or unpaired alignments
Revised BAM tags; documented BAM tags in BAM header
Extended logging to write to file and console
Adjusted to make deterministic in Py3/Py2

0.2 (7/15/2016)

Bugfix: connor was mangling left hand side of right hand consensus reads
Fuzzy grouping of pairs into families based on left or right UMI match
Fuzzy grouping of pairs into families based on UMI within Hamming distance
Command line args for Hamming distance, consensus threshold, min orig reads
Extended logging to assist in overall diagnostics
Generate additional file of alignments excluded from consensus (diagnostic)
Added UMI sequence tag (X0)

0.1 (6/17/2016)

Initial development release
Partitions raw reads into consensus families

Connor is written and maintained by the University of Michigan BRCF Bioinformatics Core; individual contributors include:

Chris Gates
Peter Ulintz

Release history

0.6.1: Aug 17, 2018
0.6: Apr 11, 2018 (this version)
0.5.1: Sep 8, 2017
0.5: Sep 14, 2016
0.4: Aug 27, 2016

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

Connor-0.6-py2.py3-none-any.whl

File metadata

Upload date: Apr 11, 2018
Size: 26.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for Connor-0.6-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e2b1814eef255212ab1c9dcbeec20c7616135b2822f926ed458925545957672`
MD5	`910af43a1f07130e3830497ed87133b3`
BLAKE2b-256	`a84af9e85b72ad090ba480e55512f2d8abd1b742371ecaa87b0fc7b4e14951c0`
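
A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names sha256_of_file and wheel_matches are hypothetical, and the file is assumed to have been downloaded to a local path already.

```python
import hashlib

# SHA256 digest listed above for Connor-0.6-py2.py3-none-any.whl.
EXPECTED_SHA256 = (
    "8e2b1814eef255212ab1c9dcbeec20c7616135b2822f926ed458925545957672"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_matches(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected
```

For example, wheel_matches("Connor-0.6-py2.py3-none-any.whl") should return True for an intact download of that file and False otherwise.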
