======
Katana
======

Command-line tool to soft-clip reads from amplicon-based sequencing based on
specified primer locations.

.. image:: https://travis-ci.org/umich-brcf-bioinf/Katana.svg?branch=develop
   :target: https://travis-ci.org/umich-brcf-bioinf/Katana
   :alt: Build Status

.. image:: https://coveralls.io/repos/github/umich-brcf-bioinf/Katana/badge.svg?branch=develop
   :target: https://coveralls.io/github/umich-brcf-bioinf/Katana?branch=develop
   :alt: Coverage Status

The official repository is at:

https://github.com/umich-brcf-bioinf/Katana

--------
Overview
--------

In amplicon-based target panel sequencing, regions of interest are amplified by
specific pairs of primers; consequently, the amplified regions typically start
and end with these primer sequences, which match the reference sequence exactly
and do not reflect the actual sample sequence. In some panel designs, the
amplicons are tiled so that the amplicon for one region of interest overlaps
the primer region of a different amplicon. In this arrangement, the overlapping
amplicon should enable detection of variants that fall within that primer
region. However, the primer sequences will typically overwhelm the signal of
true, low-frequency variants.

Katana matches each read to its corresponding primer pair based on the start
position of the read. Katana then soft-clips the primer region from the edge of
the read sequence, rescuing the signal of true variants measured by overlapping
amplicons. The output is conceptually similar to hard-clipping the primers from
the original FASTQ reads based on sequence identity, but with the advantage
that retaining the primers during alignment improves alignment quality.
::

  amplicon A            [ primerREGION-OF-INTERESTprimer ]
  amplicon B                        [ primerREGION-OF-INTERESTprimer ]
  input read1 sequence:   TGCATGAGTCTGATCTAGGTAGTTGACGTC
  input read2 sequence:               ATCTAGGTAGTTGACGTCAGATAATGCAGC

  output read1 sequence:  tgcatgAGTCTGATCTAGGTAGTTgacgtc  (clipped amplicon A primers)
  output read2 sequence:              atctagGTAGTTGACGTCAGATAAtgcagc  (clipped amplicon B primers)
  (lowercase = soft-clipped)
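
The diagram above can be reproduced with a short Python sketch. It is an
illustration only, not Katana's implementation: the ``PRIMERS_BY_START`` table,
the primer ids, and the function names are hypothetical, the start positions
are offsets within the diagram rather than genome coordinates, and a real BAM
read would be clipped by rewriting its CIGAR string rather than by lowercasing
bases.

```python
def soft_clip(sequence, left_primer_len, right_primer_len):
    """Lowercase the primer bases at each end of a read, as in the diagram."""
    end = len(sequence) - right_primer_len
    return (sequence[:left_primer_len].lower()
            + sequence[left_primer_len:end]
            + sequence[end:].lower())


# Hypothetical lookup: read start offset -> (primer id, left length, right length).
PRIMERS_BY_START = {
    0: ("ampliconA", 6, 6),
    12: ("ampliconB", 6, 6),
}


def clip_read(start, sequence, primers_by_start=PRIMERS_BY_START):
    """Match a read to a primer pair by start position, then soft-clip it.

    Returns None when no primer pair starts at that position; Katana
    excludes such reads unless --preserve_all_alignments is given.
    """
    match = primers_by_start.get(start)
    if match is None:
        return None
    primer_id, left_len, right_len = match
    return primer_id, soft_clip(sequence, left_len, right_len)


if __name__ == "__main__":
    print(clip_read(0, "TGCATGAGTCTGATCTAGGTAGTTGACGTC"))
    print(clip_read(12, "ATCTAGGTAGTTGACGTCAGATAATGCAGC"))
```

Keying the lookup on start position mirrors the matching rule described above;
the sketch ignores strand, indels, and partial reads.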

Tags are added to each output read to help explain how it was modified:

- X0 : associated primer id
- X1 : original CIGAR string
- X2 : original reference start
- X3 : original reference_end (informational; useful for reverse reads)
- X4 : why the read would be excluded (appears only if --preserve_all_alignments)
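
As a rough sketch of how these tags might be inspected after a run, the Python
below tallies X4 values across reads. It assumes each read exposes ``has_tag``
and ``get_tag`` the way ``pysam.AlignedSegment`` does; ``FakeRead``,
``summarize_exclusions``, and the example X4 text are hypothetical stand-ins so
that the sketch runs without pysam or a BAM file.

```python
from collections import Counter


class FakeRead:
    """Hypothetical stand-in exposing the tag methods of pysam.AlignedSegment."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def has_tag(self, name):
        return name in self._tags

    def get_tag(self, name):
        return self._tags[name]


def summarize_exclusions(reads):
    """Count X4 values; reads that carry no X4 tag are counted as 'included'."""
    counts = Counter()
    for read in reads:
        if read.has_tag("X4"):
            counts[read.get_tag("X4")] += 1
        else:
            counts["included"] += 1
    return counts


if __name__ == "__main__":
    # The X4 text here is invented for illustration; real values come from Katana.
    demo = [
        FakeRead({"X0": "primer_1", "X1": "30M", "X2": 100}),
        FakeRead({"X4": "example exclusion reason"}),
    ]
    print(summarize_exclusions(demo))
    # With pysam installed, the same function could be fed a real file:
    #   import pysam
    #   with pysam.AlignmentFile("clipped.bam", "rb") as bam:
    #       print(summarize_exclusions(bam))
```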

Katana assumes that:

- input BAM is indexed
- primers come in sense-antisense pairs
- primer pairs are on the same chromosome
- primer chromosomes match the BAM regions
- primer file is tab separated; the header line includes the following fields:

  * Customer TargetID
  * Chr
  * Sense Start
  * Antisense Start
  * Sense Sequence
  * Antisense Sequence

- primer file sense and antisense start are specified in 1-based coordinates
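
The manifest layout described above can be checked with a few lines of Python.
This is a minimal sketch, not Katana's own parser: ``read_primer_manifest`` and
the output keys are hypothetical, and the example row is invented. The only
facts taken from this document are the column names, the tab separator, and
the 1-based coordinates.

```python
import csv
import io

REQUIRED_FIELDS = [
    "Customer TargetID", "Chr", "Sense Start", "Antisense Start",
    "Sense Sequence", "Antisense Sequence",
]


def read_primer_manifest(handle):
    """Parse a tab-separated primer manifest into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing manifest columns: " + ", ".join(missing))
    primers = []
    for row in reader:
        sense_start = int(row["Sense Start"])  # 1-based, as in the manifest
        primers.append({
            "target_id": row["Customer TargetID"],
            "chrom": row["Chr"],
            "sense_start": sense_start,
            # BAM libraries such as pysam report 0-based starts.
            "sense_start_0based": sense_start - 1,
            "antisense_start": int(row["Antisense Start"]),
            "sense_len": len(row["Sense Sequence"]),
            "antisense_len": len(row["Antisense Sequence"]),
        })
    return primers


# Invented example row, for illustration only.
EXAMPLE_MANIFEST = "\n".join([
    "\t".join(REQUIRED_FIELDS),
    "\t".join(["target_1", "chr10", "100", "250", "ACGTAC", "TTGCAA"]),
]) + "\n"

if __name__ == "__main__":
    for primer in read_primer_manifest(io.StringIO(EXAMPLE_MANIFEST)):
        print(primer)
```

Validating the header first gives a clear error when a column is missing or
misspelled.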

-----------
Quick Start
-----------

1. **Install Katana (see INSTALL.rst):**
   ::

     $ pip install katana

2. **Get the examples directory:**
   ::

     $ git clone https://github.com/umich-brcf-bioinf/Katana

3. **Run Katana:**
   ::

     $ katana Katana/examples/primers.txt Katana/examples/chr10.pten.bam clipped.bam

   This will read chr10.pten.bam and produce clipped.bam, which contains reads
   adjusted to soft-clip (exclude) their respective primer regions. Unmapped
   reads and reads that do not match a known primer are excluded unless
   --preserve_all_alignments is given.

-----------
Katana help
-----------

::

  $ katana --help

  usage: katana primer_manifest input_bam output_bam

  Match each alignment in input BAM to primer, softclipping the primer region.

  positional arguments:
    primer_manifest       path to primer manifest (tab-separated text)
    input_bam             path to input BAM
    output_bam            path to output BAM

  optional arguments:
    -h, --help            show this help message and exit
    -V, --version         show program's version number and exit
    --preserve_all_alignments
                          Preserve all incoming alignments (even if they are
                          unmapped, cannot be matched with primers, result in
                          invalid CIGARs, etc.)
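
When Katana is one step in a larger pipeline, the command line above can be
assembled from Python. The sketch below is one possible wrapper, not part of
Katana: ``build_katana_command`` and ``run_katana`` are hypothetical helper
names, and it assumes the ``katana`` executable is on ``PATH``. Only the three
positional arguments and the ``--preserve_all_alignments`` flag come from the
help text.

```python
import subprocess


def build_katana_command(primer_manifest, input_bam, output_bam,
                         preserve_all=False):
    """Return the argument list for a Katana run."""
    command = ["katana", primer_manifest, input_bam, output_bam]
    if preserve_all:
        # Keep unmapped/unmatched reads in the output; Katana adds X4 to
        # explain why each would otherwise have been excluded.
        command.append("--preserve_all_alignments")
    return command


def run_katana(primer_manifest, input_bam, output_bam, preserve_all=False):
    """Run Katana and raise CalledProcessError on a non-zero exit status."""
    command = build_katana_command(
        primer_manifest, input_bam, output_bam, preserve_all)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    print(" ".join(build_katana_command(
        "Katana/examples/primers.txt",
        "Katana/examples/chr10.pten.bam",
        "clipped.bam",
        preserve_all=True,
    )))
```

Passing the arguments as a list avoids shell quoting problems with paths that
contain spaces.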

====

Email bfx-katana@umich.edu for support and questions.

UM BRCF Bioinformatics Core

Changelog
=========

0.1.2 (11/2/2017)
-----------------
- Adds/correctly updates MC tag
- Fixed erroneous mate info when mate is filtered out

  - Correctly sets mate start pos to 0
  - Removes MC tag if present

- Sanitizes BAM tag of primer names
- Extended supported pysam versions to include 0.9-0.12

0.1.1 (2/9/2016)
----------------
- Fixed problems in BAM output:

  - Corrected next reference in paired reads
  - Excludes reads where CIGAR is entirely clipped
  - Unpairs reads which had no mate in input

- Added BAM tags to excluded reads (useful with --preserve_all_alignments)
- Adjusted to improve performance (about 6x faster)
- Added support for pip install
- Added functional tests
- Added support for travis CI
- Added support for Python3
- Added support for pysam 0.8.3

0.1 (1/28/2016)
---------------
- Initial Release

Katana is written and maintained by the University of Michigan
BRCF Bioinformatics Core; individual contributors include:

- Chris Gates
- Peter Ulintz

