Assemble transcript sequences fragments near variants
Project description
[![DOI](https://zenodo.org/badge/18834/hammerlab/isovar.svg)](https://zenodo.org/badge/latestdoi/18834/hammerlab/isovar) [![Build Status](https://travis-ci.org/hammerlab/isovar.svg?branch=master)](https://travis-ci.org/hammerlab/isovar) [![Coverage Status](https://coveralls.io/repos/github/hammerlab/isovar/badge.svg?branch=master)](https://coveralls.io/github/hammerlab/isovar?branch=master)
# isovar
Abundance quantification of distinct transcript sequences containing somatic variants from cancer RNAseq
## Example
```sh
$ isovar-protein-sequences.py \
--vcf somatic-variants.vcf \
--bam rnaseq.bam \
--genome hg19 \
--min-reads 2 \
--protein-sequence-length 30 \
--output isovar-results.csv
chr pos ref alt amino_acids \
0 22 46931060 A C FGVEAVDHGWPSMSSGSSWRASRGPPPPPR
1 22 46931062 G A CFGVEAVDHGWPPMSLAHGGPAVVHRLHPEA
variant_aa_interval_start variant_aa_interval_end ends_with_stop_codon \
0 16 17 False
1 16 17 False
frameshift translations_count supporting_variant_reads_count \
0 False 1 1
1 False 1 1
total_variant_reads supporting_transcripts_count total_transcripts \
0 130 2 2
1 127 2 2
gene
0 CELSR1
1 CELSR1
```
## Algorithm/Design
The one line explanation of isovar: `ProteinSequence = VariantSequence + ReferenceContext`.
A little more detail about the algorithm:
1. Scan through an RNAseq BAM file and extract sequences overlapping a variant locus (represented by `ReadAtLocus`)
2. Make sure that the read contains the variant allele and split its sequence into prefix/alt/suffix string parts (represented by `VariantRead`)
3. Combine multiple `VariantRead` records into a `VariantSequence`
4. Gather possible reading frames for distinct reference sequences around the variant locus (represented by `ReferenceContext`).
5. Use the reading frame from a `ReferenceContext` to translate a `VariantSequence` into a protein fragment (represented by `Translation`).
6. Multiple distinct variant sequences and reference contexts can generate the same translations, so we aggregate those equivalent `Translation` objects into a `ProteinSequence`.
Since we may not want to deal with *every* possible translation of *every* distinct sequence detected around a variant, `isovar` sorts the variant sequences by the number of supporting reads and the reference contexts in order of protein length and a configurable number of
translated protein fragments can be kept from this ordering.
# isovar
Abundance quantification of distinct transcript sequences containing somatic variants from cancer RNAseq
## Example
```sh
$ isovar-protein-sequences.py \
--vcf somatic-variants.vcf \
--bam rnaseq.bam \
--genome hg19 \
--min-reads 2 \
--protein-sequence-length 30 \
--output isovar-results.csv
chr pos ref alt amino_acids \
0 22 46931060 A C FGVEAVDHGWPSMSSGSSWRASRGPPPPPR
1 22 46931062 G A CFGVEAVDHGWPPMSLAHGGPAVVHRLHPEA
variant_aa_interval_start variant_aa_interval_end ends_with_stop_codon \
0 16 17 False
1 16 17 False
frameshift translations_count supporting_variant_reads_count \
0 False 1 1
1 False 1 1
total_variant_reads supporting_transcripts_count total_transcripts \
0 130 2 2
1 127 2 2
gene
0 CELSR1
1 CELSR1
```
## Algorithm/Design
The one line explanation of isovar: `ProteinSequence = VariantSequence + ReferenceContext`.
A little more detail about the algorithm:
1. Scan through an RNAseq BAM file and extract sequences overlapping a variant locus (represented by `ReadAtLocus`)
2. Make sure that the read contains the variant allele and split its sequence into prefix/alt/suffix string parts (represented by `VariantRead`)
3. Combine multiple `VariantRead` records into a `VariantSequence`
4. Gather possible reading frames for distinct reference sequences around the variant locus (represented by `ReferenceContext`).
5. Use the reading frame from a `ReferenceContext` to translate a `VariantSequence` into a protein fragment (represented by `Translation`).
6. Multiple distinct variant sequences and reference contexts can generate the same translations, so we aggregate those equivalent `Translation` objects into a `ProteinSequence`.
Since we may not want to deal with *every* possible translation of *every* distinct sequence detected around a variant, `isovar` sorts the variant sequences by the number of supporting reads and the reference contexts in order of protein length and a configurable number of
translated protein fragments can be kept from this ordering.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
isovar-0.2.3.tar.gz
(44.6 kB
view details)
File details
Details for the file isovar-0.2.3.tar.gz
.
File metadata
- Download URL: isovar-0.2.3.tar.gz
- Upload date:
- Size: 44.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e65c1f1ccda0022b405216e0723559ed999b7c24cbc4362a9f36f8e0da69932f |
|
MD5 | e427c8a8dbd61490c205057c9e9b03fc |
|
BLAKE2b-256 | 72bf82e36d137b0866be27e0a27c1545fa4fb0b519e1aac31734b239fff5bff9 |