Assemble transcript sequences fragments near variants
Project description
isovar
Abundance quantification of distinct transcript sequences containing somatic variants from cancer RNAseq
Example
$ isovar-protein-sequences.py \
--vcf somatic-variants.vcf \
--bam rnaseq.bam \
--genome hg19 \
--min-reads 2 \
--protein-sequence-length 30 \
--output isovar-results.csv
chr pos ref alt amino_acids \
0 22 46931060 A C FGVEAVDHGWPSMSSGSSWRASRGPPPPPR
1 22 46931062 G A CFGVEAVDHGWPPMSLAHGGPAVVHRLHPEA
variant_aa_interval_start variant_aa_interval_end ends_with_stop_codon \
0 16 17 False
1 16 17 False
frameshift translations_count supporting_variant_reads_count \
0 False 1 1
1 False 1 1
total_variant_reads supporting_transcripts_count total_transcripts \
0 130 2 2
1 127 2 2
gene
0 CELSR1
1 CELSR1
Algorithm/Design
The one line explanation of isovar: ProteinSequence = VariantSequence + ReferenceContext.
A little more detail about the algorithm: 1. Scan through an RNAseq BAM file and extract sequences overlapping a variant locus (represented by ReadAtLocus) 2. Make sure that the read contains the variant allele and split its sequence into prefix/alt/suffix string parts (represented by VariantRead) 3. Combine multiple VariantRead records into a VariantSequence 4. Gather possible reading frames for distinct reference sequences around the variant locus (represented by ReferenceContext). 5. Use the reading frame from a ReferenceContext to translate a VariantSequence into a protein fragment (represented by Translation). 6. Multiple distinct variant sequences and reference contexts can generate the same translations, so we aggregate those equivalent Translation objects into a ProteinSequence.
Since we may not want to deal with every possible translation of every distinct sequence detected around a variant, isovar sorts the variant sequences by the number of supporting reads and the reference contexts in order of protein length and a configurable number of translated protein fragments can be kept from this ordering.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file isovar-0.1.0.tar.gz
.
File metadata
- Download URL: isovar-0.1.0.tar.gz
- Upload date:
- Size: 43.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7f27e6a1cfef46e7bbc082421ebabaf2d33c7a40bbfa662862333fdc231c76b8 |
|
MD5 | a50462ca3773209bb0715b2907e7b7b6 |
|
BLAKE2b-256 | acc8a208dbd44c8e1010147b662a62fc5cb4114024eebb5a47dd9da1beca8aa8 |