
Convert RNA-STAR SJ.out.tab files to 5-prime and 3-prime "percent spliced in" ("psi") scores.

Project description

Annotation-free estimation of the percent spliced in of splice junctions: this package converts [RNA-STAR aligner](http://bioinformatics.oxfordjournals.org/content/29/1/15.long) “SJ.out.tab” files to “percent spliced-in” (Psi) scores.
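For orientation, an SJ.out.tab file is a headerless, tab-separated table with one row per junction. The sketch below loads one with plain pandas, assuming the nine-column layout described in the STAR manual. The helper `read_sj_out_tab_sketch` and the names of the last four columns are hypothetical choices for illustration (the first five names mirror the doctest further down), and the strand, motif and overhang values in the toy input are made up.

```python
import io

import pandas as pd

# Nine tab-separated columns of STAR's SJ.out.tab (no header line).
# Column *names* are our own illustrative choice, not STAR's or sj2psi's.
SJ_OUT_COLUMNS = [
    "chrom",                           # 1: chromosome
    "first_bp_intron",                 # 2: first base of the intron (1-based)
    "last_bp_intron",                  # 3: last base of the intron (1-based)
    "strand",                          # 4: 0 = undefined, 1 = +, 2 = -
    "intron_motif",                    # 5: 0 = non-canonical, 1 = GT/AG, 2 = CT/AC, ...
    "annotated",                       # 6: 0 = unannotated, 1 = annotated
    "unique_junction_reads",           # 7: uniquely mapping reads crossing the junction
    "multimap_junction_reads",         # 8: multi-mapping reads crossing the junction
    "max_spliced_alignment_overhang",  # 9: maximum spliced alignment overhang
]


def read_sj_out_tab_sketch(path_or_buffer):
    """Hypothetical reader: load an SJ.out.tab file into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep="\t", header=None, names=SJ_OUT_COLUMNS)


# Toy input mirroring the worked example below (columns 4, 5, 6 and 9 are made up).
example = io.StringIO(
    "chr1\t100\t180\t1\t1\t0\t90\t0\t30\n"
    "chr1\t100\t200\t1\t1\t0\t10\t0\t25\n"
    "chr1\t130\t200\t1\t1\t0\t40\t0\t28\n"
)
sj = read_sj_out_tab_sketch(example)
print(sj[["chrom", "first_bp_intron", "last_bp_intron", "unique_junction_reads"]])
```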

Following [Pervouchine et al., Bioinformatics (2013)](http://bioinformatics.oxfordjournals.org/content/29/2/273.long), we ask: how often is this donor site (5’ splice site) used with this acceptor site (3’ splice site), compared to ALL OTHER acceptors it is joined to?

The same goes for acceptor sites: how often is this acceptor site used with this donor site, compared to ALL OTHER donors?
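In symbols (our own notation, not taken from the package), write $n(d, a)$ for the number of reads supporting the junction that joins donor $d$ to acceptor $a$. The two questions above then correspond to the following ratios, where $a'$ ranges over every acceptor observed with donor $d$ and $d'$ over every donor observed with acceptor $a$:

$$
\psi_5(d, a) = \frac{n(d, a)}{\sum_{a'} n(d, a')}, \qquad
\psi_3(d, a) = \frac{n(d, a)}{\sum_{d'} n(d', a)}
$$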

To illustrate, consider this example, in which each “-” represents 10 bp:

| Splice junction (figure) | Genome location | Number of reads |
|---|---|---|
| `[ ]--------[ ]` | chr1:100-180 | 90 |
| `[ ]----------[ ]` | chr1:100-200 | 10 |
| `   [ ]-------[ ]` | chr1:130-200 | 40 |

For the 5’ splice site chr1:100, there are 90 + 10 = 100 total reads. Thus the “psi5” for chr1:100-180 is 90/100 = 0.9, and the psi5 for chr1:100-200 is 10/100 = 0.1.

For the 3’ splice site chr1:200, there are 10 + 40 = 50 total reads. Thus the “psi3” for chr1:100-200 is 10/50 = 0.2, and the psi3 for chr1:130-200 is 40/50 = 0.8.

What remains are the uninteresting splice sites chr1:180 and chr1:130. Each is used by only one junction in this example, so neither shows any variation. Thus psi3 for chr1:100-180 (acceptor chr1:180) is 1.0, and psi5 for chr1:130-200 (donor chr1:130) is 1.0 as well.
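The arithmetic above amounts to two group-wise normalizations: divide each junction's reads by the total for its donor (psi5) and by the total for its acceptor (psi3). A minimal pandas sketch, assuming every junction is on the + strand (so `first_bp_intron` marks the donor and `last_bp_intron` the acceptor) and using a single read-count column. The helper `psi_sketch` is hypothetical: it is not the package's `get_psis`, which also produces `*_filtered` read columns that this sketch does not reproduce.

```python
import pandas as pd


def psi_sketch(sj: pd.DataFrame, reads: str = "unique_junction_reads") -> pd.DataFrame:
    """Hypothetical helper: add psi5/psi3 columns from one read-count column.

    Assumes + strand only: first_bp_intron is the donor (5' site) and
    last_bp_intron is the acceptor (3' site). No read filtering is applied.
    """
    out = sj.copy()
    donor = ["chrom", "first_bp_intron"]
    acceptor = ["chrom", "last_bp_intron"]
    # transform("sum") broadcasts each group's total back onto its member rows.
    out["psi5"] = out[reads] / out.groupby(donor)[reads].transform("sum")
    out["psi3"] = out[reads] / out.groupby(acceptor)[reads].transform("sum")
    return out


# The three junctions from the worked example.
sj = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "first_bp_intron": [100, 100, 130],
    "last_bp_intron": [180, 200, 200],
    "unique_junction_reads": [90, 10, 40],
})
print(psi_sketch(sj)[["first_bp_intron", "last_bp_intron", "psi5", "psi3"]])
```

Grouping on `chrom` as well as position keeps identically numbered sites on different chromosomes apart. A strand-aware version would need to swap the donor and acceptor columns for minus-strand junctions.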

>>> import pandas as pd
>>> from sj2psi import get_psis
>>> data = {'chrom': ['chr1', 'chr1', 'chr1'],
...         'first_bp_intron': [100, 100, 130], 'last_bp_intron': [180, 200, 200],
...         'unique_junction_reads': [90, 10, 40],
...         'multimap_junction_reads': [0, 0, 0]}
>>> sj = pd.DataFrame(data)
>>> get_psis(sj)
  chrom  first_bp_intron  last_bp_intron  multimap_junction_reads  \
0  chr1              100             180                        0
1  chr1              100             200                        0
2  chr1              130             200                        0
<BLANKLINE>
   unique_junction_reads  multimap_junction_reads_filtered  \
0                     90                                 0
1                     10                                 0
2                     40                                 0
<BLANKLINE>
   unique_junction_reads_filtered  total_filtered_reads  psi5  psi3
0                              90                    90   0.9   1.0
1                              10                    10   0.1   0.2
2                              40                    40   1.0   0.8
<BLANKLINE>
[3 rows x 10 columns]


Download files

Source Distribution

sj2psi-0.0.1.tar.gz (3.3 kB)

File details

Details for the file sj2psi-0.0.1.tar.gz.

File metadata

  • Download URL: sj2psi-0.0.1.tar.gz
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for sj2psi-0.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29a2a2e87ffbe9a0cf2c8fa326b984a4b31cabf728f42d78915a046e5c83e82a |
| MD5 | c906806dac2a6b77af96ee86a81e6dc0 |
| BLAKE2b-256 | d2e7d4d1b2d8d734e3c1153dbcb4fbab4a451d95400dbd9c71108806dd3550a4 |
