scanRBP: RNA-protein binding toolkit

Project description

scanRBP loads RNA-protein binding motif PWMs (position weight matrices), computes log-odds scores for every loaded RBP (RNA-binding protein) across a given genomic sequence, and draws a heatmap of the scores.

The Biopython documentation describes these scores as follows:

Here we can see positive values for symbols more frequent in the motif than in the background and negative for symbols more frequent in the background. 0.0 means that it's equally likely to see a symbol in the background and in the motif.

Using the background distribution and PWM with pseudo-counts added, it's easy to compute the log-odds ratios, telling us what are the log odds of a particular symbol to be coming from a motif against the background.
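
The calculation quoted above can be sketched in plain Python. This is only an illustration, not scanRBP's own code: the function names `log_odds_matrix` and `scan`, the uniform background, the pseudo-count of 1.0 and the base-2 logarithm are all assumptions made for the example.

```python
import math

ALPHABET = "ACGT"

def log_odds_matrix(counts, background=None, pseudocount=1.0):
    """Turn per-position symbol counts into log2 odds against a background.

    counts: list of dicts mapping a base to its count, one dict per motif position.
    background: dict mapping a base to its background probability
                (assumed uniform when omitted).
    """
    if background is None:
        background = {base: 1.0 / len(ALPHABET) for base in ALPHABET}
    matrix = []
    for column in counts:
        # Add the pseudo-count to every symbol so no frequency is zero.
        total = sum(column.get(base, 0) for base in ALPHABET) + pseudocount * len(ALPHABET)
        row = {}
        for base in ALPHABET:
            frequency = (column.get(base, 0) + pseudocount) / total
            row[base] = math.log2(frequency / background[base])
        matrix.append(row)
    return matrix

def scan(sequence, matrix):
    """Sum the log-odds over every window of motif width along the sequence.

    Raises KeyError if the sequence contains a symbol outside ALPHABET.
    """
    width = len(matrix)
    return [
        sum(matrix[offset][base] for offset, base in enumerate(sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]

# Hypothetical two-position motif: position 1 strongly prefers A, position 2 is uniform.
example = log_odds_matrix([{"A": 8}, {"A": 2, "C": 2, "G": 2, "T": 2}])
window_scores = scan("AAC", example)
```

In this toy motif, A at the first position scores positive, the other bases score negative, and every base at the uniform second position scores 0.0, which matches the description above.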

For more information, see the Biopython docs.

Example run

scanRBP quick start:

Usage for single sequence: scanRBP sequence output [options]
     * one sequence provided on the command line, generates output.png/pdf + output.tab

Usage for processing FASTA file: scanRBP filename.fasta [options]
     * one heatmap/matrix will be generated per sequence
     * output name of the files will be sequence ids provided in the fasta file

Options:
     -annotate               Annotate each heatmap cell with the number
     -xlabels                Display sequence (x-labels), default False
     -only_protein TARDBP    Only analyze binding for the specific protein / search by name
     -all_protein TARDBP     Additionally to one motif per protein (for all proteins), also include all motifs (PWMs) for this specific protein (search by name)
                             (note that one protein can have several PWMs)
     -figsize "(10,20)"      Change matplotlib/seaborn figure size for the heatmap, example width=10, height=20
     -heatmap title          Make heatmap (png+pdf) with title
     -output_folder folder   Store all results to the output folder (default: current folder)
     -nonzero                All negative vector values are set to 0, not enabled by default
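
As an illustration of what -nonzero does to a score vector, here is a minimal sketch. It assumes a vector is a plain list of floats; `clip_negative` is a hypothetical helper name, not a function exported by scanRBP.

```python
def clip_negative(scores):
    """Return a copy of the score vector with negative values replaced by 0."""
    return [max(0.0, score) for score in scores]

# Made-up log-odds values, for illustration only.
clipped = clip_negative([1.58, -0.42, 0.0, -1.58])
```

After clipping, only positions where the motif is favored over the background keep a nonzero value.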

Examples:

# taking a random sequence, will produce binding scores and a heatmap
# output: example1_PWM.tab # file with log-odds vectors for all proteins for the given command line sequence
# output: example1.png/pdf # heatmap image with clustering of protein binding vectors
./scanRBP AAAGCGGCGACTTATTATATCCCCATATATTATATCTTCTTCTCTTATATATAAACCAGAGATAGATGTGTGTGGTGG example1 -heatmap example1

# instead of taking one single sequence, the input can be a fasta file with multiple sequences
./scanRBP data.fasta
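
For FASTA input, one result is produced per sequence and named after the sequence id. The sketch below shows one way such ids and sequences could be read. It is an assumption about the general approach, not scanRBP's actual parser; `read_fasta` is a hypothetical name, and the id is taken to be the first whitespace-delimited token of the header line.

```python
import io

def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an open FASTA file handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)

# In-memory stand-in for a file such as data.fasta.
demo = io.StringIO(">seq1 first example\nAAAG\nCGGC\n>seq2\nttat\n")
records = list(read_fasta(demo))
```

Each `(sequence_id, sequence)` pair would then be scored on its own, with the id serving as the base name for that sequence's output files.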

Motif PWM database

scanRBP uses the mCross database of 112 RBPs from the paper:

Feng H, Bao S et al.
Modeling RNA-Binding Protein Specificity In Vivo by Precisely Registering Protein-RNA Crosslink Sites
Molecular Cell, 2019

To download the PWMs:

wget http://zhanglab.c2b2.columbia.edu/data/mCross/eCLIP_mCross_PWM.tgz --no-check-certificate
tar xfz eCLIP_mCross_PWM.tgz

Additional PWM dataset

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02913-0
https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-023-02913-0/MediaObjects/13059_2023_2913_MOESM6_ESM.txt

CLIP dataset

The list of bedGraph files is taken from ENCODE:

https://www.encodeproject.org/metadata/?status=released&internal_tags=ENCORE&assay_title=eCLIP&biosample_ontology.term_name=K562&biosample_ontology.term_name=HepG2&type=Experiment&files.analyses.status=released&files.preferred_default=true

Any other bedGraph file of called CLIP peaks for a specific genome can be added to the database.
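
A bedGraph file is a plain-text table with four whitespace-separated columns: chromosome, start, end and value. The sketch below reads such a file into records. It is only an illustration of the format, not the loader scanRBP uses; `Interval` and `read_bedgraph` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based start, as in the bedGraph format
    end: int     # end position, exclusive
    value: float

def read_bedgraph(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield one Interval per data line, skipping headers and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield Interval(chrom, int(start), int(end), float(value))

# Made-up lines standing in for a real CLIP bedGraph file.
sample = [
    "track type=bedGraph name=example",
    "chr1\t100\t150\t3.5",
    "chr1\t150\t200\t0.0",
]
intervals = list(read_bedgraph(sample))
```

The same function accepts an open file handle, since iterating over a handle yields its lines.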

Gene data

Gene metadata (names, aliases) downloaded from https://www.ncbi.nlm.nih.gov/gene/?term=human[organism]

Download files

Source Distribution

scanRBP-0.1.3.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

scanRBP-0.1.3-py3-none-any.whl (11.8 kB)

Uploaded Python 3

File details

Details for the file scanRBP-0.1.3.tar.gz.

File metadata

  • Download URL: scanRBP-0.1.3.tar.gz
  • Upload date:
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.8

File hashes

Hashes for scanRBP-0.1.3.tar.gz
Algorithm     Hash digest
SHA256        0b5b33bf15fc73cc5e2c66fcaeafa98ca5a521c8ed49cc35ed7cf494bf3860fd
MD5           bf6950acfa3b08da1d80738ffa067aea
BLAKE2b-256   3d321f63dc2990c67063cc60ea63ad8beff56132f1a7c7e86babd3760425bcf7

File details

Details for the file scanRBP-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: scanRBP-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.8

File hashes

Hashes for scanRBP-0.1.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        418b0731660bf1381da3676c1e04a132871c10a9e3fca27b11751c9f1c4f9766
MD5           a860431aea5f56b8df23976ce4cab5b5
BLAKE2b-256   680cfda2cff885dad34fdb023410af7507f3a29b8d0eaf5cdb5dc67ddbb32582
