Collect and generate RNA-RBP interaction data in various formats

rnpfind

Command line tool for collecting and generating RNA-RBP interaction data in various formats.

Installation

pip install rnpfind

Requirements:

  • Python >=3.8

Usage

rnpfind can be used as a command line tool as follows:

rnpfind <transcript>

where transcript is a gene name such as "PTEN", or an hg38 coordinate range (such as 11:65497688-65506516).

Note: If a transcript is specified through genomic coordinates, it is assumed to be on the forward strand (the "+" strand). A transcript specified by name, on the other hand, has its strand identified correctly.
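The two accepted input forms can be told apart with a small parser. The sketch below is only an illustration of the rule described above, not rnpfind's own code: `TranscriptSpec` and `parse_transcript` are hypothetical names, and the coordinate pattern is an assumption based on the example range shown.

```python
import re
from typing import NamedTuple, Optional


class TranscriptSpec(NamedTuple):
    """Hypothetical container for a parsed transcript argument."""
    gene: Optional[str]
    chrom: Optional[str]
    start: Optional[int]
    end: Optional[int]
    strand: Optional[str]


# Assumed shape of a coordinate range: <chromosome>:<start>-<end>
_COORD_RE = re.compile(r"^([0-9A-Za-z]+):(\d+)-(\d+)$")


def parse_transcript(text: str) -> TranscriptSpec:
    """Classify the input as a coordinate range or a gene name."""
    text = text.strip()
    match = _COORD_RE.match(text)
    if match is None:
        # Treated as a gene name; the strand is left for a later lookup.
        return TranscriptSpec(text, None, None, None, None)
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    # Coordinate input carries no strand, so "+" is assumed (see the note).
    return TranscriptSpec(None, chrom, start, end, "+")
```

Calling `parse_transcript("11:65497688-65506516")` yields a spec on the "+" strand, while `parse_transcript("PTEN")` yields a spec with only the gene name filled in.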

The tool takes a transcript as input, and computes the binding sites of various RBPs on the transcript. The information is collected from data sources including RBPDB, ATTRACT, and POSTAR.

The tool writes the binding data to an output folder (use --out-dir to specify it). A few output formats are supported (use --out-format to specify):

  • bed format: a widely used format for displaying intervals. A bed file created in this way could be visualized on a genome browser, for example. Note that the --trackhub option is available to generate a trackhub structure, which is useful for hosting a large number of indexed bed files (bigBed files) so that users can view them on genome browsers like the UCSC Genome Browser.

  • csv format: an NxN table (where N=number of RBPs) showing binding correlations of RBPs on the particular transcript analyzed. This could be useful for inferring molecular mechanisms on certain regions of the transcriptome.
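To make the two formats concrete, here is a minimal sketch of what each one represents. It is not rnpfind's implementation: the function names are hypothetical, the binding sites are made-up coordinates, and the Jaccard overlap used for the table is an assumed stand-in, since the README does not say how the binding correlation is computed.

```python
def to_bed_lines(chrom, sites):
    """Render (rbp, start, end) sites as 4-column BED lines.

    BED intervals are 0-based and half-open: [start, end).
    """
    ordered = sorted(sites, key=lambda s: (s[1], s[2], s[0]))
    return [f"{chrom}\t{start}\t{end}\t{rbp}" for rbp, start, end in ordered]


def cobinding_table(sites):
    """Build an NxN table of Jaccard overlaps between RBP footprints.

    ASSUMPTION: Jaccard overlap is used only for illustration; the measure
    rnpfind actually reports may differ.
    """
    covered = {}
    for rbp, start, end in sites:
        covered.setdefault(rbp, set()).update(range(start, end))
    names = sorted(covered)
    table = []
    for a in names:
        row = []
        for b in names:
            union = covered[a] | covered[b]
            shared = covered[a] & covered[b]
            row.append(len(shared) / len(union) if union else 0.0)
        table.append(row)
    return names, table


# Made-up binding sites, for illustration only.
example_sites = [
    ("HNRNPC", 100, 110),
    ("ELAVL1", 105, 115),
    ("PTBP1", 200, 210),
]

bed_lines = to_bed_lines("chr11", example_sites)
names, table = cobinding_table(example_sites)
```

Each BED line describes one binding interval, while each cell of the table summarizes how much two RBPs bind the same positions on the transcript.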

For more options, run rnpfind --help
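If you want to call the command line tool from a script, one option is to build the argument list and hand it to `subprocess`. The sketch below uses only the --out-dir and --out-format options mentioned above; `build_rnpfind_command` is a hypothetical helper, and passing the literal values "bed" or "csv" to --out-format is an assumption, so check rnpfind --help for the accepted values.

```python
import shlex


def build_rnpfind_command(transcript, out_dir=None, out_format=None):
    """Assemble an rnpfind command line as a list of arguments."""
    cmd = ["rnpfind", transcript]
    if out_dir is not None:
        cmd += ["--out-dir", out_dir]
    if out_format is not None:
        # ASSUMPTION: the format is passed by name, e.g. "bed" or "csv".
        cmd += ["--out-format", out_format]
    return cmd


cmd = build_rnpfind_command("PTEN", out_dir="results", out_format="bed")
print(shlex.join(cmd))

# To actually run the tool (this may trigger the data download):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems with coordinate ranges or paths that contain spaces.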

Within Python

You can import rnpfind into your Python code as follows:

from rnpfind import rnpfind

# Collect data on Malat1
rnpfind("malat1")

The data is written to disk, just as with the command line call. Check help(rnpfind) for keyword argument options.

Perhaps not so usefully, you can find the genome version rnpfind is working with programmatically:

from rnpfind import GENOME_VERSION
print(GENOME_VERSION)

How does it work?

In principle, RNA-RBP interactions can be backed by two forms of evidence: experimental and computational.

The experimental binding sites are collected on large databases such as POSTAR. The computational binding sites are generated by scanning RNA-binding-motifs of various RBPs (collected from RBPDB and ATTRACT) across a transcript to look for hits.
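As a rough illustration of the computational side, the sketch below slides a single motif along a sequence and reports every hit. It is a simplification and not rnpfind's scanning code: the function name is hypothetical, the motif is assumed to be a plain IUPAC consensus string, and motif databases often describe motifs as position weight matrices instead.

```python
import re

# IUPAC nucleotide codes mapped to RNA character classes (T is read as U).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def scan_motif(sequence, motif):
    """Return 0-based half-open (start, end) intervals where motif matches."""
    seq = sequence.upper().replace("T", "U")
    pattern = "".join(IUPAC_RNA[ch] for ch in motif.upper())
    # A lookahead keeps each match zero-width, so overlapping hits are found.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.start() + len(motif)) for m in regex.finditer(seq)]


hits = scan_motif("GGUUUUUAC", "UUUU")
```

Here `hits` contains two overlapping intervals, because the run of five U's holds two copies of the four-letter motif.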

Because these data sources must be available locally, the tool requires around 6.4GB of data to function. The data is downloaded automatically on the first run of the tool, or can be downloaded manually using rnpfind-download.

If that footprint is too much for you to handle, consider using the web tool available at https://rnpfind.com
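Before the first run, it may be worth checking that there is room for the download. The sketch below is a hedged example using only the standard library: `has_room_for_data` is a hypothetical helper, and it assumes the 6.4GB figure refers to disk space counted in decimal gigabytes.

```python
import shutil

# ASSUMPTION: 6.4GB means 6.4 * 10**9 bytes of disk space.
REQUIRED_BYTES = int(6.4 * 10**9)


def has_room_for_data(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding path has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


if not has_room_for_data("."):
    print("Not enough free space for the rnpfind data here.")
```

Pass the directory where you expect the data to be stored; the current directory is used here only as a placeholder.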

Contributing

Any suggestions or pull requests are welcome!

Download files

Source Distribution

rnpfind-1.2.6.tar.gz (8.9 MB)

Built Distribution

rnpfind-1.2.6-py3-none-any.whl (8.9 MB, Python 3)