A library to manipulate data for our DMS prediction models.

Download your RNA data from HuggingFace with rouskinhf!

A wrapper around HuggingFace to load data for eFold. You can:

  • pull datasets from Rouskinlab's HuggingFace
  • create datasets from local files

Installation

To download data

pip install rouskinhf

To push data to HuggingFace (optional)

  • get an access token from the Rouskinlab HuggingFace page
  • add this token to your environment
export HUGGINGFACE_TOKEN="hf_yourtokenhere"
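
Before pushing, it can help to confirm that the token is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `get_hf_token` is hypothetical and is not part of rouskinhf.

```python
import os


def get_hf_token(env=None):
    """Return the HuggingFace token from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HUGGINGFACE_TOKEN is not set; export it before pushing data."
        )
    return token
```

Calling `get_hf_token()` in a fresh shell is a quick way to catch a token that was exported in a different terminal session.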

To predict structures from rouskinhf (optional)

You'll need to install D. Mathews' RNAstructure Fold (also available on the Rouskinlab GitHub).

Check your RNAstructure Fold installation in a terminal:

Fold --version
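
The same check can be done from Python before requesting structure prediction. This is a sketch that only tests whether an executable named `Fold` is on your PATH; the helper `fold_is_available` is hypothetical, and rouskinhf may locate RNAstructure differently.

```python
import shutil


def fold_is_available(executable="Fold"):
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if not fold_is_available():
    print("Fold was not found on PATH; install RNAstructure first.")
```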

How to use

Download a dataset

import rouskinhf

rouskinhf.get_dataset(
    name='bpRNA-1m',  # the name of a dataset from huggingface/rouskinlab
    force_download=False,  # use a local copy of the data if it exists
)

Convert a supported format to rouskinhf format

import rouskinhf

rouskinhf.convert(
    format='ct',  # can be ct, seismic, bpseq, fasta or json (rouskinhf output data structure)
    file_or_folder='path/to/my/ct/folder',
    predict_structure=False,  # add structure from RNAstructure
    filter=True,  # removes duplicates, non-regular characters and low AUROC
    min_AUROC=0.8,
)

Note: Sequences containing characters other than A, C, G, T, U, N, a, c, g, t, u, n are not supported and will be filtered out.
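
To see in advance which sequences would be dropped by the character rule, you can run a check like the one below. It is an illustrative sketch, not rouskinhf's own filter: the names `ALLOWED_BASES` and `has_supported_bases` are hypothetical, and treating an empty sequence as unsupported is an assumption.

```python
# Characters listed as supported in the note above
ALLOWED_BASES = set("ACGTUNacgtun")


def has_supported_bases(sequence):
    """True if the sequence is non-empty and uses only supported characters."""
    return bool(sequence) and set(sequence) <= ALLOWED_BASES


# Hypothetical input: reference name -> sequence
sequences = {"ok": "CACGCUAUG", "bad": "CACXGCUAUG"}
kept = {name: seq for name, seq in sequences.items() if has_supported_bases(seq)}
```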

Rouskinhf structure format

# rouskinhf_output_file.json
{
    "reference_name": {
        "sequence": "CACGCUAUG",
        "structure": [(0,8), (1,7)], # base pair representation
        # whatever other info you need
    }
}
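
The base pair representation can be turned into dot-bracket notation for a quick visual check. The sketch below assumes 0-based indices (inferred from the example, where `(0,8)` pairs the first C with the last G) and no pseudoknots; `pairs_to_dot_bracket` is a hypothetical helper, not a rouskinhf function. Since JSON has no tuple type, pairs read back from a `.json` file arrive as two-element lists, which this function also accepts.

```python
def pairs_to_dot_bracket(sequence, pairs):
    """Render 0-based base pairs as a dot-bracket string (no pseudoknots)."""
    dots = ["."] * len(sequence)
    for pair in pairs:
        i, j = sorted(pair)  # accept (i, j) or (j, i), tuples or lists
        if not (0 <= i < j < len(sequence)):
            raise ValueError(f"invalid base pair {pair!r} for length {len(sequence)}")
        dots[i], dots[j] = "(", ")"
    return "".join(dots)


# Entry shaped like the example above
entry = {"sequence": "CACGCUAUG", "structure": [(0, 8), (1, 7)]}
print(pairs_to_dot_bracket(entry["sequence"], entry["structure"]))  # ((.....))
```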
