A library to manipulate data for our DMS prediction models.

Download your RNA data from HuggingFace with rouskinhf!

A wrapper around HuggingFace to load data for eFold. You can:

  • pull datasets from Rouskinlab's HuggingFace
  • create datasets from local files

Installation

To download data

pip install rouskinhf

To push data to HuggingFace (optional)

  • get an access token from the Rouskinlab HuggingFace page
  • add this token to your environment
export HUGGINGFACE_TOKEN="hf_yourtokenhere"
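
Before pushing, it can help to confirm that the token is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `get_hf_token` is hypothetical and not part of rouskinhf. Only the `HUGGINGFACE_TOKEN` variable name comes from the instructions above.

```python
import os


def get_hf_token(env_var="HUGGINGFACE_TOKEN"):
    """Return the token stored in the environment, or None if it is missing or blank.

    Hypothetical helper for illustration; rouskinhf may read the variable differently.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


if get_hf_token() is None:
    print("HUGGINGFACE_TOKEN is not set; pushing data will likely fail.")
else:
    print("HUGGINGFACE_TOKEN found.")
```

If the variable is reported as missing, re-run the `export` command in the same shell session that launches Python.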

To predict structures from rouskinhf (optional)

You'll need to install D. Mathews' RNAstructure Fold (also available on the Rouskinlab GitHub).

Check your RNAstructure Fold installation in a terminal:

Fold --version
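
The same check can be scripted from Python, for example before calling a conversion with structure prediction. This is a sketch using only the standard library; the function names `fold_available` and `fold_version` are hypothetical, and the only thing taken from the instructions above is the `Fold --version` command.

```python
import shutil
import subprocess


def fold_available(executable="Fold"):
    """Return True if the executable can be found on the PATH."""
    return shutil.which(executable) is not None


def fold_version(executable="Fold"):
    """Return the text printed by `<executable> --version`, or None if it is not installed."""
    if not fold_available(executable):
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print version information to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if fold_available():
    print("Fold found:", fold_version())
else:
    print("Fold not found on PATH; structure prediction will not be available.")
```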

How to use

Download a dataset

import rouskinhf

rouskinhf.get_dataset(
    name='bpRNA-1m',  # the name of a dataset from huggingface/rouskinlab
    force_download=False,  # use a local copy of the data if it exists
)

Convert a supported format to the rouskinhf format

import rouskinhf

rouskinhf.convert(
    format='ct',  # can be ct, seismic, bpseq, fasta or json (rouskinhf output data structure)
    file_or_folder='path/to/my/ct/folder',
    predict_structure=False,  # add structure from RNAstructure
    filter=True,  # removes duplicates, non-regular characters and low AUROC
    min_AUROC=0.8,
)

Note: Sequences containing characters other than A, C, G, T, U, N (upper or lower case) are not supported and will be filtered out.
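
To see ahead of time which sequences would be dropped by this rule, you can run a similar check yourself. The sketch below illustrates the character rule stated in the note; it is not rouskinhf's own filtering code, and the names `ALLOWED_BASES`, `has_only_supported_bases` and `split_by_support` are hypothetical.

```python
# Characters listed as supported in the note above.
ALLOWED_BASES = set("ACGTUNacgtun")


def has_only_supported_bases(sequence):
    """Return True if every character of the sequence is a supported base."""
    return set(sequence) <= ALLOWED_BASES


def split_by_support(sequences):
    """Split a {name: sequence} mapping into (kept, rejected) mappings."""
    kept, rejected = {}, {}
    for name, sequence in sequences.items():
        target = kept if has_only_supported_bases(sequence) else rejected
        target[name] = sequence
    return kept, rejected


kept, rejected = split_by_support({
    "ref_ok": "CACGCUAUG",
    "ref_bad": "CACGXUAUG",  # contains an unsupported 'X'
})
print("kept:", sorted(kept))
print("rejected:", sorted(rejected))
```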

Rouskinhf structure format

# rouskinhf_output_file.json
{
    "reference_name": {
        "sequence": "CACGCUAUG",
        "structure": [[0, 8], [1, 7]], # base pairs as 0-indexed [i, j] positions in the sequence
        # whatever other info you need
    }
}
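
The base pair representation can be turned into the more familiar dot-bracket notation. The following is a minimal sketch, assuming the pairs are 0-indexed positions (as in the example above, where position 0 pairs with position 8) and that the structure has no pseudoknots. The function name `pairs_to_dot_bracket` is hypothetical and not part of rouskinhf, and the JSON string stands in for a real output file.

```python
import json

# Stand-in for the contents of a rouskinhf output file (comments removed so it parses as JSON).
EXAMPLE_JSON = """
{
    "reference_name": {
        "sequence": "CACGCUAUG",
        "structure": [[0, 8], [1, 7]]
    }
}
"""


def pairs_to_dot_bracket(length, pairs):
    """Convert 0-indexed base pairs into a dot-bracket string (no pseudoknot support)."""
    chars = ["."] * length
    for pair in pairs:
        i, j = sorted(pair)
        if not (0 <= i < j < length):
            raise ValueError(f"invalid base pair {pair!r} for length {length}")
        chars[i] = "("
        chars[j] = ")"
    return "".join(chars)


records = json.loads(EXAMPLE_JSON)
for name, record in records.items():
    dot_bracket = pairs_to_dot_bracket(len(record["sequence"]), record["structure"])
    print(name, record["sequence"], dot_bracket)
```

For the example record, positions 0 and 1 open the helix and positions 7 and 8 close it, giving `((.....))`.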
