A library to manipulate data for our DMS prediction models.
Download your RNA data from HuggingFace with rouskinhf!
A wrapper around HuggingFace to load data for eFold. You can:
- pull datasets from Rouskinlab's HuggingFace
- create datasets from local files
Installation
To download data
pip install rouskinhf
To push data to HuggingFace (optional)
- get an access token from the Rouskinlab HuggingFace page
- add this token to your environment
export HUGGINGFACE_TOKEN="hf_yourtokenhere"
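Before pushing, it can help to confirm that the token is actually visible to Python. The snippet below is a minimal sketch and is not part of rouskinhf: `get_hf_token` is a hypothetical helper, and the only thing taken from the instructions above is the variable name `HUGGINGFACE_TOKEN`.

```python
import os


def get_hf_token(env=None):
    """Return the value of HUGGINGFACE_TOKEN, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. This is a hypothetical helper, not a rouskinhf function.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HUGGINGFACE_TOKEN is not set; export it before pushing data."
        )
    return token
```

Calling `get_hf_token()` in the shell session you plan to push from raises a `RuntimeError` if the `export` line was not run in that session.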
To predict structures with rouskinhf (optional)
You'll need to install D. Mathews' RNAstructure Fold (also available on the Rouskinlab GitHub).
Check your RNAstructure Fold installation in a terminal:
Fold --version
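The same check can be done from Python, for example at the top of a script that sets `predict_structure = True`. This is a sketch under the assumption that the executable is called `Fold` and is on your `PATH`, as the command above implies. `fold_version` is a hypothetical helper, not a rouskinhf function.

```python
import shutil
import subprocess


def fold_version(name="Fold"):
    """Return the output of `<name> --version`, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = fold_version()
    print(version if version is not None else "Fold was not found on PATH")
```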
How to use
Download a dataset
import rouskinhf
rouskinhf.get_dataset(
    name='bpRNA-1m',  # the name of a dataset from huggingface/rouskinlab
    force_download=False,  # use a local copy of the data if it exists
)
Convert any supported format to the rouskinhf format
import rouskinhf
rouskinhf.convert(
    format='ct',  # can be ct, seismic, bpseq, fasta or json (rouskinhf output data structure)
    file_or_folder='path/to/my/ct/folder',
    predict_structure=False,  # add structure from RNAstructure
    filter=True,  # removes duplicates, non-regular characters and low AUROC
    min_AUROC=0.8,
)
Note: Sequences with bases other than A, C, G, T, U, N, a, c, g, t, u, n are not supported. Such sequences are filtered out.
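To see in advance which of your sequences would survive this filtering, the character and duplicate checks can be approximated in a few lines. This is only a sketch, not rouskinhf's implementation: `has_only_allowed_bases` and `filter_records` are hypothetical names, and duplicates are compared as exact strings, which may differ from how the library compares them.

```python
# Alphabet taken from the note above.
ALLOWED_BASES = set("ACGTUNacgtun")


def has_only_allowed_bases(sequence):
    """True if the sequence is non-empty and uses only supported bases."""
    return len(sequence) > 0 and set(sequence) <= ALLOWED_BASES


def filter_records(records):
    """Keep the first occurrence of each supported sequence.

    `records` maps a reference name to its sequence string.
    Duplicates are detected by exact string equality (an assumption).
    """
    seen = set()
    kept = {}
    for name, sequence in records.items():
        if not has_only_allowed_bases(sequence):
            continue  # contains an unsupported character
        if sequence in seen:
            continue  # duplicate of an earlier sequence
        seen.add(sequence)
        kept[name] = sequence
    return kept


records = {"ref1": "CACGCUAUG", "ref2": "CACGCUAUG", "ref3": "CAXGC"}
print(filter_records(records))  # {'ref1': 'CACGCUAUG'}
```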
Rouskinhf structure format
# rouskinhf_output_file.json
{
    "reference_name": {
        "sequence": "CACGCUAUG",
        "structure": [(0,8), (1,7)], # base pair representation
        # whatever other info you need
    }
}
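The base pair representation can be turned into the more familiar dot-bracket string for a quick visual check. The sketch below assumes 0-based indices, which the example above suggests since `(0,8)` pairs the first and last base of a 9-nucleotide sequence. `pairs_to_dot_bracket` is a hypothetical helper, not a rouskinhf function, and it uses a single bracket type, so it cannot distinguish pseudoknotted pairs.

```python
def pairs_to_dot_bracket(sequence, pairs):
    """Convert a list of (i, j) base pairs into a dot-bracket string.

    Indices are assumed to be 0-based with i < j. Each position may take
    part in at most one pair.
    """
    symbols = ["."] * len(sequence)
    for i, j in pairs:
        if not 0 <= i < j < len(sequence):
            raise ValueError(f"invalid pair ({i}, {j}) for length {len(sequence)}")
        if symbols[i] != "." or symbols[j] != ".":
            raise ValueError(f"position in pair ({i}, {j}) is already paired")
        symbols[i] = "("
        symbols[j] = ")"
    return "".join(symbols)


print(pairs_to_dot_bracket("CACGCUAUG", [(0, 8), (1, 7)]))  # ((.....))
```

JSON has no tuple type, so pairs read back from a file with Python's `json` module arrive as two-element lists. The loop above unpacks both forms the same way.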