Skip to main content

一个不错的RNA包

Project description

RNAPy — Unified RNA Analysis Toolkit

RNAPy is a unified Python toolkit that wraps several powerful RNA models with a consistent, easy-to-use API. It currently integrates:

  • RNA-FM / mRNA-FM for sequence embeddings and 2D secondary structure prediction
  • RhoFold for 3D structure prediction
  • RiboDiffusion for inverse folding (sequence generation from structure)
  • RhoDesign for inverse folding (structure-to-sequence, optional 2D guidance)
  • RNA-MSM for MSA-based embeddings, attention, consensus, and conservation

Key Features

  • Consistent high-level API via RNAToolkit
  • 2D structure prediction (RNA-FM / mRNA-FM)
  • 3D structure prediction (RhoFold)
  • Inverse folding (RiboDiffusion, RhoDesign)
  • MSA analysis and features (RNA-MSM: embeddings, attention, consensus, conservation)

Project Structure

RNAPy
├── rnapy/                    # Library source
│   ├── core/                 # Base classes, factory, config, exceptions
│   ├── providers/            # Model providers (rna_fm/mrna_fm, rhofold, RiboDiffusion, rhodesign, rna_msm)
│   ├── interfaces/           # Public interfaces
│   └── utils/                # Utilities
├── configs/                  # Global and model configs (YAML)
├── demos/                    # Ready-to-run examples
│   ├── models/               # Put pretrained weights here
│   ├── results/              # Default output location for demos
│   └── demo_*.py             # Demo scripts
├── requirements.txt
├── setup.py
└── README.md

Installation

Recommended: Python 3.10+ and a recent PyTorch build compatible with your CPU/GPU.

Windows PowerShell example:

# 1) Create and activate a virtual environment (optional but recommended)
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# 2) Install RNAPy in editable mode
pip install -e .

# 3) Install runtime dependencies (if not pulled via setup)
pip install -r requirements.txt

Note: Ensure your installed torch matches your CUDA setup if you plan to use GPU.

Documentation

  • Toolkit usage guide (English): docs/RNAToolkit_Usage_Guide.md

Model Weights

Place checkpoints under ./models/ (paths used in the demos):

  • RhoFold: ./models/RhoFold_pretrained.pt
  • RiboDiffusion: ./models/exp_inf.pth
  • mRNA-FM (or RNA-FM alternative): ./models/mRNA-FM_pretrained.pth (or ./models/RNA-FM_pretrained.pth)
  • RhoDesign: ./models/ss_apexp_best.pth (with-2D variant; accepts optional secondary structure file)
  • RNA-MSM: ./models/RNA_MSM_pretrained_weights.pt (or RNA_MSM_pretrained.ckpt)

You can customize locations through your own code or configs.

Quick Start

1) mRNA-FM (2D structure + embeddings)

from rnapy import RNAToolkit

sequence = "AGAUAGUCGUGGGUUCCCUUUCUGGAGGGAGAGGGAAUUCCACGUUGACCGGGGGAACCGGCCAGGCCCGGAAGGGAGCAACCGUGCCCGGCUAUC"

# Initialize
toolkit = RNAToolkit(device="cpu")

# Load model (choose one)
model_path = "./models/mRNA-FM_pretrained.pth"  # or RNA-FM_pretrained.pth
toolkit.load_model("mrna-fm", model_path)
# toolkit.load_model("rna-fm", "./models/RNA-FM_pretrained.pth")

# 2D structure prediction
result = toolkit.predict_structure(
    sequence,
    structure_type="2d",
    model="mrna-fm",
    save_dir="./results/rna_fm/demo.ct",
)

# Embeddings
embeddings = toolkit.extract_embeddings(
    sequence,
    model="mrna-fm",
    save_dir="./results/rna_fm/embeddings.npy",
)

print(result.get("secondary_structure"))
print(result.get("confidence_scores"))

2) RhoFold (3D structure prediction)

from rnapy import RNAToolkit

sequence = "GGAUCCCGCGCCCCUUUCUCCCCGGUGAUCCCGCGAGCCCCGGUAAGGCCGGGUCC"

toolkit = RNAToolkit(device="cpu")

# Load RhoFold
toolkit.load_model("rhofold", "./models/RhoFold_pretrained.pt")

# Predict 3D
result = toolkit.predict_structure(
    sequence,
    structure_type="3d",
    model="rhofold",
    save_dir="./results/rhofold",
    relax_steps=500,
)

pdb_file = result.get("structure_3d_refined", result.get("structure_3d_unrelaxed"))
print("3D structure:", pdb_file)

3) RiboDiffusion (inverse folding from PDB)

from rnapy import RNAToolkit

structure_file = "./input/R1107.pdb"

toolkit = RNAToolkit(device="cpu")

# Load RiboDiffusion
toolkit.load_model("ribodiffusion", "./models/exp_inf.pth")

# Generate sequences from structure
result = toolkit.generate_sequences_from_structure(
    structure_file=structure_file,
    model="ribodiffusion",
    n_samples=2,
    sampling_steps=100,
    cond_scale=0.5,
    dynamic_threshold=True,
    save_dir="./results/ribodiffusion",
)

print("Generated count:", result.get("sequence_count", 0))
print("Output dir:", result.get("output_directory"))

4) RhoDesign (inverse folding with optional 2D guidance)

from rnapy import RNAToolkit

pdb_path = "./input/2zh6_B.pdb"
ss_path = "./input/2zh6_B.npy"  # optional numpy file with secondary-structure/contact info

toolkit = RNAToolkit(device="cpu")

# Load RhoDesign (with-2D variant checkpoint)
toolkit.load_model("rhodesign", "./models/ss_apexp_best.pth")

# Generate one sequence from structure (RhoDesign samples one sequence per call)
res = toolkit.generate_sequences_from_structure(
    structure_file=pdb_path,
    model="rhodesign",
    secondary_structure_file=ss_path,  # omit or set None to run without 2D guidance
    save_dir="./results/rhodesign"
)

print("Predicted sequence:", res["sequences"][0])
print("Recovery rate:", res.get("quality_metrics", {}).get("sequence_recovery_rate"))
print("FASTA:", res.get("files", {}).get("fasta_files", [None])[0])

5) RNA-MSM (MSA features, consensus, conservation)

from rnapy import RNAToolkit

# Initialize
toolkit = RNAToolkit(device="cpu")

# Load RNA-MSM
toolkit.load_model("rna-msm", "./models/RNA_MSM_pretrained_weights.pt")

# Prepare an example MSA (aligned sequences)
msa_sequences = [
    "AUGGCGAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCAAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCGAUUUCAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCGAUUUUAUUUACCGCAGUCGUUACCAGCAUACUCGACUUUAAAUGCC",
]

# Extract embeddings (per-position, last layer by default)
features = toolkit.extract_msa_features(
    msa_sequences,
    feature_type="embeddings",
    model="rna-msm",
    save_dir="./results/rna_msm",
)

# Analyze MSA for consensus and conservation
msa_result = toolkit.analyze_msa(
    msa_sequences,
    model="rna-msm",
    extract_consensus=True,
    extract_conservation=True,
    save_dir="./results/rna_msm",
)

print("Consensus:", msa_result.get("consensus_sequence"))
print("Conservation (first 10):", (msa_result.get("conservation_scores") or [])[:10])

Run the Demos

From the repository root:

# mRNA-FM / RNA-FM demo
cd .\demos
python .\demo_rna_fm.py

# RhoFold demo
python .\demo_rhofold.py

# RiboDiffusion demo
python .\demo_ribodiffusion.py

# RhoDesign demo
python .\demo_rhodesign.py

# RNA-MSM demo
python .\demo_rna_msm.py

Additional examples may be available: rna_fm_demo.py, rhofold_demo.py, ribodiffusion_demo.py.

Configuration

YAML configs are provided under ./configs/ and ./demos/configs/. You can:

  • Pass config_dir to RNAToolkit to use custom defaults
  • Override per-call parameters in load_model(...) and task methods

Example (global excerpt):

global:
  device: "cpu"
  precision: "float32"
  cache_dir: "./cache"

Model-specific YAMLs (e.g., rna_fm.yaml, rhofold.yaml, ribodiffusion.yaml) control provider defaults. For models without a dedicated YAML, pass options via load_model(..., **kwargs) or call-time kwargs.

License

MIT License

Acknowledgements

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rnapy-2.0.tar.gz (226.6 kB view details)

Uploaded Source

File details

Details for the file rnapy-2.0.tar.gz.

File metadata

  • Download URL: rnapy-2.0.tar.gz
  • Upload date:
  • Size: 226.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for rnapy-2.0.tar.gz
Algorithm Hash digest
SHA256 dc75465a05955f8afd00293cf60b38a312effeebd96a769761ecc710370438ab
MD5 54470d1c6124c0de4bd3345dcf8ae927
BLAKE2b-256 db0198707c6660a24053bd9541d429083407fa47e7ffd2ecdc3b7d9d3193a6a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page