siRNAforge - Multi-species gene-to-siRNA design, off-target prediction, and ranking. A comprehensive siRNA design toolkit for gene silencing.


🧬 What is siRNAforge?

siRNAforge is a production-ready tool for designing small interfering RNAs (siRNAs) with integrated multi-species off-target analysis. Built for researchers who need reliable, high-specificity gene silencing candidates.

Why siRNAforge?

  • 🎯 End-to-end workflow — From gene symbol to ranked candidates in one command
  • 🔬 Multi-species validation — Transcriptome and miRNA seed-match off-target analysis across human, rat, and rhesus macaque
  • 🐍 Developer-friendly — Modern Python API with full type hints and Pydantic models. Easily extend with your own scoring methods.

Key Features

Feature | Description
🔍 Multi-database search | Automatic transcript retrieval from Ensembl, RefSeq (TODO), and GENCODE (TODO)
🧬 Variant targeting | Design and rank candidates against specific genetic variants, with population allele-frequency (AF) filtering
🧾 Transcript annotations | Fetch transcript models and interval annotations via a provider layer (Ensembl REST-backed)
🌡️ Thermodynamic scoring | ViennaRNA-based secondary structure prediction and stability analysis
🎯 Transcriptome off-target analysis | BWA-MEM2 transcriptome search with mismatch tolerance control
🧬 miRNA seed avoidance | BWA-MEM2 mirna_seed search for known matches to miRNA seed regions, using MirGeneDB and miRBase (TODO)
🔤 Smart species handling | Accepts any format (common names, miRBase codes, scientific names) and auto-normalizes to canonical names
⚙️ Nextflow pipeline | Scalable, containerized execution for high-throughput analysis
💉 Chemical modifications | Track 2'-O-methyl, 2'-fluoro, and phosphorothioate patterns
📊 Rich output | Structured CSV, FASTA, and JSON reports with comprehensive metadata

Supported Python versions: 3.10, 3.11, 3.12 (Python 3.13+ pending ViennaRNA compatibility)


📦 Installation

Choose your path based on what you need to do:

Complete installation guide with troubleshooting →

  • Deploy / run from registry (no setup) — Pull the prebuilt image with all bio tools, Nextflow, and Java bundled.

    docker pull ghcr.io/austin-s-h/sirnaforge:latest
    
  • Daily development (Python-only, fast) — Use uv + managed virtualenv; great for core code and unit tests. Heavy bio/Nextflow tests stay skipped unless you also have Docker/Java.

    curl -LsSf https://astral.sh/uv/install.sh | sh
    git clone https://github.com/austin-s-h/sirnaforge && cd sirnaforge
    make dev
    make check
    
  • Complete local testing (matches CI) — Either

    1. Build and test in Docker (reuses the bundled tools): make docker-build-test
    2. Or use conda to get bio deps + Java locally, then run the full suite:
    conda env create -f environment-dev.yml
    conda activate sirnaforge
    make test-release
    

    (Nextflow/Java are required for Nextflow-marked tests; Docker is required for container-marked tests.)


🚀 Quick Start

Get your first results in 30 seconds:

# Docker
docker run -v "$(pwd)":/workspace -w /workspace \
  ghcr.io/austin-s-h/sirnaforge:latest \
  sirnaforge workflow TP53 --output-dir results

# Local
uv run sirnaforge workflow TP53 --output-dir results

What you get:

  • Transcript sequences from Ensembl
  • Thermodynamically-scored siRNA candidates
  • Off-target analysis (requires Docker or locally installed bioinformatics tools)
  • Ranked results in CSV and FASTA formats
  • Automatic Ensembl transcriptome indexing across human, mouse, rat, and rhesus macaque (override with --transcriptome-fasta, or supply design-ready transcripts via --input-fasta)
  • A reference_summary block in logs/workflow_summary.json that records whether each reference was explicit, defaulted, or disabled

Need more control? Customize with parameters:

sirnaforge workflow BRCA1 \
  --genome-species "human,rat,rhesus" \
  --gc-min 40 --gc-max 60 \
  --top-n 50 \
  --design-mode mirna \
  --output-dir results
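
As a rough illustration of what a GC window such as --gc-min 40 --gc-max 60 selects, here is a minimal Python sketch. It is not siRNAforge's implementation: the function names are hypothetical, the bounds are assumed to be inclusive percentages, and the candidate sequences are made up.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def within_gc_window(seq: str, gc_min: float = 40.0, gc_max: float = 60.0) -> bool:
    """True when the GC percentage lies in [gc_min, gc_max] (inclusive by assumption)."""
    return gc_min <= gc_percent(seq) <= gc_max


# Made-up 21-nt candidates: balanced, AU-only, and GC-only.
candidates = [
    "GCAUGCAUGCAUGCAUGCAUG",
    "AAAAUUUUAAAAUUUUAAAAU",
    "GGGCCCGGGCCCGGGCCCGGG",
]
kept = [c for c in candidates if within_gc_window(c)]
```

Only the balanced candidate survives this filter; the AU-only and GC-only sequences fall outside the window.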

Custom inputs & offline mode

Bring your own transcript sequences while still running the full workflow:

# Design from bundled sample FASTA (design-only mode, no transcriptome off-target)
sirnaforge workflow TP53 \
  --input-fasta examples/sample_transcripts.fasta \
  --output-dir custom_inputs_demo

# Design from bundled sample FASTA and align against mouse transcriptome
sirnaforge workflow TP53 \
  --input-fasta examples/sample_transcripts.fasta \
  --transcriptome-fasta ensembl_mouse_cdna \
  --output-dir custom_inputs_demo

# Remote FASTA sources also work
sirnaforge workflow BRCA1 \
  --input-fasta https://example.org/custom/brca1.fasta \
  --transcriptome-fasta /data/reference/ensembl_human_cdna_111.fasta

--input-fasta skips the gene search stage and designs directly from your sequences. When used alone, transcriptome off-target analysis is disabled (design-only mode). To enable transcriptome off-target with custom inputs, explicitly provide --transcriptome-fasta.

When --transcriptome-fasta is omitted the workflow automatically indexes the bundled Ensembl cDNA transcriptomes for human, mouse, rat, and macaque so multi-species off-target analysis runs out of the box.
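
The interaction between the two flags can be summarized as a small decision function. This is a sketch of the rules described above, not the actual siRNAforge code: the function name is hypothetical, and the returned labels borrow the explicit/defaulted/disabled wording used for the reference_summary block.

```python
from typing import Optional


def resolve_transcriptome_mode(
    input_fasta: Optional[str],
    transcriptome_fasta: Optional[str],
) -> str:
    """Classify how the transcriptome reference is chosen for a run."""
    if transcriptome_fasta is not None:
        # --transcriptome-fasta was supplied: always use that reference.
        return "explicit"
    if input_fasta is not None:
        # --input-fasta alone: design-only mode, no transcriptome off-target.
        return "disabled"
    # Neither flag: fall back to the bundled Ensembl cDNA transcriptomes.
    return "defaulted"
```

For example, resolve_transcriptome_mode("examples/sample_transcripts.fasta", None) corresponds to the design-only case shown in the first command above.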

Every workflow run now captures the resolved transcriptome decision in logs/workflow_summary.json under reference_summary.transcriptome, indicating whether the reference was auto-selected, explicitly supplied, or intentionally disabled. This makes it easier to audit production runs and confirm that default references were applied as expected.
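
For auditing, the recorded decision can be read back with the standard library. The sketch below makes two assumptions that are not confirmed here: that logs/workflow_summary.json sits under the run's output directory, and that the file is a JSON object with reference_summary at the top level. The helper name is hypothetical, and the value is returned as-is because its exact shape is not documented in this README.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_transcriptome_decision(output_dir: str) -> Optional[Any]:
    """Return reference_summary.transcriptome from a run, or None if absent."""
    # Assumed location: <output_dir>/logs/workflow_summary.json
    summary_path = Path(output_dir) / "logs" / "workflow_summary.json"
    with summary_path.open(encoding="utf-8") as handle:
        summary = json.load(handle)
    # Assumed layout: a top-level object containing "reference_summary".
    reference_summary = summary.get("reference_summary") or {}
    return reference_summary.get("transcriptome")
```

After a run with --output-dir results, calling read_transcriptome_decision("results") would return whatever the workflow recorded for the transcriptome reference.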

📖 Usage examples and workflows → 📖 Complete CLI reference →


📚 Documentation

🎯 For Users

🔧 For Developers

📘 Browse full documentation →

Use sirnaforge --help, sirnaforge workflow --help, or the detailed CLI reference.


🎯 Use Cases

🧬 Basic Gene Silencing

sirnaforge workflow EGFR --output-dir egfr_analysis

Design siRNAs for a single target gene with default parameters.

🔬 Multi-Species Validation

# Accepts any species format - auto-normalizes to canonical names
sirnaforge workflow TP53 --species "human,rat,macaque"
# Also works: --species "hsa,rno,mml" or --species "Homo sapiens,Rattus norvegicus,Macaca mulatta"

Check off-target potential across multiple model organisms.
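
Conceptually, species normalization maps every accepted spelling to one canonical name. The following is a minimal sketch of that idea, not the project's actual lookup table: the alias table, the canonical names, and the function name are all assumptions made for illustration.

```python
# Hypothetical alias table: common names, miRBase codes, scientific names.
_ALIASES = {
    "human": ("human", "hsa", "homo sapiens"),
    "mouse": ("mouse", "mmu", "mus musculus"),
    "rat": ("rat", "rno", "rattus norvegicus"),
    "macaque": ("macaque", "rhesus", "mml", "macaca mulatta"),
}
_LOOKUP = {
    alias: canonical
    for canonical, aliases in _ALIASES.items()
    for alias in aliases
}


def normalize_species(spec: str) -> list[str]:
    """Turn a comma-separated species string into de-duplicated canonical names."""
    result: list[str] = []
    for token in spec.split(","):
        key = " ".join(token.lower().split())  # trim and collapse whitespace
        if not key:
            continue
        if key not in _LOOKUP:
            raise ValueError(f"unknown species: {token.strip()!r}")
        canonical = _LOOKUP[key]
        if canonical not in result:
            result.append(canonical)
    return result
```

With this table, "human,rat,macaque", "hsa,rno,mml", and "Homo sapiens,Rattus norvegicus,Macaca mulatta" all normalize to the same list.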

🧪 miRNA Seed Avoidance

# Species parameter drives both transcriptome and miRNA analysis
sirnaforge workflow BRCA1 --species "human,mouse"
# Override miRNA species independently if needed: --mirna-species "hsa,mmu,rno"

Filter candidates that match microRNA seed regions to reduce off-target effects.
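
The idea behind seed avoidance can be shown in a few lines. The miRNA seed is conventionally taken as nucleotides 2-8 from the 5' end; a guide strand that shares this heptamer with a known miRNA may behave like that miRNA. The sketch below uses exact string comparison on toy sequences, whereas siRNAforge itself is described above as running a BWA-MEM2 mirna_seed search; the function names are hypothetical.

```python
def seed_region(guide: str) -> str:
    """Return nucleotides 2-8 (1-based, 5'->3') of a strand in the RNA alphabet."""
    guide = guide.upper().replace("T", "U")
    if len(guide) < 8:
        raise ValueError("need at least 8 nucleotides to extract a seed")
    return guide[1:8]


def shares_mirna_seed(guide: str, mirna_seeds: set[str]) -> bool:
    """True when the guide's seed is identical to any seed in mirna_seeds."""
    return seed_region(guide) in mirna_seeds


# Toy data: one miRNA-like sequence and two made-up guide strands.
mirna_seeds = {seed_region("UAGCUUAUCAGACUGAUGUUGA")}
guide_a = "UAGCUUAUGGCCAAUUGGCCA"  # same seed as the toy miRNA
guide_b = "UCCGGAAUGGCCAAUUGGCCA"  # different seed
flagged = [g for g in (guide_a, guide_b) if shares_mirna_seed(g, mirna_seeds)]
```

Here only guide_a is flagged, because its positions 2-8 are identical to the toy miRNA's seed.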

⚙️ High-Throughput Analysis

# Batch multiple genes (off-target step uses the embedded Nextflow pipeline)
for gene in TP53 BRCA1 EGFR KRAS; do
  sirnaforge workflow "$gene" --output-dir "batch_results/$gene"
done

Process many genes in batch while reusing the same embedded Nextflow off-target engine.
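
The same batch can be driven from Python when the results need post-processing in one script. This sketch only uses the sirnaforge workflow GENE --output-dir form shown above; the helper names and the dry_run switch are hypothetical, and dry_run defaults to True so nothing is executed unless requested.

```python
import subprocess
from pathlib import Path


def build_workflow_command(gene: str, root: Path) -> list[str]:
    """Build the argv for one gene, writing into <root>/<gene>."""
    return ["sirnaforge", "workflow", gene, "--output-dir", str(root / gene)]


def run_batch(genes: list[str], root: str, dry_run: bool = True) -> list[list[str]]:
    """Build one command per gene and, unless dry_run, execute them in order."""
    commands = [build_workflow_command(gene, Path(root)) for gene in genes]
    if not dry_run:
        for command in commands:
            # check=True stops the batch at the first failing gene.
            subprocess.run(command, check=True)
    return commands


commands = run_batch(["TP53", "BRCA1", "EGFR", "KRAS"], "batch_results")
```

Passing dry_run=False runs each command sequentially, like the shell loop.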

💊 Chemical Modifications

sirnaforge workflow KRAS --modification-file examples/modification_patterns/fda_approved_onpattro.json

Track and apply FDA-approved modification patterns.

📖 More examples and tutorials →


🏗️ Architecture

siRNAforge implements a modular pipeline designed for both interactive use and high-throughput automation:

Gene Symbol → Transcript Retrieval → siRNA Design → Off-target Analysis → Ranked Candidates

Core Components:

  • Gene Search — Transcript retrieval from Ensembl (RefSeq and GENCODE support planned)
  • Design Engine — Thermodynamic scoring with ViennaRNA integration
  • Off-target Analysis — BWA-MEM2 alignment against transcriptome and miRNA seed references
  • Nextflow Pipeline — Scalable containerized execution
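
To make "mismatch tolerance" in the off-target step concrete, the sketch below scans a target sequence for every window within a given Hamming distance of a query. This brute-force scan is only a conceptual stand-in: BWA-MEM2 uses an indexed aligner, not this loop, and the function names and sequences here are made up for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(
    query: str, target: str, max_mismatches: int = 2
) -> list[tuple[int, int]]:
    """Return (start, mismatches) for each target window within the tolerance."""
    hits = []
    width = len(query)
    for start in range(len(target) - width + 1):
        distance = hamming(query, target[start:start + width])
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


# Toy target containing one exact and one single-mismatch copy of the query.
hits = find_near_matches("ACGUACGU", "GGACGUACGUGGACGAACGUCC", max_mismatches=1)
```

Raising max_mismatches widens the set of reported sites, which is the trade-off a mismatch tolerance setting controls.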

📖 Architecture documentation →


🔬 System Requirements

Docker Environment (Recommended)

All dependencies included in the image:

  • Nextflow ≥25.04.0
  • BWA-MEM2 ≥2.2.1
  • SAMtools ≥1.19.2
  • ViennaRNA ≥2.7.0
  • Python 3.10-3.12

Local Development

Python-only features work immediately. Off-target analysis requires Docker or manual installation of bioinformatics tools.

📖 Dependency details →


🤝 Contributing

We welcome contributions! siRNAforge uses modern Python tooling with make workflows for efficient development.

Essential Make Commands

🧪 Testing (By Tier)

make test-dev        # Fast unit tests (~15s) - for development iteration
make test-ci         # Smoke tests for CI/CD with coverage reports
make test-release    # Comprehensive validation (all tests + coverage)
make test            # All tests (shows passes/skips/fails)

🧪 Testing (By Requirement)

make test-requires-docker   # Tests requiring Docker daemon
make test-requires-network  # Tests requiring network access
make test-requires-nextflow # Tests requiring Nextflow

🔧 Code Quality

make lint       # Check code quality (ruff check + mypy)
make format     # Auto-format and autofix style issues (ruff)
make check      # format + test-dev (mutating quick validation)
make pre-commit # Run all pre-commit hooks locally
make security   # Run bandit + safety scans

🐳 Docker

make docker-build    # Build Docker image
make docker-test     # Run tests INSIDE container
make docker-shell    # Interactive shell in container
make docker-run      # Run workflow (e.g., make docker-run GENE=TP53)
make docker-build-test # Clean, rebuild, and validate Docker image

📚 Documentation

make docs        # Build HTML documentation
make docs-serve  # Serve docs locally at localhost:8000

🔧 Utilities

make clean       # Clean build artifacts and caches
make version     # Show current version
make example     # Run the sample workflow on bundled transcripts
make cache-info  # Inspect local transcript/miRNA cache mounts
make help        # Show all Make targets with descriptions

📖 Complete development guide → 📖 Contributing guidelines → 📖 Testing strategies →


📄 License

This project is licensed under the MIT License. See LICENSE for details.


📞 Support & Community


🙏 Acknowledgments

siRNAforge integrates several open-source bioinformatics tools, including ViennaRNA, BWA-MEM2, SAMtools, and Nextflow.

Built with ❤️ for the research community
Portions developed with AI assistance • Reviewed and validated by human developers
