siRNAforge - Multi-species gene to siRNA design, off-target prediction, and ranking. Comprehensive siRNA design toolkit for gene silencing
🧬 What is siRNAforge?
siRNAforge is a production-ready tool for designing small interfering RNAs (siRNAs) with integrated multi-species off-target analysis. Built for researchers who need reliable, high-specificity gene silencing candidates.
Why siRNAforge?
- 🎯 End-to-end workflow — From gene symbol to ranked candidates in one command
- 🔬 Multi-species validation — Off-target analysis of transcriptome and miRNA seed matches across human, rat, and rhesus macaque genomes
- 🐍 Developer-friendly — Modern Python API with full type hints and Pydantic models. Easily extend with your own scoring methods.
Key Features
| Feature | Description |
|---|---|
| 🔍 Multi-database search | Automatic transcript retrieval from Ensembl, RefSeq (TODO), and GENCODE (TODO) |
| 🧬 Variant targeting | Design and rank candidates against specific genetic variants with population AF filtering |
| 🧾 Transcript annotations | Fetch transcript models/interval annotations via a provider layer (Ensembl REST-backed) |
| 🌡️ Thermodynamic scoring | ViennaRNA-based secondary structure prediction and stability analysis |
| 🎯 Transcriptome off-target analysis | BWA-MEM2 transcriptome search with mismatch tolerance control |
| 🧬 miRNA seed avoidance | BWA-MEM2 mirna_seed search against MirGeneDB (miRBase support is TODO) for matches to known miRNA seed regions |
| 🔤 Smart species handling | Accepts any format (common names, miRBase codes, scientific names) — auto-normalizes to canonical |
| ⚙️ Nextflow pipeline | Scalable, containerized execution for high-throughput analysis |
| 💉 Chemical modifications | Track 2'-O-methyl, 2'-fluoro, and phosphorothioate patterns |
| 📊 Rich output | Structured CSV, FASTA, and JSON reports with comprehensive metadata |
Supported Python versions: 3.10, 3.11, 3.12 (Python 3.13+ pending ViennaRNA compatibility)
📦 Installation
Choose your path based on what you need to do:
Complete installation guide with troubleshooting →
- Deploy / run from registry (no setup) — Pull the prebuilt image with all bio tools, Nextflow, and Java bundled.
docker pull ghcr.io/austin-s-h/sirnaforge:latest
- Daily development (Python-only, fast) — Use uv + managed virtualenv; great for core code and unit tests. Heavy bio/Nextflow tests stay skipped unless you also have Docker/Java.
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/austin-s-h/sirnaforge && cd sirnaforge
make dev
make check
- Complete local testing (matches CI) — Either
  - Build and test in Docker (reuses the bundled tools):
  make docker-build-test
  - Or use conda to get bio deps + Java locally, then run the full suite:
  conda env create -f environment-dev.yml
  conda activate sirnaforge
  make test-release
(Nextflow/Java are required for Nextflow-marked tests; Docker is required for container-marked tests.)
🚀 Quick Start
Get your first results in 30 seconds:
# Docker
docker run -v $(pwd):/workspace -w /workspace \
ghcr.io/austin-s-h/sirnaforge:latest \
sirnaforge workflow TP53 --output-dir results
# Local
uv run sirnaforge workflow TP53 --output-dir results
What you get:
- Transcript sequences from Ensembl
- Thermodynamically-scored siRNA candidates
- Off-target analysis (Docker only)
- Ranked results in CSV and FASTA formats
- Automatic Ensembl transcriptome indexing across human, mouse, rat, and rhesus macaque (override with --transcriptome-fasta, or supply design-ready transcripts via --input-fasta)
- A reference_summary block in logs/workflow_summary.json that records whether each reference was explicit, defaulted, or disabled
Need more control? Customize with parameters:
sirnaforge workflow BRCA1 \
--genome-species "human,rat,rhesus" \
--gc-min 40 --gc-max 60 \
--top-n 50 \
--design-mode mirna \
--output-dir results
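To make the --gc-min and --gc-max options concrete, here is a minimal sketch of a GC-content window filter. It is an illustration only, not siRNAforge's implementation: the function names, the inclusive bounds, and the example sequences are assumptions made for this example.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(base in "GC" for base in seq)
    return 100.0 * gc_count / len(seq)


def passes_gc_window(seq: str, gc_min: float = 40.0, gc_max: float = 60.0) -> bool:
    """Check whether GC content lies in [gc_min, gc_max] (inclusive bounds assumed)."""
    return gc_min <= gc_percent(seq) <= gc_max


# Made-up candidate sequences, for illustration only.
candidates = [
    "GCAUGCAUGCAUGCAUGCAUG",  # balanced composition
    "AAUAUAUUAUAAUAUUAAUAU",  # AU-rich, no G or C
]
kept = [seq for seq in candidates if passes_gc_window(seq)]
print(kept)
```

With the 40 to 60 window used in the command above, only the first made-up sequence is kept.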
Custom inputs & offline mode
Bring your own transcript sequences while still running the full workflow:
# Design from bundled sample FASTA (design-only mode, no transcriptome off-target)
sirnaforge workflow TP53 \
--input-fasta examples/sample_transcripts.fasta \
--output-dir custom_inputs_demo
# Design from bundled sample FASTA and align against mouse transcriptome
sirnaforge workflow TP53 \
--input-fasta examples/sample_transcripts.fasta \
--transcriptome-fasta ensembl_mouse_cdna \
--output-dir custom_inputs_demo
# Remote FASTA sources also work
sirnaforge workflow BRCA1 \
--input-fasta https://example.org/custom/brca1.fasta \
--transcriptome-fasta /data/reference/ensembl_human_cdna_111.fasta
--input-fasta skips the gene search stage and designs directly from your sequences. When used alone, transcriptome off-target analysis is disabled (design-only mode). To enable transcriptome off-target with custom inputs, explicitly provide --transcriptome-fasta.
When --transcriptome-fasta is omitted the workflow automatically indexes the bundled Ensembl cDNA transcriptomes for human, mouse, rat, and macaque so multi-species off-target analysis runs out of the box.
Every workflow run now captures the resolved transcriptome decision in logs/workflow_summary.json under reference_summary.transcriptome, indicating whether the reference was auto-selected, explicitly supplied, or intentionally disabled. This makes it easier to audit production runs and confirm that default references were applied as expected.
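For auditing, the recorded decision can be read back with a few lines of Python. This is a sketch under stated assumptions: it assumes logs/workflow_summary.json sits under the run's output directory, and it makes no assumption about the shape of the value stored at reference_summary.transcriptome, which is returned as-is. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def transcriptome_decision(summary: dict[str, Any]) -> Any:
    """Return reference_summary.transcriptome, or None when it is absent."""
    reference_summary = summary.get("reference_summary") or {}
    return reference_summary.get("transcriptome")


def load_transcriptome_decision(output_dir: str) -> Any:
    """Read logs/workflow_summary.json under output_dir (location is an assumption)."""
    path = Path(output_dir) / "logs" / "workflow_summary.json"
    with path.open(encoding="utf-8") as handle:
        return transcriptome_decision(json.load(handle))


# Hypothetical payload; the real schema may differ.
example = {"reference_summary": {"transcriptome": "placeholder-value"}}
print(transcriptome_decision(example))
```

Returning None for a missing block lets a calling script distinguish "no record" from any recorded state without raising.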
📖 Usage examples and workflows → 📖 Complete CLI reference →
📚 Documentation
Use sirnaforge --help, sirnaforge workflow --help, or the detailed CLI reference.
🎯 Use Cases
🧬 Basic Gene Silencing
sirnaforge workflow EGFR --output-dir egfr_analysis
Design siRNAs for a single target gene with default parameters.
🔬 Multi-Species Validation
# Accepts any species format - auto-normalizes to canonical names
sirnaforge workflow TP53 --species "human,rat,macaque"
# Also works: --species "hsa,rno,mml" or --species "Homo sapiens,Rattus norvegicus,Macaca mulatta"
Check off-target potential across multiple model organisms.
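The auto-normalization described above can be pictured as an alias lookup. The sketch below is illustrative and is not siRNAforge's code: the alias table, the canonical names, and the normalize_species function are assumptions, limited to the species and formats mentioned in this README.

```python
# Hypothetical alias table: common names, miRBase codes, scientific names.
_ALIASES = {
    "human": ("human", "hsa", "homo sapiens"),
    "mouse": ("mouse", "mmu", "mus musculus"),
    "rat": ("rat", "rno", "rattus norvegicus"),
    "macaque": ("macaque", "rhesus", "mml", "macaca mulatta"),
}
_LOOKUP = {alias: canon for canon, aliases in _ALIASES.items() for alias in aliases}


def normalize_species(spec: str) -> list[str]:
    """Map a comma-separated species string to canonical names, keeping order."""
    names: list[str] = []
    for token in spec.split(","):
        # Lowercase and collapse whitespace so "Homo  sapiens" still matches.
        key = " ".join(token.strip().lower().split())
        if key not in _LOOKUP:
            raise ValueError(f"unknown species: {token!r}")
        canon = _LOOKUP[key]
        if canon not in names:  # drop duplicates such as "human,hsa"
            names.append(canon)
    return names


print(normalize_species("hsa,rno,mml"))
print(normalize_species("Homo sapiens,Rattus norvegicus,Macaca mulatta"))
```

Both calls resolve to the same canonical list, which is the behavior the three equivalent --species forms above rely on.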
🧪 miRNA Seed Avoidance
# Species parameter drives both transcriptome and miRNA analysis
sirnaforge workflow BRCA1 --species "human,mouse"
# Override miRNA species independently if needed: --mirna-species "hsa,mmu,rno"
Filter candidates that match microRNA seed regions to reduce off-target effects.
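As a rough picture of what a seed match means, the sketch below compares seed regions by plain string equality. It is not how siRNAforge performs the search (the feature table describes a BWA-MEM2-based search); the 2-8 seed window is a common convention assumed here, and all sequences and names are made up.

```python
def seed(seq: str, start: int = 2, end: int = 8) -> str:
    """Return the seed region (1-based, inclusive positions from the 5' end)."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
    if len(rna) < end:
        raise ValueError("sequence is shorter than the seed window")
    return rna[start - 1:end]


def shared_seed_hits(guide: str, mirnas: dict[str, str]) -> list[str]:
    """Names of miRNAs whose seed is identical to the guide strand's seed."""
    guide_seed = seed(guide)
    return sorted(name for name, mirna in mirnas.items() if seed(mirna) == guide_seed)


# Made-up sequences, for illustration only.
toy_mirnas = {
    "toy-miR-a": "UACGGAUCCAUGCAAGUCAGUU",
    "toy-miR-b": "UGGCAUUCAGGAUCCAUGCAAG",
}
toy_guide = "AACGGAUCUUGCAUGGACUAA"
hits = shared_seed_hits(toy_guide, toy_mirnas)
print(hits)
```

A candidate with a non-empty hit list shares its seed with a known miRNA and would be a natural one to filter or down-rank.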
⚙️ High-Throughput Analysis
# Batch multiple genes (off-target step uses the embedded Nextflow pipeline)
for gene in TP53 BRCA1 EGFR KRAS; do
sirnaforge workflow "$gene" --output-dir "batch_results/$gene"
done
Process many genes in batch while reusing the same embedded Nextflow off-target engine.
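The same batch can be driven from Python when per-gene exit codes are needed. This is a minimal sketch that only uses the sirnaforge workflow invocation and the --output-dir flag shown above; the helper names and the dry_run switch are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def workflow_command(gene: str, out_root: str = "batch_results") -> list[str]:
    """Build the CLI invocation for one gene, mirroring the shell loop above."""
    return ["sirnaforge", "workflow", gene, "--output-dir", str(Path(out_root) / gene)]


def run_batch(
    genes: list[str], out_root: str = "batch_results", dry_run: bool = True
) -> dict[str, int | None]:
    """Run (or, with dry_run, only print) one workflow per gene; collect exit codes."""
    results: dict[str, int | None] = {}
    for gene in genes:
        cmd = workflow_command(gene, out_root)
        if dry_run:
            print(" ".join(cmd))
            results[gene] = None
        else:
            # check=False keeps the batch going if one gene fails.
            results[gene] = subprocess.run(cmd, check=False).returncode
    return results


summary = run_batch(["TP53", "BRCA1", "EGFR", "KRAS"])
```

Passing the command as a list avoids shell quoting issues, and the returned mapping makes it easy to list which genes need a rerun.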
💊 Chemical Modifications
sirnaforge workflow KRAS --modification-file examples/modification_patterns/fda_approved_onpattro.json
Track and apply FDA-approved modification patterns.
📖 More examples and tutorials →
🏗️ Architecture
siRNAforge implements a modular pipeline designed for both interactive use and high-throughput automation:
Gene Symbol → Transcript Retrieval → siRNA Design → Off-target Analysis → Ranked Candidates
Core Components:
- Gene Search — Transcript retrieval from Ensembl (RefSeq and GENCODE support is planned)
- Design Engine — Thermodynamic scoring with ViennaRNA integration
- Off-target Analysis — BWA-MEM2 alignment against transcriptome and miRNA seed references
- Nextflow Pipeline — Scalable containerized execution
📖 Architecture documentation →
🔬 System Requirements
Docker Environment (Recommended)
All dependencies included in the image:
- Nextflow ≥25.04.0
- BWA-MEM2 ≥2.2.1
- SAMtools ≥1.19.2
- ViennaRNA ≥2.7.0
- Python 3.10-3.12
Local Development
Python-only features work immediately. Off-target analysis requires Docker or manual installation of bioinformatics tools.
🤝 Contributing
We welcome contributions! siRNAforge uses modern Python tooling with make workflows for efficient development.
Essential Make Commands
🧪 Testing (By Tier)
make test-dev # Fast unit tests (~15s) - for development iteration
make test-ci # Smoke tests for CI/CD with coverage reports
make test-release # Comprehensive validation (all tests + coverage)
make test # All tests (shows passes/skips/fails)
🧪 Testing (By Requirement)
make test-requires-docker # Tests requiring Docker daemon
make test-requires-network # Tests requiring network access
make test-requires-nextflow # Tests requiring Nextflow
🔧 Code Quality
make lint # Check code quality (ruff check + mypy)
make format # Auto-format and autofix style issues (ruff)
make check # format + test-dev (mutating quick validation)
make pre-commit # Run all pre-commit hooks locally
make security # Run bandit + safety scans
🐳 Docker
make docker-build # Build Docker image
make docker-test # Run tests INSIDE container
make docker-shell # Interactive shell in container
make docker-run # Run workflow (e.g., make docker-run GENE=TP53)
make docker-build-test # Clean, rebuild, and validate Docker image
📚 Documentation
make docs # Build HTML documentation
make docs-serve # Serve docs locally at localhost:8000
🔧 Utilities
make clean # Clean build artifacts and caches
make version # Show current version
make example # Run the sample workflow on bundled transcripts
make cache-info # Inspect local transcript/miRNA cache mounts
make help # Show all Make targets with descriptions
📖 Complete development guide → 📖 Contributing guidelines → 📖 Testing strategies →
📄 License
This project is licensed under the MIT License. See LICENSE for details.
📞 Support & Community
- 🐛 Bug Reports — GitHub Issues
- 📖 Documentation — austin-s-h.github.io/sirnaforge
- 💬 Questions — GitHub Discussions
- 📝 Changelog — CHANGELOG.md
🙏 Acknowledgments
siRNAforge integrates several open-source bioinformatics tools:
- ViennaRNA Package — RNA secondary structure prediction
- BWA-MEM2 — High-performance sequence alignment
- Nextflow — Scalable workflow orchestration
- BioPython — Computational biology utilities
Portions developed with AI assistance • Reviewed and validated by human developers