Skip to main content

siRNAforge - Multi-species gene to siRNA design, off-target prediction, and ranking. Comprehensive siRNA design toolkit for gene silencing

Project description

๐Ÿงฌ siRNAforge โ€” Comprehensive siRNA Design Tool

siRNAforge Logo

Multi-species gene to siRNA design, off-target prediction, and ranking ๐Ÿš€ Release siRNAforge Python 3.9โ€“3.12 uv Code style: black Ruff Docker Nextflow Tests Coverage MIT License

siRNAforge is a modern, comprehensive toolkit for designing high-quality siRNAs with integrated off-target analysis. Built with Python 3.9-3.12, it combines cutting-edge bioinformatics algorithms with robust software engineering practices to provide a complete gene silencing solution for researchers and biotechnology applications.

โœจ Key Features

  • ๐ŸŽฏ Algorithm-driven design - Comprehensive siRNA design with multi-component thermodynamic scoring
  • ๐Ÿ” Multi-species off-target analysis - BWA-MEM2 and Bowtie alignment across human, rat, rhesus genomes
  • ๐Ÿ“Š Advanced scoring system - Composite scoring with seed-region specificity and secondary structure prediction
  • ๐Ÿงช ViennaRNA integration - Secondary structure prediction for enhanced design accuracy
  • ๐Ÿ”ฌ Nextflow pipeline integration - Scalable, containerized workflow execution with automatic parallelization
  • ๐Ÿ Modern Python architecture - Type-safe code with Pydantic models, async/await support, and rich CLI
  • โšก Lightning-fast dependency management - Built with uv for sub-second installs and virtual environment management
  • ๐Ÿณ Fully containerized - Docker images with all bioinformatics dependencies pre-installed
  • ๐Ÿงฌ Multi-database support - Ensembl, RefSeq, GENCODE integration for comprehensive transcript retrieval

Note: Supports Python 3.9-3.12. Python 3.13+ not yet supported due to ViennaRNA dependency constraints.

๐Ÿš€ Quick Start

Installation Options

๐Ÿณ Docker (Recommended - Complete Environment):

# Pull the pre-built image with all dependencies
docker pull ghcr.io/austin-s-h/sirnaforge:latest

# Quick workflow example
docker run -v $(pwd):/workspace -w /workspace \
  ghcr.io/austin-s-h/sirnaforge:latest \
  sirnaforge workflow TP53 --output-dir results --genome-species human

# With custom parameters
docker run -v $(pwd):/workspace -w /workspace \
  ghcr.io/austin-s-h/sirnaforge:latest \
  sirnaforge workflow BRCA1 --gc-min 40 --gc-max 60 --sirna-length 21 --top-n 50

๐Ÿ Conda Environment (Alternative - Local Development):

# Install micromamba (recommended - fastest), Mambaforge, or Miniconda
# micromamba (fastest option):
curl -LsSf https://micro.mamba.pm/install.sh | bash

# Or Mambaforge:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh

# Create siRNAforge development environment
make conda-env

# Activate the environment
micromamba activate sirnaforge-dev  # or conda activate sirnaforge-dev

# Install Python dependencies
make install-dev

# Run tests to verify installation
make test

๏ฟฝ Local Development Installation:

# Install uv (lightning-fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup with development dependencies
git clone https://github.com/austin-s-h/sirnaforge
cd sirnaforge
make install-dev

# Run tests to verify installation
make test

Essential Dependencies for Off-target Analysis

The Docker image includes all bioinformatics dependencies via conda environment (docker/environment-nextflow.yml):

  • โœ… Nextflow (โ‰ฅ25.04.0) - Workflow orchestration and parallelization
  • โœ… BWA-MEM2 (โ‰ฅ2.2.1) - High-performance genome alignment
  • โœ… Bowtie (โ‰ฅ1.3.1) - miRNA seed-region analysis
  • โœ… SAMtools (โ‰ฅ1.19.2) - SAM/BAM file processing and indexing
  • โœ… ViennaRNA (โ‰ฅ2.7.0) - RNA secondary structure prediction
  • โœ… AWS CLI (โ‰ฅ2.0) - Automated genome reference downloads
  • โœ… Java 17 - Nextflow runtime environment

For local development without Docker:

# Option 1: Use conda environment (includes all tools)
make conda-env
micromamba activate sirnaforge-dev  # or conda activate sirnaforge-dev

# Option 2: Install bioinformatics tools via micromamba
curl -LsSf https://micro.mamba.pm/install.sh | bash
micromamba env create -f docker/environment-nextflow.yml
micromamba activate sirnaforge-env

Usage Examples

๐ŸŽฏ Complete Workflow (Gene Query to Results):

# Basic workflow with default parameters
uv run sirnaforge workflow TP53 --output-dir results

# Advanced workflow with custom parameters
uv run sirnaforge workflow BRCA1 \
  --genome-species "human,rat,rhesus" \
  --gc-min 40 --gc-max 60 \
  --sirna-length 21 \
  --top-n 50 \
  --output-dir brca1_analysis

# Workflow from pre-existing FASTA file
uv run sirnaforge workflow --input-fasta transcripts.fasta \
  --output-dir custom_analysis \
  --offtarget-n 25 \
  custom_gene_name

๐Ÿ” Individual Component Usage:

# Search for gene transcripts across databases
uv run sirnaforge search TP53 --output transcripts.fasta --database ensembl

# Design siRNAs from transcript sequences
uv run sirnaforge design transcripts.fasta --output results.csv --top-n 20

# Validate input files before processing
uv run sirnaforge validate candidates.fasta

# Display configuration and system information
uv run sirnaforge config

# Show detailed help for any command
uv run sirnaforge --help
uv run sirnaforge workflow --help

Python API

๐Ÿ”ง Programmatic Access for Custom Workflows:

import asyncio
from pathlib import Path
from sirnaforge.workflow import run_sirna_workflow
from sirnaforge.core.design import SiRNADesigner
from sirnaforge.models.sirna import DesignParameters, FilterCriteria
from sirnaforge.data.gene_search import search_gene_sync

# Complete async workflow with custom parameters
async def design_sirnas_custom():
    results = await run_sirna_workflow(
        gene_query="TP53",
        output_dir="results",
        database="ensembl",
        top_n_candidates=50,
        top_n_offtarget=15,
        genome_species=["human", "rat", "rhesus"],
        gc_min=40.0,
        gc_max=60.0,
        sirna_length=21,
    )
    return results

# Run the workflow
results = asyncio.run(design_sirnas_custom())
print(f"โœ… Designed {len(results.get('top_candidates', []))} siRNA candidates")

# Individual component usage for custom pipelines
def custom_design_pipeline():
    # 1. Search for gene transcripts
    transcripts = search_gene_sync(
        gene_query="BRCA1",
        database="ensembl",
        output_file="transcripts.fasta"
    )

    # 2. Configure design parameters
    design_params = DesignParameters(
        sirna_length=21,
        filters=FilterCriteria(
            gc_min=40,
            gc_max=60,
            avoid_patterns=["AAAA", "TTTT", "GGGG", "CCCC"]
        )
    )

    # 3. Initialize designer and generate candidates
    designer = SiRNADesigner(design_params)
    design_results = designer.design_from_file("transcripts.fasta")

    # 4. Process results
    for candidate in design_results.top_candidates[:10]:
        print(f"Candidate {candidate.id}:")
        print(f"  Guide: {candidate.guide_sequence}")
        print(f"  Score: {candidate.composite_score:.2f}")
        print(f"  GC%: {candidate.gc_content:.1f}")
        print(f"  Transcripts: {len(candidate.transcript_ids)}")
        print()

    return design_results

# Example: Batch processing multiple genes
async def batch_design_genes(genes: list[str]):
    results = {}
    for gene in genes:
        print(f"Processing {gene}...")
        gene_results = await run_sirna_workflow(
            gene_query=gene,
            output_dir=f"results_{gene.lower()}",
            top_n_candidates=20
        )
        results[gene] = gene_results
    return results

# Process multiple cancer-related genes
cancer_genes = ["TP53", "BRCA1", "BRCA2", "EGFR", "MYC"]
batch_results = asyncio.run(batch_design_genes(cancer_genes))

๐Ÿ—๏ธ Architecture & Workflow

Complete Pipeline Overview

Gene Query โ†’ Transcript Search โ†’ ORF Validation โ†’ siRNA Design โ†’ Off-target Analysis โ†’ Ranked Results
     โ†“              โ†“                โ†“               โ†“               โ†“                    โ†“
Multi-database   Canonical       Coding Frame   Thermodynamic   Multi-species BWA    Scored & Filtered
Gene Search      Isoform         Validation     + Structure     Alignment +          siRNA Candidates
(Ensembl/        Selection                      Scoring         Bowtie miRNA         with Off-target
RefSeq/GENCODE)                                                Analysis             Predictions

Core Components

๐Ÿ” Gene Search & Data Layer (sirnaforge.data.*)

  • Multi-database integration: Ensembl, RefSeq, GENCODE APIs with automatic fallback
  • Canonical transcript selection: Prioritizes protein-coding, longest transcripts
  • Robust error handling: Network timeouts, API rate limiting, malformed responses
  • Async/await support: Non-blocking I/O for improved performance

๐Ÿงฌ ORF Analysis (sirnaforge.data.orf_analysis)

  • Reading frame validation: Ensures proper coding sequence targeting
  • Quality control reporting: Detailed validation logs and metrics
  • Multi-transcript support: Handles gene isoforms and splice variants

๐ŸŽฏ siRNA Design Engine (sirnaforge.core.design)

  • Algorithm-based candidate generation: Systematic 19-23 nucleotide window scanning
  • Multi-component scoring system:
    • Thermodynamic properties: GC content (30-60%), melting temperature optimization
    • Secondary structure prediction: ViennaRNA integration for accessibility scoring
    • Position-specific penalties: 5' and 3' end optimization
    • Off-target risk assessment: Simplified seed-region analysis
  • Composite scoring: Weighted combination of all scoring components
  • Transcript consolidation: Deduplicates guide sequences across multiple transcript isoforms

๐Ÿ” Off-target Analysis (sirnaforge.core.off_target)

  • Dual-engine approach:
    • BWA-MEM2: Sensitive genome-wide alignment for transcriptome off-targets
    • Bowtie: High-specificity miRNA seed-region analysis (positions 2-8)
  • Multi-species support: Human, rat, rhesus macaque genome analysis
  • Advanced scoring: Position-weighted mismatch penalties with seed-region emphasis
  • Scalable processing: Batch candidate analysis with parallel execution

๐Ÿ”ฌ Nextflow Pipeline Integration (nextflow_pipeline/)

  • Containerized execution: Docker/Singularity support with pre-built environments
  • Automatic resource management: Dynamic CPU/memory allocation based on workload
  • Cloud-ready: AWS S3 genome reference integration with automatic downloading
  • Fault tolerance: Resume capability and error recovery mechanisms
  • Parallel processing: Multi-genome, multi-candidate simultaneous analysis

โšก Modern Python Architecture

  • Type safety: Full mypy compliance with Pydantic models for data validation
  • Async/await: Non-blocking I/O throughout the pipeline for improved throughput
  • Rich CLI: Beautiful terminal interface with progress bars, tables, and error formatting
  • Comprehensive testing: Unit, integration, and pipeline tests with pytest
  • Developer experience: Pre-commit hooks, automated formatting (black), linting (ruff)

Repository Structure

sirnaforge/
โ”œโ”€โ”€ ๐Ÿ“ฆ src/sirnaforge/              # Main package (modern src-layout)
โ”‚   โ”œโ”€โ”€ ๐ŸŽฏ core/                   # Core algorithms and analysis engines
โ”‚   โ”‚   โ”œโ”€โ”€ design.py              # siRNA design, scoring, and candidate generation
โ”‚   โ”‚   โ”œโ”€โ”€ off_target.py          # BWA-MEM2/Bowtie off-target analysis
โ”‚   โ”‚   โ””โ”€โ”€ thermodynamics.py     # ViennaRNA integration & structure prediction
โ”‚   โ”œโ”€โ”€ ๐Ÿ“Š models/                 # Type-safe Pydantic data models
โ”‚   โ”‚   โ”œโ”€โ”€ sirna.py              # siRNA candidates, parameters, results
โ”‚   โ”‚   โ””โ”€โ”€ transcript.py         # Transcript and gene representations
โ”‚   โ”œโ”€โ”€ ๐Ÿ’พ data/                   # Data access and integration layer
โ”‚   โ”‚   โ”œโ”€โ”€ gene_search.py        # Multi-database API integration
โ”‚   โ”‚   โ”œโ”€โ”€ orf_analysis.py       # Reading frame and coding validation
โ”‚   โ”‚   โ””โ”€โ”€ base.py               # Common utilities (FASTA parsing, etc.)
โ”‚   โ”œโ”€โ”€ ๐Ÿ”ง pipeline/               # Nextflow workflow integration
โ”‚   โ”‚   โ”œโ”€โ”€ nextflow/             # Nextflow execution and config management
โ”‚   โ”‚   โ””โ”€โ”€ resources.py          # Resource and test data management
โ”‚   โ”œโ”€โ”€ ๐Ÿ› ๏ธ utils/                  # Cross-cutting utilities
โ”‚   โ”‚   โ””โ”€โ”€ logging_utils.py      # Structured logging configuration
โ”‚   โ”œโ”€โ”€ ๐Ÿ“Ÿ cli.py                  # Rich CLI interface with Typer
โ”‚   โ””โ”€โ”€ workflow.py               # High-level workflow orchestration
โ”œโ”€โ”€ ๐Ÿงช tests/                      # Comprehensive test suite
โ”‚   โ”œโ”€โ”€ unit/                     # Component-specific unit tests
โ”‚   โ”œโ”€โ”€ integration/              # Cross-component integration tests
โ”‚   โ”œโ”€โ”€ pipeline/                 # Nextflow pipeline validation tests
โ”‚   โ””โ”€โ”€ docker/                   # Container integration tests
โ”œโ”€โ”€ ๐ŸŒŠ nextflow_pipeline/          # Nextflow DSL2 workflow
โ”‚   โ”œโ”€โ”€ main.nf                   # Main workflow orchestration
โ”‚   โ”œโ”€โ”€ nextflow.config           # Execution and resource configuration
โ”‚   โ”œโ”€โ”€ modules/local/            # Custom process definitions
โ”‚   โ””โ”€โ”€ subworkflows/local/       # Reusable workflow components
โ”œโ”€โ”€ ๐Ÿณ docker/                     # Container definitions and environments
โ”‚   โ”œโ”€โ”€ Dockerfile                # Multi-stage production image
โ”‚   โ””โ”€โ”€ environment-nextflow.yml  # Conda environment specification
โ”œโ”€โ”€ ๐Ÿ“š docs/                       # Documentation and examples
โ”‚   โ”œโ”€โ”€ api_reference.rst         # API documentation
โ”‚   โ”œโ”€โ”€ tutorials/                # Step-by-step guides
โ”‚   โ””โ”€โ”€ examples/                 # Working code examples
โ””โ”€โ”€ ๐Ÿ”ง Configuration files
    โ”œโ”€โ”€ pyproject.toml            # Python packaging and tool configuration
    โ”œโ”€โ”€ Makefile                  # Development workflow automation
    โ””โ”€โ”€ uv.lock                   # Reproducible dependency resolution

Repository Structure

sirnaforge/
โ”œโ”€โ”€ ๐Ÿ“ฆ src/sirnaforge/              # Main package (modern src-layout)
โ”‚   โ”œโ”€โ”€ ๐ŸŽฏ core/                   # Core algorithms
โ”‚   โ”‚   โ”œโ”€โ”€ design.py              # siRNA design and scoring
โ”‚   โ”‚   โ”œโ”€โ”€ off_target.py          # Off-target analysis
โ”‚   โ”‚   โ””โ”€โ”€ scoring.py             # Scoring algorithms
โ”‚   โ”œโ”€โ”€ ๐Ÿ“Š models/                 # Pydantic data models
โ”‚   โ”‚   โ”œโ”€โ”€ sirna.py              # siRNA candidate models
โ”‚   โ”‚   โ””โ”€โ”€ transcript.py         # Transcript models
โ”‚   โ”œโ”€โ”€ ๐Ÿ’พ data/                   # Data access layer
โ”‚   โ”‚   โ”œโ”€โ”€ gene_search.py        # Multi-database gene search
โ”‚   โ”‚   โ”œโ”€โ”€ orf_analysis.py       # ORF validation
โ”‚   โ”‚   โ””โ”€โ”€ base.py               # Common data utilities
โ”‚   โ”œโ”€โ”€ ๐Ÿ”ง pipeline/               # Nextflow integration
โ”‚   โ”œโ”€โ”€ ๐Ÿ› ๏ธ utils/                  # Helper utilities
โ”‚   โ”œโ”€โ”€ ๐Ÿ“Ÿ cli.py                  # Rich CLI interface
โ”‚   โ””โ”€โ”€ workflow.py               # Orchestration logic
โ”œโ”€โ”€ ๐Ÿงช tests/                      # Comprehensive test suite
โ”‚   โ”œโ”€โ”€ unit/                     # Unit tests
โ”‚   โ”œโ”€โ”€ integration/              # Integration tests
โ”‚   โ””โ”€โ”€ pipeline/                 # Pipeline tests
โ”œโ”€โ”€ ๐ŸŒŠ nextflow_pipeline/          # Nextflow workflow
โ”‚   โ”œโ”€โ”€ main.nf                   # Main workflow
โ”‚   โ”œโ”€โ”€ nextflow.config           # Configuration
โ”‚   โ”œโ”€โ”€ genomes.yaml             # Genome specifications
โ”‚   โ””โ”€โ”€ modules/local/           # Process modules
โ”œโ”€โ”€ ๐Ÿณ docker/                     # Containerization
โ”‚   โ”œโ”€โ”€ Dockerfile               # Multi-stage build
โ”‚   โ””โ”€โ”€ environment-nextflow.yml # Conda environment
โ”œโ”€โ”€ ๐Ÿ“š docs/                       # Documentation
โ”œโ”€โ”€ ๐Ÿ“‹ examples/                   # Usage examples
โ”œโ”€โ”€ โš™๏ธ pyproject.toml              # Modern Python packaging
โ”œโ”€โ”€ ๐Ÿ”ง Makefile                    # Development commands
    โ””โ”€โ”€ uv.lock                   # Reproducible dependency resolution

๐Ÿ“Š Output Formats & Results


## ๏ฟฝ Output Formats & Results

siRNAforge generates comprehensive, structured outputs for downstream analysis and experimental validation:

### Workflow Output Structure

output_directory/ โ”œโ”€โ”€ ๐Ÿ“ transcripts/ # Retrieved transcript sequences โ”‚ โ”œโ”€โ”€ {gene}_transcripts.fasta # All retrieved transcript isoforms โ”‚ โ””โ”€โ”€ temp_for_design.fasta # Filtered sequences for design โ”œโ”€โ”€ ๐Ÿ“ orf_reports/ # Open reading frame validation โ”‚ โ””โ”€โ”€ {gene}_orf_validation.txt # Coding sequence quality report โ”œโ”€โ”€ ๐Ÿ“ sirnaforge/ # Core siRNA design results โ”‚ โ”œโ”€โ”€ {gene}_sirna_results.csv # Complete candidate table โ”‚ โ”œโ”€โ”€ {gene}_top_candidates.fasta # Top-ranked sequences for validation โ”‚ โ””โ”€โ”€ {gene}_candidate_summary.txt # Human-readable summary โ”œโ”€โ”€ ๐Ÿ“ off_target/ # Off-target analysis results โ”‚ โ”œโ”€โ”€ basic_analysis.json # Simplified off-target metrics โ”‚ โ”œโ”€โ”€ input_candidates.fasta # Candidates sent for analysis โ”‚ โ””โ”€โ”€ results/ # Detailed Nextflow pipeline outputs โ”‚ โ”œโ”€โ”€ aggregated/ # Combined multi-species results โ”‚ โ””โ”€โ”€ individual_results/ # Per-candidate detailed analysis โ”œโ”€โ”€ ๐Ÿ“„ workflow_manifest.json # Complete workflow configuration โ””โ”€โ”€ ๐Ÿ“„ workflow_summary.json # High-level results summary


### Key Output Files

**๐ŸŽฏ `{gene}_sirna_results.csv`** - Complete candidate table with all scoring metrics:
```csv
id,guide_sequence,antisense_sequence,transcript_ids,position,gc_content,melting_temp,thermodynamic_score,secondary_structure_score,off_target_score,composite_score
TP53_001,GUAACAUUUGAGCCUUCUGA,UCAGAAGGCUCAAAUGUUAC,"ENST00000269305;ENST00000455263",245,47.6,52.3,0.85,0.92,0.78,4.22
TP53_002,CAUCAACUGAUUGUGCUGC,GCAGCACAAUCAGUUGAUG,"ENST00000269305",512,52.6,54.1,0.91,0.88,0.82,4.45
...

๐Ÿงฌ {gene}_top_candidates.fasta - Ready-to-order sequences for experimental validation:

>TP53_001 score=4.22 gc=47.6% transcripts=2
GUAACAUUUGAGCCUUCUGA
>TP53_002 score=4.45 gc=52.6% transcripts=1
CAUCAACUGAUUGUGCUGC

๐Ÿ“‹ {gene}_candidate_summary.txt - Human-readable summary report:

siRNAforge Design Summary for TP53
Generated: 2025-09-08 14:30:22
=================================

Input Statistics:
- Transcripts processed: 3
- Total sequence length: 2,847 bp
- Coding sequences: 1,182 bp

Design Results:
- Candidates generated: 1,156
- Passed filters: 234
- Top candidates selected: 50

Top 5 Candidates:
1. TP53_001: GUAACAUUUGAGCCUUCUGA (Score: 4.22, GC: 47.6%)
2. TP53_002: CAUCAACUGAUUGUGCUGC (Score: 4.45, GC: 52.6%)
...

๐Ÿ” Off-target Analysis Outputs:

{
  "analysis_summary": {
    "candidates_analyzed": 10,
    "total_off_targets": 15,
    "high_confidence_hits": 3
  },
  "by_species": {
    "human": {"transcriptome_hits": 8, "mirna_hits": 2},
    "rat": {"transcriptome_hits": 3, "mirna_hits": 1},
    "rhesus": {"transcriptome_hits": 1, "mirna_hits": 0}
  },
  "candidates": [
    {
      "candidate_id": "TP53_001",
      "guide_sequence": "GUAACAUUUGAGCCUUCUGA",
      "off_target_score": 0.78,
      "species_analysis": {
        "human": {"hits": 5, "seed_matches": 2},
        "rat": {"hits": 2, "seed_matches": 0}
      }
    }
  ]
}

Integration with Analysis Tools

๐Ÿ”ฌ For Laboratory Validation:

  • FASTA files can be directly submitted to oligonucleotide synthesis providers
  • CSV files import into Excel/R/Python for further analysis
  • Candidate rankings support experimental prioritization

๐Ÿ–ฅ๏ธ For Computational Analysis:

  • JSON outputs enable programmatic result processing
  • Structured CSV format supports statistical analysis and machine learning
  • Off-target data facilitates safety assessment and regulatory compliance

๐Ÿ“Š For Visualization and Reporting:

  • Summary reports provide publication-ready candidate lists
  • Score distributions support quality control assessment
  • Multi-species comparisons enable cross-species research applications

๏ฟฝ๐Ÿ”ฌ Nextflow Pipeline Integration

The integrated Nextflow pipeline provides scalable, containerized off-target analysis:

Pipeline Features

  • Multi-Species Analysis - Human, rat, rhesus macaque genomes
  • Parallel Processing - Each siRNA candidate processed independently
  • Auto Index Management - Downloads and builds BWA indices on demand
  • Cloud Ready - AWS Batch, Kubernetes, SLURM support
  • Comprehensive Results - TSV, JSON, and HTML outputs

Usage Examples

# Standalone pipeline execution
nextflow run nextflow_pipeline/main.nf \
  --input candidates.fasta \
  --genome_species "human,rat,rhesus" \
  --outdir results

# With custom genome indices
nextflow run nextflow_pipeline/main.nf \
  --input candidates.fasta \
  --genome_indices "human:/path/to/human/index" \
  --profile docker

# Using S3-hosted indices
nextflow run nextflow_pipeline/main.nf \
  --input candidates.fasta \
  --download_indexes true \
  --profile aws

Pipeline Output Structure

results/
โ”œโ”€โ”€ aggregated/                    # Final combined results
โ”‚   โ”œโ”€โ”€ combined_mirna_analysis.tsv
โ”‚   โ”œโ”€โ”€ combined_transcriptome_analysis.tsv
โ”‚   โ”œโ”€โ”€ combined_summary.json
โ”‚   โ””โ”€โ”€ analysis_report.html
โ””โ”€โ”€ individual_results/            # Per-candidate results
    โ”œโ”€โ”€ candidate_0001/
    โ”œโ”€โ”€ candidate_0002/
    โ””โ”€โ”€ ...

๐Ÿ› ๏ธ Development & Quality Assurance

Modern Development Environment with uv

siRNAforge leverages uv for lightning-fast dependency management and development workflows:

# Complete development setup (recommended)
git clone https://github.com/austin-s-h/sirnaforge
cd sirnaforge
make install-dev  # Installs all dev dependencies

# Core development commands
make test         # Full test suite with coverage
make lint         # Code quality checks (mypy, ruff, black)
make test-fast    # Quick tests (unit tests only)
make docs-build   # Generate documentation
make docker-build # Build container locally

# Selective dependency installation
uv sync --group analysis    # Jupyter, plotting, pandas extras
uv sync --group pipeline    # Nextflow, Docker integration
uv sync --group docs        # Sphinx documentation tools
uv sync --group lint        # Pre-commit, mypy, ruff, black

# Production deployment (minimal dependencies)
uv sync --no-dev

Conda Environment Management

For local development with bioinformatics tools, siRNAforge provides conda environment management:

# Create complete development environment
make conda-env

# Update existing environment with new dependencies
make conda-env-update

# Remove environment (cleanup)
make conda-env-clean

# Activate environment for development
conda activate sirnaforge-dev

# Deactivate when done
conda deactivate

The conda environment includes all bioinformatics tools (BWA-MEM2, SAMtools, ViennaRNA, etc.) plus Python development dependencies, providing a complete local development setup without Docker.

Quality Assurance & Testing

๐Ÿงช Comprehensive Test Suite:

# Run all tests with coverage reporting
make test
# Output: >95% code coverage across all modules

# Fast development testing (unit tests only)
make test-fast

# Integration tests (includes external APIs)
uv run pytest tests/integration/ -v

# Pipeline tests (requires Docker/Nextflow)
uv run pytest tests/pipeline/ -v

# Specific test categories
uv run pytest tests/unit/test_design.py::test_scoring_algorithm -v

๐Ÿ” Code Quality Tools:

# Type checking with mypy (strict mode)
uv run mypy src/
# Result: Success: no issues found in 20 source files

# Code formatting with black
uv run black src tests
make format

# Linting with ruff (fast Python linter)
uv run ruff check src tests
make lint

# All quality checks together
make lint  # Includes ruff, black, mypy, nextflow lint

Available Dependency Groups

Group Purpose Key Tools
dev Core development (auto-installed) pytest, black, ruff
test Testing frameworks pytest-cov, pytest-xdist
lint Code quality mypy, ruff, black
analysis Data science workflows jupyter, matplotlib, pandas
pipeline Nextflow integration workflow tools, containers
docs Documentation generation sphinx, sphinx-rtd-theme

Code Quality Standards

  • Type Safety: Full mypy coverage with Pydantic models
  • Formatting: Black + Ruff for consistent style
  • Testing: Comprehensive pytest suite with >90% coverage
  • CI/CD: GitHub Actions with multi-Python testing
  • Security: Bandit + Safety dependency scanning

โšก Performance & System Requirements

Performance Benchmarks

๐Ÿงฌ siRNA Design Performance:

  • Small genes (1-5 transcripts): ~2-5 seconds
  • Medium genes (5-20 transcripts): ~10-30 seconds
  • Large genes (20+ transcripts): ~1-2 minutes
  • Batch processing (10 genes): ~5-15 minutes

๐Ÿ” Off-target Analysis Performance:

  • Per candidate (single species): ~30-60 seconds
  • Multi-species (3 genomes): ~2-5 minutes per candidate
  • Batch analysis (50 candidates): ~1-3 hours (parallelized)

System Requirements

๐Ÿ”ง Minimum Requirements:

  • CPU: 2 cores, 2.0 GHz
  • RAM: 4 GB (8 GB recommended for off-target analysis)
  • Storage: 2 GB free space (+ 50 GB for genome indices)
  • Network: Internet connection for gene searches and genome downloads

โšก Recommended Configuration:

  • CPU: 8+ cores, 3.0 GHz (for parallel Nextflow execution)
  • RAM: 16-32 GB (for large-scale off-target analysis)
  • Storage: SSD with 100+ GB (for genome indices and temporary files)
  • Network: High-bandwidth connection for S3 genome downloads

๐Ÿณ Docker Resource Allocation:

# Recommended Docker settings
docker run --cpus="4" --memory="8g" \
  -v $(pwd):/workspace -w /workspace \
  ghcr.io/austin-s-h/sirnaforge:latest \
  sirnaforge workflow TP53 --genome-species human,rat,rhesus

๐Ÿณ Docker Usage

Pre-built Images

# Pull latest stable release
docker pull ghcr.io/austin-s-h/sirnaforge:latest

# Run complete workflow
docker run --rm -v $(pwd):/data \
  ghcr.io/austin-s-h/sirnaforge:latest \
  sirnaforge workflow TP53 --output-dir /data/results

# Interactive development session
docker run -it --rm -v $(pwd):/data \
  ghcr.io/austin-s-h/sirnaforge:latest bash

Building Custom Images

# Build production image
make docker-build

# Build with specific Python version
docker build --build-arg PYTHON_VERSION=3.11 \
  -f docker/Dockerfile -t sirnaforge:py311 .

The Docker image uses micromamba with docker/environment-nextflow.yml for consistent bioinformatics tool installations across all environments.

๐Ÿงช Testing & Quality Assurance

Running Tests

# Run complete test suite
make test

# Specific test categories
uv run pytest tests/unit/           # Unit tests
uv run pytest tests/integration/    # Integration tests
uv run pytest tests/pipeline/       # Pipeline tests

# With coverage reporting
uv run pytest --cov=sirnaforge --cov-report=html

# Run tests in parallel
uv run pytest -n auto

# Ultra-fast CI/CD smoke tests (NEW!)
make docker-test-smoke

# Validate fast CI/CD setup
python scripts/validate_fast_ci.py

Fast CI/CD with Toy Data โšก

siRNAforge now includes an improved CI/CD workflow designed for quick feedback with minimal resources:

  • โšก Ultra-fast execution: < 15 minutes total
  • ๐Ÿชถ Minimal resources: 256MB memory, 0.5 CPU cores
  • ๐Ÿงธ Toy data: < 500 bytes of test sequences
  • ๐Ÿ”ฅ Smoke tests: Essential functionality validation
# Trigger fast CI/CD workflow locally
pytest -m "smoke" --tb=short

# Use toy data for quick validation
ls tests/unit/data/toy_*.fasta

# Fast workflow vs comprehensive workflow
# Fast:    15 min,  256MB RAM, toy data
# Full:    60 min,    8GB RAM, real datasets

See docs/ci-cd-fast.md for detailed documentation.

Test Categories

  • Unit Tests - Core algorithm validation
  • Integration Tests - Component interaction testing
  • Pipeline Tests - Nextflow workflow validation
  • Docker Tests - Container functionality testing

๐Ÿ“š Documentation

Local Documentation Building

# Install documentation dependencies
uv sync --group docs

# Build HTML documentation
make docs

# Generate CLI reference
make docs-cli

# Generate usage examples
make docs-examples

# Build complete documentation set
make docs-full

Generated Documentation

  • docs/_build/html/ - Complete Sphinx HTML documentation
  • docs/CLI_REFERENCE.md - Auto-generated CLI help
  • docs/examples/USAGE_EXAMPLES.md - Usage examples
  • docs/api_reference.rst - Python API documentation

๐Ÿ“– See docs/getting_started.md for detailed tutorials and docs/deployment.md for deployment guides.

๐Ÿค Contributing

We welcome contributions to siRNAforge! Here's how to get started:

Development Setup

  1. Fork the repository on GitHub
  2. Clone your fork: git clone https://github.com/yourusername/sirnaforge
  3. Setup development environment: make install-dev
  4. Create a feature branch: git checkout -b feature/amazing-feature

Development Workflow

# Make your changes
# ...

# Ensure code quality
make lint           # Check code style and types
make format         # Auto-format code
make test          # Run test suite

# Commit and push
git add .
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature

Contribution Guidelines

  • Code Style: Follow Black formatting and Ruff linting rules
  • Type Hints: All new code must include type annotations
  • Tests: Add tests for new functionality
  • Documentation: Update docstrings and documentation
  • Commit Messages: Use conventional commit format

Pull Request Process

  1. Ensure all tests pass and code is properly formatted
  2. Update documentation for any API changes
  3. Add entries to CHANGELOG.md for user-facing changes
  4. Create a pull request with a clear description

See CONTRIBUTING.md for detailed guidelines.

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

siRNAforge builds upon excellent open-source tools and libraries:

  • ViennaRNA Package - RNA secondary structure prediction
  • BWA-MEM2 - Fast and accurate sequence alignment
  • Nextflow - Workflow management and containerization
  • BioPython - Python bioinformatics toolkit
  • Pydantic - Data validation and type safety
  • Modern Python Stack - uv, Typer, Rich for developer experience

Note: Much of the code in this repository was developed with assistance from AI agents, but all code has been reviewed, tested, and validated by human developers.


siRNAforge โ€” Comprehensive siRNA design toolkit for gene silencing
Professional siRNA design for the modern researcher

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sirnaforge-0.2.0.tar.gz (411.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sirnaforge-0.2.0-py3-none-any.whl (115.8 kB view details)

Uploaded Python 3

File details

Details for the file sirnaforge-0.2.0.tar.gz.

File metadata

  • Download URL: sirnaforge-0.2.0.tar.gz
  • Upload date:
  • Size: 411.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sirnaforge-0.2.0.tar.gz
Algorithm Hash digest
SHA256 6998940204cc7b20677fdc37645c453db35be8430272b536f4846e1d619e1dde
MD5 726f65c8abc17c4fd672662b4f92ca32
BLAKE2b-256 909ac4311ecde8cf175c5711325476c315460b41691b7a78db81a3472d753a66

See more details on using hashes here.

Provenance

The following attestation bundles were made for sirnaforge-0.2.0.tar.gz:

Publisher: release.yml on Austin-s-h/sirnaforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sirnaforge-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: sirnaforge-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 115.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sirnaforge-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7e4e9b0b68252ec5563650ca876b91ff4c9453d1c0beb400a3582f6d083df4b9
MD5 8858561f5cc567efbe681b00cb4711c5
BLAKE2b-256 a3850531ca64de866cf67fd2785ce5246fb52886e2d90b7aac78d2ad7269e228

See more details on using hashes here.

Provenance

The following attestation bundles were made for sirnaforge-0.2.0-py3-none-any.whl:

Publisher: release.yml on Austin-s-h/sirnaforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page