RNA-seq Analysis Pipeline Testing and Optimization Resource with ML-powered recommendations
RAPTOR
RNA-seq Analysis Pipeline Testing and Optimization Resource
Making free science for everybody around the world 🌍
What is RAPTOR?
RAPTOR is a comprehensive framework for benchmarking and optimizing RNA-seq differential expression analysis pipelines. Instead of guessing which pipeline works best for your data, RAPTOR provides evidence-based, ML-powered recommendations through systematic comparison of 8 popular pipelines.
Why RAPTOR?
| Challenge | RAPTOR Solution |
|---|---|
| Which pipeline should I use? | ML recommendations with 87% accuracy |
| Is my data quality good enough? | Quality assessment with batch effect detection |
| How do I know results are reliable? | Ensemble analysis combining multiple pipelines |
| What resources do I need? | Resource monitoring with predictions |
| How do I present results? | Automated, publication-ready reports |
What's New in v2.1.0
- ML-Based Recommendations
- Quality Assessment
- Ensemble Analysis
- Interactive Dashboard
- Resource Monitoring
- Parameter Optimization
Quick Start
Option 1: Interactive Dashboard (Recommended)
# Install
pip install -r requirements.txt
# Launch dashboard
python launch_dashboard.py
# Opens at http://localhost:8501
# Upload data → Get ML recommendation → Done!
Option 2: Command Line
# Profile your data and get ML recommendation
raptor profile --counts counts.csv --metadata metadata.csv --use-ml
# Run recommended pipeline
raptor run --pipeline 3 --data fastq/ --output results/
# Generate report
raptor report --results results/ --output report.html
Option 3: Python API
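import pandas as pd
from raptor import RNAseqDataProfiler, MLPipelineRecommender
# Load inputs (assumption: the profiler accepts pandas DataFrames read from
# the same counts.csv and metadata.csv files used by the command line)
counts = pd.read_csv("counts.csv", index_col=0)
metadata = pd.read_csv("metadata.csv", index_col=0)
# Profile your data
profiler = RNAseqDataProfiler(counts, metadata)
profile = profiler.run_full_profile()
# Get ML recommendation
recommender = MLPipelineRecommender()
recommendation = recommender.recommend(profile)
print(f"Recommended: Pipeline {recommendation['pipeline_id']}")
print(f"Confidence: {recommendation['confidence']:.1%}")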
Installation
Requirements
- Python: 3.8 or higher
- R: 4.0 or higher (for DE analysis)
- RAM: 8GB minimum (16GB recommended)
- Disk: 10GB free space
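The Python version and disk space requirements above can be checked before installing. The following is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper written for illustration, not part of RAPTOR, and it does not check R or RAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)    # from the requirements list
MIN_FREE_DISK_GB = 10  # from the requirements list


def check_requirements(path="."):
    """Return pass/fail flags for the Python version and free disk space at `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_FREE_DISK_GB,
        "free_disk_gb": round(free_gb, 1),
    }


# Print one line per check, e.g. "python_ok: True"
for name, value in check_requirements().items():
    print(f"{name}: {value}")
```

The project's own `python install.py` step remains the authoritative verification; this sketch only covers two of the listed requirements.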
Install from GitHub
# Clone repository
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR
# Install Python dependencies
pip install -r requirements.txt
# Install R dependencies (optional, for running pipelines)
Rscript scripts/install_r_packages.R
# Verify installation
python install.py
Install with pip
pip install git+https://github.com/AyehBlk/RAPTOR.git
Conda Environment
conda env create -f environment.yml
conda activate raptor
Pipelines
RAPTOR benchmarks 8 RNA-seq analysis pipelines:
| ID | Pipeline | Aligner | Quantifier | DE Tool | Speed | ML Rank |
|---|---|---|---|---|---|---|
| 1 | STAR-RSEM-DESeq2 | STAR | RSEM | DESeq2 | ⭐⭐ | #2 |
| 2 | HISAT2-StringTie-Ballgown | HISAT2 | StringTie | Ballgown | ⭐⭐⭐ | #5 |
| 3 | Salmon-edgeR ⭐ | Salmon | Salmon | edgeR | ⭐⭐⭐⭐⭐ | #1 |
| 4 | Kallisto-Sleuth | Kallisto | Kallisto | Sleuth | ⭐⭐⭐⭐⭐ | #3 |
| 5 | STAR-HTSeq-limma | STAR | HTSeq | limma-voom | ⭐⭐ | #4 |
| 6 | STAR-featureCounts-NOISeq | STAR | featureCounts | NOISeq | ⭐⭐ | #6 |
| 7 | Bowtie2-RSEM-EBSeq | Bowtie2 | RSEM | EBSeq | ⭐⭐ | #7 |
| 8 | HISAT2-Cufflinks-Cuffdiff | HISAT2 | Cufflinks | Cuffdiff | ⭐ | #8 |
⭐ Pipeline 3 (Salmon-edgeR) is the ML model's most frequently recommended pipeline due to its optimal speed/accuracy balance.
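For scripting against the table above, the pipelines can be held in a small lookup structure. This is an illustrative sketch only: the `Pipeline` type and the helper functions are hypothetical names, not part of the RAPTOR API, and the values are copied from the table.

```python
from typing import List, NamedTuple


class Pipeline(NamedTuple):
    id: int
    name: str
    aligner: str
    quantifier: str
    de_tool: str
    ml_rank: int


# Values copied from the pipeline table
PIPELINES = [
    Pipeline(1, "STAR-RSEM-DESeq2", "STAR", "RSEM", "DESeq2", 2),
    Pipeline(2, "HISAT2-StringTie-Ballgown", "HISAT2", "StringTie", "Ballgown", 5),
    Pipeline(3, "Salmon-edgeR", "Salmon", "Salmon", "edgeR", 1),
    Pipeline(4, "Kallisto-Sleuth", "Kallisto", "Kallisto", "Sleuth", 3),
    Pipeline(5, "STAR-HTSeq-limma", "STAR", "HTSeq", "limma-voom", 4),
    Pipeline(6, "STAR-featureCounts-NOISeq", "STAR", "featureCounts", "NOISeq", 6),
    Pipeline(7, "Bowtie2-RSEM-EBSeq", "Bowtie2", "RSEM", "EBSeq", 7),
    Pipeline(8, "HISAT2-Cufflinks-Cuffdiff", "HISAT2", "Cufflinks", "Cuffdiff", 8),
]


def by_ml_rank(pipelines: List[Pipeline]) -> List[Pipeline]:
    """Sort pipelines from best (#1) to worst ML rank."""
    return sorted(pipelines, key=lambda p: p.ml_rank)


def using_aligner(pipelines: List[Pipeline], aligner: str) -> List[Pipeline]:
    """Select the pipelines that use the given aligner."""
    return [p for p in pipelines if p.aligner == aligner]


top = by_ml_rank(PIPELINES)[0]
print(f"Top-ranked: Pipeline {top.id} ({top.name})")
```

The ID of the top-ranked entry is the value passed to `raptor run --pipeline` in the Quick Start.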
Repository Structure
RAPTOR/
├── raptor/ # Core Python package
│ ├── profiler.py # Data profiling
│ ├── recommender.py # Rule-based recommendations
│ ├── ml_recommender.py # ML recommendations (NEW)
│ ├── data_quality_assessment.py # Quality scoring (NEW)
│ ├── ensemble_analysis.py # Ensemble methods (NEW)
│ ├── resource_monitoring.py # Resource tracking (NEW)
│ └── ...
├── dashboard/ # Interactive web dashboard (NEW)
├── pipelines/ # Pipeline configurations (8 pipelines)
├── scripts/ # Workflow scripts (00-10)
├── examples/ # Example scripts & demos
├── tests/ # Test suite
├── docs/ # Documentation
├── config/ # Configuration templates
├── install.py # Master installer
├── launch_dashboard.py # Dashboard launcher
├── requirements.txt # Python dependencies
└── setup.py # Package setup
Documentation
Getting Started
| Document | Description |
|---|---|
| INSTALLATION.md | Detailed installation guide |
| QUICK_START.md | 5-minute quick start |
| DASHBOARD.md | Interactive dashboard guide |
Core Features
| Document | Description |
|---|---|
| PROFILE_RECOMMEND.md | Data profiling & recommendations |
| ML_GUIDE.md | ML recommendation system |
| QUALITY_ASSESSMENT.md | Quality scoring & batch effects |
| BENCHMARKING.md | Pipeline benchmarking |
Advanced Features
| Document | Description |
|---|---|
| ENSEMBLE.md | Multi-pipeline ensemble analysis |
| RESOURCE_MONITORING.md | Resource tracking |
| PARAMETER_OPTIMIZATION.md | Parameter tuning |
| CLOUD_DEPLOYMENT.md | AWS/GCP/Azure deployment |
Reference
| Document | Description |
|---|---|
| PIPELINES.md | Pipeline details & selection guide |
| API.md | Python API reference |
| FAQ.md | Frequently asked questions |
| TROUBLESHOOTING.md | Common issues & solutions |
| CHANGELOG.md | Version history |
Usage Examples
Example 1: Quick ML Recommendation
# Get instant recommendation for your data
raptor profile --counts counts.csv --use-ml
# Output:
# 🦖 RECOMMENDED: Pipeline 3 (Salmon-edgeR)
# Confidence: 89%
# Reason: Optimal for your sample size (n=12) and moderate BCV (0.35)
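The BCV in the output above is the biological coefficient of variation, which edgeR defines as the square root of the negative binomial dispersion. The sketch below shows a rough method-of-moments estimate, assuming the variance model var = mu + phi * mu^2 and ignoring library-size normalization. It is not how RAPTOR or edgeR computes the value; `gene_dispersion` and `estimate_bcv` are hypothetical helper names.

```python
from math import sqrt
from statistics import mean, median, variance


def gene_dispersion(counts):
    """Method-of-moments NB dispersion for one gene: phi = (var - mu) / mu^2.

    Returns None for all-zero genes and clips negative estimates to 0.
    """
    mu = mean(counts)
    if mu == 0:
        return None
    phi = (variance(counts) - mu) / mu ** 2
    return max(phi, 0.0)


def estimate_bcv(count_rows):
    """Rough BCV: square root of the median per-gene dispersion."""
    dispersions = [
        d for d in (gene_dispersion(row) for row in count_rows) if d is not None
    ]
    return sqrt(median(dispersions))


# Toy counts for three genes across three replicates of one condition
rows = [[80, 100, 120], [10, 10, 10], [0, 0, 0]]
print(f"Estimated BCV: {estimate_bcv(rows):.2f}")
```

Each row should contain replicates from a single condition, since mixing conditions inflates the variance and therefore the estimate.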
Example 2: Quality Assessment
from raptor.data_quality_assessment import DataQualityAssessor
assessor = DataQualityAssessor(counts, metadata)
report = assessor.assess_quality()
print(f"Quality Score: {report['overall_score']}/100")
print(f"Batch Effects: {'Detected' if report['batch_effects']['detected'] else 'None'}")
Example 3: Ensemble Analysis
from raptor.ensemble_analysis import EnsembleAnalyzer
# Combine results from multiple pipelines
analyzer = EnsembleAnalyzer()
consensus = analyzer.combine_results(
    results_dict={'deseq2': df1, 'edger': df2, 'limma': df3},
    method='weighted_vote',
    min_agreement=2
)
print(f"Consensus DE genes: {len(consensus['de_genes'])}")
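To illustrate what a `min_agreement` threshold means, here is a minimal sketch of unweighted voting over per-pipeline gene sets. It is an assumption-laden simplification: the weighting behind `method='weighted_vote'` is not described here, so this example gives every pipeline one equal vote, and `consensus_genes` is a hypothetical function rather than RAPTOR's implementation.

```python
from collections import Counter


def consensus_genes(calls_by_pipeline, min_agreement=2):
    """Return genes called DE by at least `min_agreement` pipelines."""
    votes = Counter()
    for genes in calls_by_pipeline.values():
        votes.update(set(genes))  # one vote per pipeline per gene
    return sorted(gene for gene, n in votes.items() if n >= min_agreement)


# Toy DE calls; gene names are placeholders
calls = {
    "deseq2": {"GENE_A", "GENE_B", "GENE_C"},
    "edger": {"GENE_B", "GENE_C", "GENE_D"},
    "limma": {"GENE_C", "GENE_E"},
}
print(consensus_genes(calls, min_agreement=2))
```

Raising `min_agreement` trades sensitivity for specificity: with three pipelines, a threshold of 3 keeps only genes that every pipeline calls.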
Example 4: Full Workflow
# 1. Simulate test data
Rscript scripts/00_simulate_data.R -o sim_data/ -n 6
# 2. Profile and get recommendation
python scripts/02_profile_data.py sim_data/counts.csv
# 3. Run benchmark
bash scripts/01_run_all_pipelines.sh sim_data/ results/ refs/
# 4. Compare results
Rscript scripts/03_compare_results.R results/ --truth sim_data/truth_set.csv
# 5. Visualize
Rscript scripts/04_visualize_comparison.R results/
# 6. Generate report
python scripts/08_automated_report.py --results results/
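The six steps above can also be driven from one Python script that stops at the first failure. This is a sketch under stated assumptions: it reuses the script paths and arguments exactly as listed, `build_workflow` and `run_workflow` are hypothetical helpers, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def build_workflow(sim_dir="sim_data", results_dir="results", refs_dir="refs"):
    """Return the six workflow commands as argument lists, in execution order."""
    return [
        ["Rscript", "scripts/00_simulate_data.R", "-o", f"{sim_dir}/", "-n", "6"],
        ["python", "scripts/02_profile_data.py", f"{sim_dir}/counts.csv"],
        ["bash", "scripts/01_run_all_pipelines.sh",
         f"{sim_dir}/", f"{results_dir}/", f"{refs_dir}/"],
        ["Rscript", "scripts/03_compare_results.R", f"{results_dir}/",
         "--truth", f"{sim_dir}/truth_set.csv"],
        ["Rscript", "scripts/04_visualize_comparison.R", f"{results_dir}/"],
        ["python", "scripts/08_automated_report.py", "--results", f"{results_dir}/"],
    ]


def run_workflow(commands, dry_run=True):
    """Print each command; execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(cmd, check=True)


run_workflow(build_workflow())  # dry run: prints the commands only
```

Call `run_workflow(build_workflow(), dry_run=False)` from the repository root to execute the steps for real.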
Performance
ML Recommendation Accuracy
| Metric | Value |
|---|---|
| Overall Accuracy | 87% |
| Top-3 Accuracy | 96% |
| Prediction Time | <0.1s |
| Training Data | 10,000+ analyses |
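The table distinguishes overall accuracy from top-3 accuracy. How RAPTOR measured these is not specified here; the sketch below shows the conventional definition, where a prediction counts as a hit if the truly best pipeline appears among the first k ranked recommendations. `top_k_accuracy` is a hypothetical helper and the data are toy values.

```python
def top_k_accuracy(ranked_predictions, best_pipelines, k=1):
    """Fraction of cases where the best pipeline is among the top-k predictions."""
    hits = sum(
        best in ranked[:k]
        for ranked, best in zip(ranked_predictions, best_pipelines)
    )
    return hits / len(best_pipelines)


# Toy example: ranked pipeline IDs per dataset, and the ID that was actually best
ranked = [[3, 1, 4], [1, 3, 5], [4, 3, 1], [3, 4, 1]]
best = [3, 3, 1, 2]

print(f"Top-1 accuracy: {top_k_accuracy(ranked, best, k=1):.0%}")
print(f"Top-3 accuracy: {top_k_accuracy(ranked, best, k=3):.0%}")
```

Top-k accuracy can never be lower than top-1 accuracy on the same data, which is consistent with the 87% and 96% figures in the table.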
Ensemble Analysis Impact
| Metric | Single Pipeline | Ensemble |
|---|---|---|
| False Positives | 30% | 20% |
| Validation Success | 60% | 80% |
| Reproducibility | 75% | 92% |
🤝 Contributing
We welcome contributions! RAPTOR is open-source and aims to make free science accessible to everyone.
# Fork and clone
git clone https://github.com/YOUR_USERNAME/RAPTOR.git
# Create feature branch
git checkout -b feature/amazing-feature
# Make changes and test
pytest tests/
# Submit pull request
See CONTRIBUTING.md for guidelines.
Citation
If you use RAPTOR in your research, please cite:
@software{bolouki2025raptor,
author = {Bolouki, Ayeh},
title = {RAPTOR: RNA-seq Analysis Pipeline Testing and Optimization Resource},
year = {2025},
version = {2.1.0},
publisher = {Zenodo},
doi = {10.5281/zenodo.17607161},
url = {https://github.com/AyehBlk/RAPTOR}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 Ayeh Bolouki
Contact
Ayeh Bolouki
- 🏛️ GIGA, University of Liège, Belgium
- 📧 Email: ayehbolouki1988@gmail.com
- 🐙 GitHub: @AyehBlk
- 🔬 Research: Computational Biology, Bioinformatics, Multi-omics Analysis
Acknowledgments
- The Bioconductor community for the R package ecosystem
- All users who provided feedback
⭐ Star this repository if you find RAPTOR useful!
RAPTOR v2.1.0 - Making pipeline selection evidence-based, not guesswork 🦖