RNA-seq Analysis Pipeline Testing and Optimization Resource with ML-powered recommendations

RAPTOR

RNA-seq Analysis Pipeline Testing and Optimization Resource

Making free science for everybody around the world 🌍

Quick Start • Features • Installation • Documentation • Pipelines • Citation

What is RAPTOR?

RAPTOR is a comprehensive framework for benchmarking and optimizing RNA-seq differential expression analysis pipelines. Instead of guessing which pipeline works best for your data, RAPTOR provides evidence-based, ML-powered recommendations through systematic comparison of 8 popular pipelines.

Why RAPTOR?

Challenge	RAPTOR Solution
Which pipeline should I use?	ML recommendations with 87% accuracy
Is my data quality good enough?	Quality assessment with batch effect detection
How do I know results are reliable?	Ensemble analysis combining multiple pipelines
What resources do I need?	Resource monitoring with predictions
How do I present results?	Automated, publication-ready reports

What's New in v2.1.0

ML-Based Recommendations

87% prediction accuracy
Confidence scoring (0-100%)
Learns from 10,000+ analyses
Explains its reasoning

Quality Assessment

6-component quality scoring
Batch effect detection
Outlier identification
Actionable recommendations

Ensemble Analysis

5 combination methods
33% fewer false positives
High-confidence gene lists
Consensus validation

Interactive Dashboard

Web-based interface (no coding!)
Real-time visualizations
Drag-and-drop data upload
One-click reports

Resource Monitoring

Real-time CPU/memory tracking
<1% performance overhead
Resource predictions
Cloud cost estimation
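
To illustrate what time and memory tracking around a pipeline step can look like, here is a minimal sketch using only the Python standard library. It is not RAPTOR's resource_monitoring module: the names `track_resources` and `ResourceUsage` are hypothetical, and `tracemalloc` only sees Python allocations in the current process (not external tools such as aligners) and adds noticeable overhead of its own.

```python
import time
import tracemalloc
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """Measurements for one monitored block (hypothetical structure)."""
    wall_seconds: float = 0.0
    cpu_seconds: float = 0.0
    peak_python_bytes: int = 0


@contextmanager
def track_resources():
    """Record wall time, CPU time, and peak Python heap use of the enclosed block."""
    usage = ResourceUsage()
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield usage
    finally:
        usage.wall_seconds = time.perf_counter() - wall_start
        usage.cpu_seconds = time.process_time() - cpu_start
        # get_traced_memory() returns (current, peak) in bytes
        _, usage.peak_python_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()


with track_resources() as usage:
    squares = [i * i for i in range(100_000)]  # stand-in for a pipeline step

print(f"wall: {usage.wall_seconds:.3f}s  cpu: {usage.cpu_seconds:.3f}s  "
      f"peak: {usage.peak_python_bytes / 1e6:.1f} MB")
```

The measurements are filled in when the `with` block exits, so `usage` should be read only afterwards.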

Parameter Optimization

Bayesian optimization
Grid search
Adaptive tuning
Best parameter selection
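
As a rough illustration of the grid search idea listed above, the sketch below scores every combination of candidate parameter values and keeps the best one. The function `grid_search`, the toy objective, and the parameter names `fdr_threshold` and `min_count` are assumptions made for this example and are not taken from RAPTOR's API.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(fdr_threshold, min_count):
    """Toy objective standing in for a benchmark metric (higher is better)."""
    return -abs(fdr_threshold - 0.05) - abs(min_count - 10) / 100


grid = {"fdr_threshold": [0.01, 0.05, 0.1], "min_count": [5, 10, 20]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added parameter; Bayesian optimization is the usual alternative when a single evaluation is expensive.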

Quick Start

Option 1: Interactive Dashboard (Recommended)

# Install dependencies (from the root of a cloned RAPTOR repository)
pip install -r requirements.txt

# Launch dashboard
python launch_dashboard.py

# Opens at http://localhost:8501
# Upload data → Get ML recommendation → Done!

Option 2: Command Line

# Profile your data and get ML recommendation
raptor profile --counts counts.csv --metadata metadata.csv --use-ml

# Run the recommended pipeline (pipeline 3 in this example)
raptor run --pipeline 3 --data fastq/ --output results/

# Generate report
raptor report --results results/ --output report.html

Option 3: Python API

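import pandas as pd
from raptor import RNAseqDataProfiler, MLPipelineRecommender

# Load inputs (assumed here to be pandas DataFrames; first CSV column holds the IDs)
counts = pd.read_csv("counts.csv", index_col=0)
metadata = pd.read_csv("metadata.csv", index_col=0)

# Profile your data
profiler = RNAseqDataProfiler(counts, metadata)
profile = profiler.run_full_profile()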

# Get ML recommendation
recommender = MLPipelineRecommender()
recommendation = recommender.recommend(profile)

print(f"Recommended: Pipeline {recommendation['pipeline_id']}")
print(f"Confidence: {recommendation['confidence']:.1%}")

Installation

Requirements

Python: 3.8 or higher
R: 4.0 or higher (for DE analysis)
RAM: 8GB minimum (16GB recommended)
Disk: 10GB free space
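
Some of the requirements above can be checked before installing. The following is a minimal sketch, not part of RAPTOR (the repository's own install.py is the documented way to verify an installation). The function name `check_environment` is hypothetical, and the sketch does not check the R version or available RAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)     # from the requirements list
MIN_FREE_DISK_GB = 10   # from the requirements list


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    if shutil.which("Rscript") is None:
        problems.append("Rscript not found on PATH (R is needed for DE analysis)")
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"Only {free_gb:.1f} GB free disk space, {MIN_FREE_DISK_GB} GB needed")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```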

Install from GitHub

# Clone repository
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR

# Install Python dependencies
pip install -r requirements.txt

# Install R dependencies (optional, for running pipelines)
Rscript scripts/install_r_packages.R

# Verify installation
python install.py

Install with pip

pip install git+https://github.com/AyehBlk/RAPTOR.git

Conda Environment

conda env create -f environment.yml
conda activate raptor

Pipelines

RAPTOR benchmarks 8 RNA-seq analysis pipelines:

ID	Pipeline	Aligner	Quantifier	DE Tool	Speed	ML Rank
1	STAR-RSEM-DESeq2	STAR	RSEM	DESeq2	⭐⭐	#2
2	HISAT2-StringTie-Ballgown	HISAT2	StringTie	Ballgown	⭐⭐⭐	#5
3	Salmon-edgeR ⭐	Salmon	Salmon	edgeR	⭐⭐⭐⭐⭐	#1
4	Kallisto-Sleuth	Kallisto	Kallisto	Sleuth	⭐⭐⭐⭐⭐	#3
5	STAR-HTSeq-limma	STAR	HTSeq	limma-voom	⭐⭐	#4
6	STAR-featureCounts-NOISeq	STAR	featureCounts	NOISeq	⭐⭐	#6
7	Bowtie2-RSEM-EBSeq	Bowtie2	RSEM	EBSeq	⭐⭐	#7
8	HISAT2-Cufflinks-Cuffdiff	HISAT2	Cufflinks	Cuffdiff	⭐	#8

⭐ Pipeline 3 (Salmon-edgeR) is the ML model's most frequently recommended pipeline due to its optimal speed/accuracy balance.
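
Because the CLI selects pipelines by numeric ID, it can be handy to have the table above available as data. The sketch below transcribes the table into a small lookup structure; the `Pipeline` class and variable names are hypothetical and are not part of the RAPTOR package.

```python
from typing import NamedTuple


class Pipeline(NamedTuple):
    pipeline_id: int
    name: str
    aligner: str
    quantifier: str
    de_tool: str
    ml_rank: int


# Values transcribed from the pipeline table.
PIPELINES = [
    Pipeline(1, "STAR-RSEM-DESeq2", "STAR", "RSEM", "DESeq2", 2),
    Pipeline(2, "HISAT2-StringTie-Ballgown", "HISAT2", "StringTie", "Ballgown", 5),
    Pipeline(3, "Salmon-edgeR", "Salmon", "Salmon", "edgeR", 1),
    Pipeline(4, "Kallisto-Sleuth", "Kallisto", "Kallisto", "Sleuth", 3),
    Pipeline(5, "STAR-HTSeq-limma", "STAR", "HTSeq", "limma-voom", 4),
    Pipeline(6, "STAR-featureCounts-NOISeq", "STAR", "featureCounts", "NOISeq", 6),
    Pipeline(7, "Bowtie2-RSEM-EBSeq", "Bowtie2", "RSEM", "EBSeq", 7),
    Pipeline(8, "HISAT2-Cufflinks-Cuffdiff", "HISAT2", "Cufflinks", "Cuffdiff", 8),
]

# Look up a pipeline by the ID used on the command line.
PIPELINES_BY_ID = {p.pipeline_id: p for p in PIPELINES}

# Order pipelines by the ML rank column.
by_ml_rank = sorted(PIPELINES, key=lambda p: p.ml_rank)

# Pipelines whose Aligner and Quantifier columns name the same tool.
same_tool = [p.name for p in PIPELINES if p.aligner == p.quantifier]

print(PIPELINES_BY_ID[3].name)
print([p.pipeline_id for p in by_ml_rank])
print(same_tool)
```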

Repository Structure

RAPTOR/
├── raptor/                 # Core Python package
│   ├── profiler.py         # Data profiling
│   ├── recommender.py      # Rule-based recommendations
│   ├── ml_recommender.py   # ML recommendations (NEW)
│   ├── data_quality_assessment.py  # Quality scoring (NEW)
│   ├── ensemble_analysis.py        # Ensemble methods (NEW)
│   ├── resource_monitoring.py      # Resource tracking (NEW)
│   └── ...
├── dashboard/              # Interactive web dashboard (NEW)
├── pipelines/              # Pipeline configurations (8 pipelines)
├── scripts/                # Workflow scripts (00-10)
├── examples/               # Example scripts & demos
├── tests/                  # Test suite
├── docs/                   # Documentation
├── config/                 # Configuration templates
├── install.py              # Master installer
├── launch_dashboard.py     # Dashboard launcher
├── requirements.txt        # Python dependencies
└── setup.py                # Package setup

Documentation

Getting Started

Document	Description
INSTALLATION.md	Detailed installation guide
QUICK_START.md	5-minute quick start
DASHBOARD.md	Interactive dashboard guide

Core Features

Document	Description
PROFILE_RECOMMEND.md	Data profiling & recommendations
ML_GUIDE.md	ML recommendation system
QUALITY_ASSESSMENT.md	Quality scoring & batch effects
BENCHMARKING.md	Pipeline benchmarking

Advanced Features

Document	Description
ENSEMBLE.md	Multi-pipeline ensemble analysis
RESOURCE_MONITORING.md	Resource tracking
PARAMETER_OPTIMIZATION.md	Parameter tuning
CLOUD_DEPLOYMENT.md	AWS/GCP/Azure deployment

Reference

Document	Description
PIPELINES.md	Pipeline details & selection guide
API.md	Python API reference
FAQ.md	Frequently asked questions
TROUBLESHOOTING.md	Common issues & solutions
CHANGELOG.md	Version history

Usage Examples

Example 1: Quick ML Recommendation

# Get instant recommendation for your data
raptor profile --counts counts.csv --use-ml

# Output:
# 🦖 RECOMMENDED: Pipeline 3 (Salmon-edgeR)
# Confidence: 89%
# Reason: Optimal for your sample size (n=12) and moderate BCV (0.35)
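
The BCV in the output above is the biological coefficient of variation. In the negative binomial model used by edgeR, a gene's variance is mean + dispersion × mean², and BCV is the square root of the dispersion. The sketch below shows a crude method-of-moments estimate for a single gene, purely to make the quantity concrete. It is not how RAPTOR or edgeR estimate dispersion (edgeR uses likelihood-based estimation with empirical Bayes moderation), it ignores library-size normalization, and the function name `crude_bcv` is hypothetical.

```python
import math
import statistics


def crude_bcv(counts):
    """Method-of-moments BCV for one gene's counts across at least two replicates.

    Under a negative binomial model, variance = mean + dispersion * mean**2,
    so dispersion = (variance - mean) / mean**2 and BCV = sqrt(dispersion).
    """
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if mean == 0:
        return 0.0
    # Genes with no more than Poisson-level variance get a dispersion of 0.
    dispersion = max((variance - mean) / mean**2, 0.0)
    return math.sqrt(dispersion)


replicate_counts = [80, 100, 120, 140, 60, 100]  # made-up counts for one gene
print(f"BCV ~ {crude_bcv(replicate_counts):.2f}")
```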

Example 2: Quality Assessment

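import pandas as pd
from raptor.data_quality_assessment import DataQualityAssessor

# Load inputs (assumed here to be pandas DataFrames; first CSV column holds the IDs)
counts = pd.read_csv("counts.csv", index_col=0)
metadata = pd.read_csv("metadata.csv", index_col=0)

assessor = DataQualityAssessor(counts, metadata)
report = assessor.assess_quality()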

print(f"Quality Score: {report['overall_score']}/100")
print(f"Batch Effects: {'Detected' if report['batch_effects']['detected'] else 'None'}")
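
For intuition about outlier identification, one common heuristic is to flag samples whose library size is far from the median on a log scale, using a robust z-score based on the median absolute deviation (MAD). The sketch below shows that heuristic only. It is not the method used by `DataQualityAssessor`, and the function name, threshold, and sample values are made up for illustration.

```python
import math
import statistics


def flag_library_size_outliers(library_sizes, threshold=3.0):
    """Flag samples whose log10 library size has a robust z-score above `threshold`.

    The robust z-score is |x - median| / (1.4826 * MAD), where 1.4826 makes the
    MAD comparable to a standard deviation for normally distributed data.
    """
    names = list(library_sizes)
    logs = [math.log10(library_sizes[name]) for name in names]
    median = statistics.median(logs)
    mad = statistics.median(abs(x - median) for x in logs)
    if mad == 0:
        return []  # no spread to compare against
    scale = 1.4826 * mad
    return [name for name, x in zip(names, logs) if abs(x - median) / scale > threshold]


# Made-up total read counts per sample.
library_sizes = {
    "ctrl_1": 21_000_000,
    "ctrl_2": 19_500_000,
    "ctrl_3": 20_200_000,
    "treat_1": 22_100_000,
    "treat_2": 1_200_000,
    "treat_3": 20_800_000,
}
print(flag_library_size_outliers(library_sizes))
```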

Example 3: Ensemble Analysis

from raptor.ensemble_analysis import EnsembleAnalyzer

# Combine results from multiple pipelines
# (df1, df2, df3: per-pipeline DE result tables, loaded beforehand)
analyzer = EnsembleAnalyzer()
consensus = analyzer.combine_results(
    results_dict={'deseq2': df1, 'edger': df2, 'limma': df3},
    method='weighted_vote',
    min_agreement=2
)

print(f"Consensus DE genes: {len(consensus['de_genes'])}")
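
The idea behind `min_agreement` can be sketched as a simple vote: a gene is kept only if enough pipelines call it differentially expressed. The example below is an unweighted simplification under that assumption. RAPTOR's `weighted_vote` method may weight pipelines differently, and `consensus_genes` and the gene names are hypothetical.

```python
from collections import Counter


def consensus_genes(de_calls, min_agreement=2):
    """Return genes called DE by at least `min_agreement` pipelines, sorted by name."""
    # set(genes) ensures each pipeline contributes at most one vote per gene.
    votes = Counter(gene for genes in de_calls.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_agreement)


# Made-up DE calls from three pipelines.
de_calls = {
    "deseq2": {"GENE_A", "GENE_B", "GENE_C"},
    "edger": {"GENE_A", "GENE_B", "GENE_D"},
    "limma": {"GENE_A", "GENE_E"},
}
print(consensus_genes(de_calls, min_agreement=2))
```

Raising `min_agreement` trades sensitivity for specificity: with three pipelines, requiring all three keeps only genes that every method agrees on.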

Example 4: Full Workflow

# 1. Simulate test data
Rscript scripts/00_simulate_data.R -o sim_data/ -n 6

# 2. Profile and get recommendation
python scripts/02_profile_data.py sim_data/counts.csv

# 3. Run benchmark
bash scripts/01_run_all_pipelines.sh sim_data/ results/ refs/

# 4. Compare results
Rscript scripts/03_compare_results.R results/ --truth sim_data/truth_set.csv

# 5. Visualize
Rscript scripts/04_visualize_comparison.R results/

# 6. Generate report
python scripts/08_automated_report.py --results results/

Performance

ML Recommendation Accuracy

Metric	Value
Overall Accuracy	87%
Top-3 Accuracy	96%
Prediction Time	<0.1s
Training Data	10,000+ analyses
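
The table does not define its metrics. Assuming the usual meanings, "Overall Accuracy" is the fraction of datasets where the top-ranked pipeline is the best one, and "Top-3 Accuracy" is the fraction where the best pipeline is among the three highest-ranked. A minimal sketch of that computation, with made-up data and a hypothetical function name:

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is among the first k ranked predictions."""
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


# Made-up example: ranked pipeline IDs per dataset, and the pipeline that
# turned out best for each dataset.
ranked_predictions = [[3, 1, 4], [3, 4, 1], [1, 3, 5], [4, 3, 1]]
true_labels = [3, 1, 1, 5]

top1 = top_k_accuracy(ranked_predictions, true_labels, k=1)
top3 = top_k_accuracy(ranked_predictions, true_labels, k=3)
print(f"top-1: {top1:.0%}, top-3: {top3:.0%}")
```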

Ensemble Analysis Impact

Metric	Single Pipeline	Ensemble
False Positives	30%	20%
Validation Success	60%	80%
Reproducibility	75%	92%
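
The "33% fewer false positives" figure quoted under Ensemble Analysis is a relative change: going from 30% to 20% is a drop of 10 percentage points, which is one third of the starting value. The short calculation below applies the same arithmetic to each row of the table; the function name is illustrative.

```python
def relative_change(before, after):
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    return (after - before) / before


# Rates from the table (single pipeline vs. ensemble).
false_positive_change = relative_change(0.30, 0.20)
validation_change = relative_change(0.60, 0.80)
reproducibility_change = relative_change(0.75, 0.92)

print(f"False positives:    {false_positive_change:+.0%}")
print(f"Validation success: {validation_change:+.0%}")
print(f"Reproducibility:    {reproducibility_change:+.0%}")
```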

🤝 Contributing

We welcome contributions! RAPTOR is open-source and aims to make free science accessible to everyone.

# Fork and clone
git clone https://github.com/YOUR_USERNAME/RAPTOR.git
cd RAPTOR

# Create feature branch
git checkout -b feature/amazing-feature

# Make changes and test
pytest tests/

# Submit pull request

See CONTRIBUTING.md for guidelines.

Citation

If you use RAPTOR in your research, please cite:

@software{bolouki2025raptor,
  author       = {Bolouki, Ayeh},
  title        = {RAPTOR: RNA-seq Analysis Pipeline Testing and Optimization Resource},
  year         = {2025},
  version      = {2.1.0},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17607161},
  url          = {https://github.com/AyehBlk/RAPTOR}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License
Copyright (c) 2025 Ayeh Bolouki

Contact

Ayeh Bolouki

🏛️ GIGA, University of Liège, Belgium
📧 Email: ayehbolouki1988@gmail.com
🐙 GitHub: @AyehBlk
🔬 Research: Computational Biology, Bioinformatics, Multi-omics Analysis

Acknowledgments

The Bioconductor community for the R package ecosystem
All users who provided feedback

⭐ Star this repository if you find RAPTOR useful!

RAPTOR v2.1.0 - Making pipeline selection evidence-based, not guesswork 🦖

Release history

Version	Date
2.2.1	Mar 18, 2026
2.2.0	Mar 11, 2026
2.1.2	Dec 30, 2025
2.1.1	Dec 17, 2025
2.1.0 (this version)	Dec 12, 2025

Download files

File	Type	Size	Uploaded	Uploaded via	Trusted Publishing
raptor_rnaseq-2.1.0.tar.gz	Source distribution	130.0 kB	Dec 12, 2025	twine/6.2.0 CPython/3.14.2	No
raptor_rnaseq-2.1.0-py3-none-any.whl	Built distribution (Python 3)	139.2 kB	Dec 12, 2025	twine/6.2.0 CPython/3.14.2	No

File hashes

Hashes for raptor_rnaseq-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0c4968cb114e3c85351325821456831091546ea452c970cf6fed8befd66b90c3`
MD5	`1c2ba0b6d4f67be91a28b8a2cc5dfdcf`
BLAKE2b-256	`437a8038406ff7e5ef3e69b4b625ee1d41e2b292158983de44b7f92bdc910578`

Hashes for raptor_rnaseq-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf754b543bc0d5a507ac28159418ee18fd9123acf9590f3416d770ee188d4fe7`
MD5	`76d2bdb60f4ca7fb7cca092d164496d2`
BLAKE2b-256	`1678e150ec0b3d09e49c9d0f7ca3ed42d73db2590461eb51526fa30a6271d5f8`
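
The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below uses Python's standard `hashlib` module. The helper names are hypothetical, and the commented usage line assumes the wheel has been saved to the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# SHA256 digest listed for raptor_rnaseq-2.1.0-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "bf754b543bc0d5a507ac28159418ee18fd9123acf9590f3416d770ee188d4fe7"

# Usage, assuming the wheel is in the current directory:
# matches_expected("raptor_rnaseq-2.1.0-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```

pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with a requirements file that pins `--hash=sha256:...`).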
