RNA-seq Analysis Pipeline Testing and Optimization Resource with ML-powered recommendations and adaptive threshold optimization

RAPTOR v2.1.1

RAPTOR

RNA-seq Analysis Pipeline Testing and Optimization Resource

Making free science for everybody around the world 🌍

Quick Start • Features • Installation • Documentation • Pipelines • Citation

🆕 What's New in v2.1.1

Adaptive Threshold Optimizer (ATO)

Stop using arbitrary thresholds! The new Adaptive Threshold Optimizer determines data-driven significance cutoffs for differential expression analysis.

from raptor.threshold_optimizer import optimize_thresholds
import pandas as pd

df = pd.read_csv('deseq2_results.csv')
result = optimize_thresholds(df, goal='discovery')

print(f"Optimal logFC: {result.logfc_threshold:.2f}")
print(f"Significant genes: {result.n_significant}")
print(f"\n{result.methods_text}")  # Publication-ready!

Key Features:

Multiple p-value adjustment methods (BH, BY, Storey q-value, Holm, Bonferroni)
Five logFC optimization methods (MAD, mixture model, power-based, percentile, consensus)
π₀ estimation for true null proportion
Three analysis goals: discovery, balanced, validation
Auto-generated publication methods text
Interactive dashboard integration
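To make the first item in the list concrete, here is a minimal sketch of the Benjamini-Hochberg (BH) adjustment in plain Python. It illustrates the standard procedure only; the function name `bh_adjust` is hypothetical and this is not taken from RAPTOR's source.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.008, 0.039, 0.041, 0.60]  # toy values for illustration
padj = bh_adjust(pvals)
print([round(p, 5) for p in padj])
```

Genes whose adjusted p-value falls below the chosen cutoff are then reported as significant.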

What is RAPTOR?

RAPTOR is a comprehensive framework for benchmarking and optimizing RNA-seq differential expression analysis pipelines. Instead of guessing which pipeline works best for your data, RAPTOR provides evidence-based, ML-powered recommendations through systematic comparison of 8 popular pipelines.

Why RAPTOR?

Challenge	RAPTOR Solution
Which pipeline should I use?	✅ ML recommendations with 87% accuracy
What thresholds should I use?	✅ Adaptive Threshold Optimizer (NEW!)
Is my data quality good enough?	✅ Quality assessment with batch effect detection
How do I know results are reliable?	✅ Ensemble analysis combining multiple pipelines
What resources do I need?	✅ Resource monitoring with predictions
How do I present results?	✅ Automated, publication-ready reports

Features

Adaptive Threshold Optimizer (NEW!)

Data-driven logFC and p-value thresholds
Multiple statistical methods
Publication-ready methods text
Interactive dashboard page
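As an illustration of what a data-driven logFC threshold can look like, the sketch below computes a cutoff from the median absolute deviation (MAD), one of the methods named earlier. The formula (a multiple `k` of the scaled MAD) is a common robust choice and is an assumption here; RAPTOR's own MAD method may differ, and `mad_logfc_threshold` is a hypothetical name.

```python
import statistics


def mad_logfc_threshold(logfcs, k=2.0):
    """Cutoff equal to k scaled MADs of the log fold changes."""
    center = statistics.median(logfcs)
    mad = statistics.median(abs(x - center) for x in logfcs)
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    return k * 1.4826 * mad


logfcs = [-0.2, -0.1, 0.0, 0.1, 0.2, 3.0, -2.5]  # toy values for illustration
threshold = mad_logfc_threshold(logfcs)
selected = [x for x in logfcs if abs(x) > threshold]
print(f"threshold={threshold:.3f}, genes beyond it: {len(selected)}")
```

Because the cutoff comes from the spread of the bulk of the genes, a few extreme fold changes do not inflate it the way a standard deviation would.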

ML-Based Recommendations

87% prediction accuracy
Confidence scoring (0-100%)
Learns from 10,000+ analyses
Explains its reasoning

Quality Assessment

6-component quality scoring
Batch effect detection
Outlier identification
Actionable recommendations

Ensemble Analysis

5 combination methods
33% fewer false positives
High-confidence gene lists
Consensus validation

Interactive Dashboard

Web-based interface (no coding!)
Real-time visualizations
Drag-and-drop data upload
One-click reports

Resource Monitoring

Real-time CPU/memory tracking
<1% performance overhead
Resource predictions
Cost estimation for cloud
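For readers curious how CPU and memory tracking works in principle, here is a small standard-library sketch. It is not RAPTOR's monitor: `measure` is a hypothetical helper, and `tracemalloc` only sees Python allocations, so it would not capture the memory used by external tools such as aligners.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and report wall time, CPU time, and peak Python heap use."""
    tracemalloc.start()
    wall0, cpu0 = time.perf_counter(), time.process_time()
    try:
        result = func(*args, **kwargs)
    finally:
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    stats = {"wall_s": wall, "cpu_s": cpu, "peak_mib": peak / 2**20}
    return result, stats


result, stats = measure(sorted, range(100_000, 0, -1))
print(f"wall={stats['wall_s']:.4f}s cpu={stats['cpu_s']:.4f}s "
      f"peak={stats['peak_mib']:.2f} MiB")
```

Note that `tracemalloc` itself adds measurable overhead, so a sketch like this says nothing about the overhead figure quoted above.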

Quick Start

Option 1: Interactive Dashboard (Recommended)

# Install
pip install raptor-rnaseq

# Launch dashboard
raptor dashboard

# Opens at http://localhost:8501
# Upload data → Get ML recommendation → Use 🎯 Threshold Optimizer → Done!

Option 2: Command Line

# Profile your data and get ML recommendation
raptor profile --counts counts.csv --metadata metadata.csv --use-ml

# Run recommended pipeline
raptor run --pipeline 3 --data fastq/ --output results/

# Optimize thresholds (NEW!)
raptor optimize-thresholds --input results.csv --goal balanced

# Generate report
raptor report --results results/ --output report.html
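The four commands above can also be chained from a script. The sketch below only builds and prints the argument lists, using the subcommands and flags shown above; the helper `raptor_commands` and its parameter names are hypothetical, and the `subprocess.run` line is left commented out so nothing is executed by accident.

```python
import shlex
import subprocess  # used only if the run line below is uncommented


def raptor_commands(counts="counts.csv", metadata="metadata.csv",
                    fastq_dir="fastq/", out_dir="results/",
                    pipeline_id=3, de_results="results.csv",
                    goal="balanced", report="report.html"):
    """Build the CLI steps from Option 2 as argument lists."""
    return [
        ["raptor", "profile", "--counts", counts,
         "--metadata", metadata, "--use-ml"],
        ["raptor", "run", "--pipeline", str(pipeline_id),
         "--data", fastq_dir, "--output", out_dir],
        ["raptor", "optimize-thresholds", "--input", de_results,
         "--goal", goal],
        ["raptor", "report", "--results", out_dir, "--output", report],
    ]


for cmd in raptor_commands():
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to execute each step
```

Passing arguments as lists avoids shell quoting problems when paths contain spaces.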

Option 3: Python API

import pandas as pd

from raptor import RNAseqDataProfiler, MLPipelineRecommender
from raptor.threshold_optimizer import optimize_thresholds

# Load the count matrix (genes as rows) and the sample metadata
counts = pd.read_csv('counts.csv', index_col=0)
metadata = pd.read_csv('metadata.csv')

# Profile your data
profiler = RNAseqDataProfiler(counts, metadata)
profile = profiler.run_full_profile()

# Get ML recommendation
recommender = MLPipelineRecommender()
recommendation = recommender.recommend(profile)

print(f"Recommended: Pipeline {recommendation['pipeline_id']}")
print(f"Confidence: {recommendation['confidence']:.1%}")

# After running pipeline, optimize thresholds (NEW!)
de_results = pd.read_csv('de_results.csv')
result = optimize_thresholds(de_results, goal='balanced')
print(f"Optimal |logFC|: {result.logfc_threshold:.2f}")
print(result.methods_text)

Installation

Requirements

Python: 3.8 or higher
R: 4.0 or higher (for DE analysis)
RAM: 8GB minimum (16GB recommended)
Disk: 10GB free space

Install from PyPI (Recommended)

pip install raptor-rnaseq

With optional dependencies:

# With dashboard support (quotes stop shells such as zsh from expanding the brackets)
pip install "raptor-rnaseq[dashboard]"

# With all features
pip install "raptor-rnaseq[all]"

Install from GitHub

# Clone repository
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR

# Install Python dependencies
pip install -r requirements.txt

# Verify installation
python install.py

Conda Environment

conda env create -f environment.yml
conda activate raptor

Pipelines

RAPTOR benchmarks 8 RNA-seq analysis pipelines:

ID	Pipeline	Aligner	Quantifier	DE Tool	Speed	ML Rank
1	STAR-RSEM-DESeq2	STAR	RSEM	DESeq2	⭐⭐	#2
2	HISAT2-StringTie-Ballgown	HISAT2	StringTie	Ballgown	⭐⭐⭐	#5
3	Salmon-edgeR ⭐	Salmon	Salmon	edgeR	⭐⭐⭐⭐⭐	#1
4	Kallisto-Sleuth	Kallisto	Kallisto	Sleuth	⭐⭐⭐⭐⭐	#3
5	STAR-HTSeq-limma	STAR	HTSeq	limma-voom	⭐⭐	#4
6	STAR-featureCounts-NOISeq	STAR	featureCounts	NOISeq	⭐⭐	#6
7	Bowtie2-RSEM-EBSeq	Bowtie2	RSEM	EBSeq	⭐⭐	#7
8	HISAT2-Cufflinks-Cuffdiff	HISAT2	Cufflinks	Cuffdiff	⭐	#8

⭐ Pipeline 3 (Salmon-edgeR) is the ML model's most frequently recommended pipeline due to its optimal speed/accuracy balance.

Repository Structure

RAPTOR/
├── raptor/                 # Core Python package
│   ├── profiler.py         # Data profiling
│   ├── recommender.py      # Rule-based recommendations
│   ├── ml_recommender.py   # ML recommendations
│   ├── threshold_optimizer/ # 🆕 Adaptive Threshold Optimizer (v2.1.1)
│   │   ├── __init__.py
│   │   ├── ato.py          # Core ATO class
│   │   └── visualization.py # ATO visualizations
│   ├── data_quality_assessment.py
│   ├── ensemble_analysis.py
│   ├── resource_monitoring.py
│   └── ...
├── dashboard/              # Interactive web dashboard
├── pipelines/              # Pipeline configurations (8 pipelines)
├── scripts/                # Workflow scripts (00-10)
├── examples/               # Example scripts & demos
├── tests/                  # Test suite
├── docs/                   # Documentation
├── config/                 # Configuration templates
├── install.py              # Master installer
├── launch_dashboard.py     # Dashboard launcher
├── requirements.txt        # Python dependencies
└── setup.py                # Package setup

Documentation

Getting Started

Document	Description
INSTALLATION.md	Detailed installation guide
QUICK_START.md	5-minute quick start
DASHBOARD.md	Interactive dashboard guide

Core Features

Document	Description
THRESHOLD_OPTIMIZER.md	🆕 Adaptive threshold optimization
PROFILE_RECOMMEND.md	Data profiling & recommendations
QUALITY_ASSESSMENT.md	Quality scoring & batch effects
BENCHMARKING.md	Pipeline benchmarking

Advanced Features

Document	Description
ENSEMBLE.md	Multi-pipeline ensemble analysis
RESOURCE_MONITORING.md	Resource tracking
CLOUD_DEPLOYMENT.md	AWS/GCP/Azure deployment

Reference

Document	Description
PIPELINES.md	Pipeline details & selection guide
API.md	Python API reference
FAQ.md	Frequently asked questions
CHANGELOG.md	Version history

Usage Examples

Example 1: Quick Threshold Optimization (NEW!)

from raptor.threshold_optimizer import optimize_thresholds
import pandas as pd

# Load DE results
df = pd.read_csv('deseq2_results.csv')

# Optimize thresholds
result = optimize_thresholds(df, goal='balanced')

print(f"Optimal |logFC|: {result.logfc_threshold:.3f}")
print(f"Significant genes: {result.n_significant}")
print(f"π₀ estimate: {result.pi0:.3f}")

# Get publication methods text
print(result.methods_text)

# Save results
result.results_df.to_csv('optimized_results.csv')
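The π₀ value printed above is the estimated proportion of genes with no true differential expression. As a rough illustration of how such an estimate can be obtained, the sketch below uses Storey's estimator with a fixed tuning parameter λ. Whether RAPTOR uses this exact variant is an assumption; `estimate_pi0` is a hypothetical name.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey's fixed-lambda estimate of the true-null proportion."""
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    # Under the null, p-values are uniform, so the density above lam
    # reflects (mostly) true nulls.
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Toy p-values for illustration only
pvals = [0.001, 0.002, 0.01, 0.03, 0.2, 0.4, 0.55, 0.7, 0.85, 0.95]
pi0 = estimate_pi0(pvals)
print(f"pi0 estimate: {pi0:.2f}")
```

A π₀ close to 1 suggests few genes are truly changing; smaller values indicate a larger share of real signal.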

Example 2: Full Workflow

from raptor import RNAseqDataProfiler, MLPipelineRecommender
from raptor.threshold_optimizer import optimize_thresholds
import pandas as pd

# 1. Profile data
counts = pd.read_csv('counts.csv', index_col=0)
metadata = pd.read_csv('metadata.csv')

profiler = RNAseqDataProfiler(counts, metadata, use_ml=True)
profile = profiler.profile(quality_check=True)
print(f"Quality Score: {profile['quality_score']}/100")

# 2. Get ML recommendation
recommender = MLPipelineRecommender()
recommendations = recommender.recommend(profile, n=3)
print(f"Recommended: {recommendations[0]['pipeline_name']}")

# 3. [Run recommended pipeline - produces DE results]
# raptor run --pipeline 3 ...

# 4. Optimize thresholds (NEW in v2.1.1)
de_results = pd.read_csv('deseq2_results.csv')
result = optimize_thresholds(
    de_results,
    logfc_col='log2FoldChange',
    pvalue_col='pvalue',
    goal='balanced'
)

print(f"\n🎯 Optimized Thresholds:")
print(f"   LogFC: |{result.logfc_threshold:.3f}|")
print(f"   Significant: {result.n_significant} genes")

# 5. Save results with methods text
result.results_df.to_csv('final_results.csv')
with open('methods.txt', 'w') as f:
    f.write(result.methods_text)

Example 3: Ensemble Analysis with ATO

import pandas as pd

from raptor.ensemble_analysis import EnsembleAnalyzer
from raptor.threshold_optimizer import optimize_thresholds

# Load per-pipeline DE results (file names are placeholders)
df1 = pd.read_csv('deseq2_results.csv')
df2 = pd.read_csv('edger_results.csv')
df3 = pd.read_csv('limma_results.csv')

# Combine results from multiple pipelines
analyzer = EnsembleAnalyzer()
consensus = analyzer.combine_results(
    results_dict={'deseq2': df1, 'edger': df2, 'limma': df3},
    method='weighted_vote',
    min_agreement=2
)

# Use ATO for uniform thresholds across ensemble
result = optimize_thresholds(consensus['combined'], goal='balanced')
print(f"Consensus DE genes: {result.n_significant}")
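To show the idea behind `min_agreement`, here is a plain-Python sketch of an unweighted vote across pipelines. It is a simplification: the `weighted_vote` method used above presumably scales each pipeline's vote, and `consensus_genes` is a hypothetical helper, not part of RAPTOR.

```python
from collections import Counter


def consensus_genes(calls_by_pipeline, min_agreement=2):
    """Genes called significant by at least `min_agreement` pipelines."""
    votes = Counter()
    for genes in calls_by_pipeline.values():
        votes.update(set(genes))  # one vote per pipeline per gene
    return sorted(g for g, n in votes.items() if n >= min_agreement)


# Toy gene calls for illustration only
calls = {
    "deseq2": {"GENE_A", "GENE_B", "GENE_C"},
    "edger": {"GENE_B", "GENE_C", "GENE_D"},
    "limma": {"GENE_C", "GENE_E"},
}
print(consensus_genes(calls, min_agreement=2))
```

Raising `min_agreement` trades sensitivity for confidence: fewer genes pass, but each is supported by more pipelines.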

Performance

ML Recommendation Accuracy

Metric	Value
Overall Accuracy	87%
Top-3 Accuracy	96%
Prediction Time	<0.1s
Training Data	10,000+ analyses
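The difference between overall and top-3 accuracy is easiest to see in code. The sketch below computes both from ranked recommendations; the data are toy values chosen for illustration and do not reproduce the figures in the table, and `top_k_accuracy` is a hypothetical helper.

```python
def top_k_accuracy(ranked_predictions, best_pipelines, k=1):
    """Fraction of datasets whose best pipeline is among the top k picks."""
    if not best_pipelines:
        raise ValueError("need at least one dataset")
    hits = sum(
        1
        for ranked, best in zip(ranked_predictions, best_pipelines)
        if best in ranked[:k]
    )
    return hits / len(best_pipelines)


# Each row: pipeline IDs ranked by the recommender for one dataset (toy data)
ranked = [[3, 1, 4], [1, 3, 5], [4, 3, 1], [3, 4, 1]]
# The pipeline that actually performed best on each dataset (toy data)
best = [3, 3, 2, 1]

print(f"top-1: {top_k_accuracy(ranked, best, k=1):.2f}")
print(f"top-3: {top_k_accuracy(ranked, best, k=3):.2f}")
```

Top-3 accuracy is always at least as high as overall (top-1) accuracy, because it also counts cases where the best pipeline was ranked second or third.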

Threshold Optimizer Benefits

Metric	Traditional	With ATO
Threshold justification	Arbitrary	Data-driven
Methods text	Manual	Auto-generated
False positives	Higher	Optimized
Reproducibility	Variable	Standardized

Contributing

We welcome contributions! RAPTOR is open-source and aims to make free science accessible to everyone.

# Fork and clone
git clone https://github.com/YOUR_USERNAME/RAPTOR.git
cd RAPTOR

# Create feature branch
git checkout -b feature/amazing-feature

# Make changes and test
pytest tests/

# Submit pull request

See CONTRIBUTING.md for guidelines.

Citation

If you use RAPTOR in your research, please cite:

@software{bolouki2025raptor,
  author       = {Bolouki, Ayeh},
  title        = {RAPTOR: RNA-seq Analysis Pipeline Testing and Optimization Resource},
  year         = {2025},
  version      = {2.1.1},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17607161},
  url          = {https://github.com/AyehBlk/RAPTOR}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License
Copyright (c) 2025 Ayeh Bolouki

Contact

Ayeh Bolouki

🏛️ GIGA, University of Liège, Belgium
📧 Email: ayehbolouki1988@gmail.com
🐙 GitHub: @AyehBlk
🔬 Research: Computational Biology, Bioinformatics, Multi-omics Analysis

Acknowledgments

The Bioconductor community for the R package ecosystem
All users who provided feedback

⭐ Star this repository if you find RAPTOR useful!

RAPTOR v2.1.1 - Making pipeline selection evidence-based, not guesswork 🦖

Release history

Version	Release date
2.2.1	Mar 18, 2026
2.2.0	Mar 11, 2026
2.1.2 (this version)	Dec 30, 2025
2.1.1	Dec 17, 2025
2.1.0	Dec 12, 2025

Download files

Both files were uploaded on Dec 30, 2025 via twine/6.2.0 CPython/3.14.2, without Trusted Publishing.

Source Distribution

raptor_rnaseq-2.1.2.tar.gz (157.0 kB, Source)

Algorithm	Hash digest
SHA256	`ed5a9426c1d6a69ebf72707cdb76ac059c6c63a846c445d80456bc7098135a37`
MD5	`35ad4c1db161b1f5e4a2541b86f0f957`
BLAKE2b-256	`ef6fcb12d97b02a051ab2a0606b6f48e79a0254b94f32bc17cec7ce4c662e3ff`

Built Distribution

raptor_rnaseq-2.1.2-py3-none-any.whl (164.3 kB, Python 3)

Algorithm	Hash digest
SHA256	`592270f4f4ab3a119a086081df35fcbc7fa006441656d18e99e52e19d1b07b46`
MD5	`fc065e435db47c1b8051be62051f338b`
BLAKE2b-256	`703ea22729624004b783ca23c50f0ff9d158d9d38ca67c8c8bb245dbc07755dd`
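The SHA256 digests listed above can be checked against a downloaded file with the standard library. This is a minimal sketch: `sha256_of` is a hypothetical helper, and the check at the bottom assumes the source distribution was saved to the current directory under its original name.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for the source distribution
EXPECTED_SHA256 = "ed5a9426c1d6a69ebf72707cdb76ac059c6c63a846c445d80456bc7098135a37"
SDIST = "raptor_rnaseq-2.1.2.tar.gz"  # assumed local download location

if os.path.exists(SDIST):
    ok = sha256_of(SDIST) == EXPECTED_SHA256
    print("digest matches" if ok else "DIGEST MISMATCH")
else:
    print(f"{SDIST} not found; download it first")
```

A mismatch means the file is incomplete or differs from the one that was published, and it should not be installed.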
