Open-source platform for perturbation biology, causal discovery and optimal intervention design
OpenPerturbation
Advanced Perturbation Biology Analysis Platform with AI Integration
OpenPerturbation is a comprehensive, production-ready platform for analyzing perturbation biology data using cutting-edge machine learning, causal discovery, and explainable AI techniques. Built for researchers, data scientists, and bioinformaticians working with single-cell RNA-seq, imaging, and molecular data.
Author
Nik Jois - nikjois@llamasearch.ai
Key Features
Core Capabilities
- Multi-modal Data Integration: Seamlessly handle genomics, imaging, and molecular data
- Advanced Causal Discovery: Identify causal relationships in biological systems
- Explainable AI: Interpret model predictions with attention maps and pathway analysis
- Intervention Design: Optimize experimental strategies using causal understanding
- OpenAI Agents Integration: Natural language interface for complex analyses
Technical Excellence
- Production-Ready: Complete Docker containerization and CI/CD pipeline
- Comprehensive Testing: 95%+ test coverage with automated quality assurance
- Professional API: FastAPI endpoints with complete documentation
- Scalable Architecture: PyTorch Lightning for distributed training
- Type Safety: Full type annotations with Pyright validation
Quick Start
Installation
```bash
# Install from PyPI
pip install openperturbation

# Or install from source
git clone https://github.com/llamasearchai/OpenPerturbation.git
cd OpenPerturbation
pip install -e .
```
Docker Deployment
```bash
# Pull and run the container
docker pull ghcr.io/llamasearchai/openperturbation:latest
docker run -p 8000:8000 ghcr.io/llamasearchai/openperturbation:latest

# Or build locally
docker build -t openperturbation .
docker run -p 8000:8000 openperturbation
```
Basic Usage
```python
from openperturbation import OpenPerturbationPipeline
from omegaconf import DictConfig

# Configure your analysis
config = DictConfig({
    "data": {
        "data_dir": "path/to/your/data",
        "batch_size": 32,
    },
    "model": {
        "model_type": "multimodal_fusion",
        "learning_rate": 1e-4,
    },
    "experiment": {
        "name": "my_perturbation_analysis",
        "output_dir": "results/",
    },
})

# Run the complete analysis pipeline
pipeline = OpenPerturbationPipeline(config)
results = pipeline.run_full_pipeline()

# Access results
print(f"Training completed with validation loss: {results['training']['best_val_loss']}")
print(f"Discovered {results['causal_discovery']['n_edges']} causal relationships")
```
OpenAI Agents Interface
```python
import asyncio
import os

from openperturbation.agents import create_openperturbation_agent

# Create an AI agent for interactive analysis; the key is read from the
# environment instead of being hard-coded in the script.
agent = create_openperturbation_agent("general", api_key=os.environ["OPENAI_API_KEY"])


# Natural-language queries are processed asynchronously
async def analyze_data() -> None:
    response = await agent.process_message(
        "Run causal discovery analysis on my single-cell dataset and explain the key findings"
    )
    print(response)


asyncio.run(analyze_data())
```
Architecture Overview
Pipeline Components
- Data Loading & Processing
  - Multi-format support (H5AD, CSV, HDF5)
  - Automated quality control and normalization
  - Synthetic data generation for testing
- Model Training
  - Vision Transformers for imaging data
  - Graph Neural Networks for molecular structures
  - Multimodal fusion architectures
- Causal Discovery
  - PC Algorithm implementation
  - Constraint-based methods
  - Bootstrap validation
- Explainability Analysis
  - Attention visualization
  - Concept activation vectors
  - Pathway enrichment analysis
- Intervention Design
  - Optimal experimental design
  - Active learning strategies
  - Budget-constrained optimization
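The data-processing step lists automated quality control and normalization without specifying the recipe. As a minimal sketch, assuming the common scanpy-style approach (drop cells with too few detected genes, rescale every cell to the same total count, then apply log1p), the step might look like this; `qc_and_normalize`, `min_genes`, and `target_sum` are hypothetical names, not part of OpenPerturbation's API.

```python
import numpy as np


def qc_and_normalize(counts, min_genes: int = 1, target_sum: float = 1e4):
    """Hypothetical QC + normalization helper (not OpenPerturbation's API).

    counts: cells x genes array of raw, non-negative counts.
    Returns (log-normalized matrix of kept cells, boolean mask of kept cells).
    """
    if min_genes < 1:
        # Cells with zero total counts would otherwise cause a division by zero.
        raise ValueError("min_genes must be >= 1")
    counts = np.asarray(counts, dtype=float)
    # QC: keep cells that express at least `min_genes` distinct genes.
    keep = (counts > 0).sum(axis=1) >= min_genes
    filtered = counts[keep]
    # Library-size normalization: rescale each cell to `target_sum` total counts.
    totals = filtered.sum(axis=1, keepdims=True)
    scaled = filtered / totals * target_sum
    # log(1 + x) transform to compress the dynamic range of expression values.
    return np.log1p(scaled), keep


# Toy example: the second cell has no detected genes and is filtered out.
raw = np.array([[1, 3, 0], [0, 0, 0], [2, 2, 4]])
normalized, kept = qc_and_normalize(raw)
```

On real data stored as AnnData, scanpy's `sc.pp.filter_cells`, `sc.pp.normalize_total`, and `sc.pp.log1p` perform the same three steps.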
API Endpoints
The FastAPI server provides comprehensive REST endpoints:
- GET /health: System health check
- POST /api/v1/data/upload: Upload datasets
- POST /api/v1/experiments/create: Create new experiments
- GET /api/v1/experiments/{id}/results: Retrieve results
- POST /api/v1/analysis/causal-discovery: Run causal analysis
- POST /api/v1/analysis/explainability: Generate explanations
- POST /api/v1/agents/chat: OpenAI agent interface
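A minimal client for these endpoints can be built from the Python standard library alone. This sketch assumes the server is reachable at `http://localhost:8000` (the port mapped in the Docker commands above) and that endpoints accept and return JSON; `build_request` and `send` are hypothetical helpers, and the request-body schemas are not documented here.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: server started locally with `docker run -p 8000:8000 ...`.
BASE_URL = "http://localhost:8000"


def build_request(
    path: str, payload: Optional[dict] = None, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given."""
    if payload is None:
        return urllib.request.Request(base_url + path, method="GET")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `send(build_request("/health"))` performs the health check. FastAPI applications serve an interactive OpenAPI page at `/docs` by default (unless disabled), which is the place to confirm the real request bodies before POSTing to endpoints such as `/api/v1/analysis/causal-discovery`.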
Data Types Supported
Genomics Data
- Single-cell RNA-seq (H5AD, CSV formats)
- Bulk RNA-seq data
- Perturbation screens
- Time-series experiments
Imaging Data
- High-content screening images
- Microscopy data
- Multi-channel fluorescence
- Morphological features
Molecular Data
- Chemical structures (SMILES)
- Protein sequences
- Drug-target interactions
- Pathway annotations
Advanced Features
Causal Discovery Methods
- PC Algorithm: Constraint-based causal discovery
- GES: Score-based structure learning
- FCI: Handling latent confounders
- Bootstrap Validation: Statistical significance testing
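To make the constraint-based idea concrete, the following sketches only the skeleton phase of the PC algorithm: start from a complete undirected graph and delete an edge as soon as some conditioning set drawn from the current neighbors makes its endpoints conditionally independent. It assumes roughly Gaussian data and uses Fisher's z-test on partial correlations; edge orientation, GES, FCI, and bootstrap validation are omitted, and `pc_skeleton` is not OpenPerturbation's implementation.

```python
import itertools
import math

import numpy as np


def fisher_z_independent(data, i, j, cond, alpha):
    """Return True if X_i and X_j look independent given X_cond (Fisher z-test)."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)
    # Partial correlation of X_i and X_j given X_cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(data.shape[0] - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def pc_skeleton(data, alpha=0.01):
    """Skeleton phase of PC: returns (undirected edges, separating sets)."""
    n_vars = data.shape[1]
    adj = {v: set(range(n_vars)) - {v} for v in range(n_vars)}
    sepsets = {}
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[v]) - 1 >= level for v in adj):
        for i in range(n_vars):
            for j in sorted(adj[i]):  # snapshot, since adj[i] may shrink
                candidates = sorted(adj[i] - {j})
                for cond in itertools.combinations(candidates, level):
                    if fisher_z_independent(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    edges = {(i, j) for i in adj for j in adj[i] if i < j}
    return edges, sepsets


# Toy chain X0 -> X1 -> X2: X0 and X2 are dependent, but independent given X1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + 0.5 * rng.normal(size=2000)
x2 = x1 + 0.5 * rng.normal(size=2000)
edges, sepsets = pc_skeleton(np.column_stack([x0, x1, x2]), alpha=0.001)
```

The recovered separating sets are what PC's orientation phase would use next to detect v-structures.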
Explainability Techniques
- Attention Maps: Visualize model focus areas
- TCAV: Testing with Concept Activation Vectors
- SHAP Values: Feature importance analysis
- Pathway Analysis: Biological interpretation
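SHAP builds on Shapley values from cooperative game theory. For a handful of features they can be computed exactly by averaging each feature's marginal contribution over every feature ordering. The sketch below assumes an "absent" feature is replaced by a fixed baseline value, which is only one of several conventions; it costs n! model evaluations, so practical tools such as the `shap` package rely on approximations, and `exact_shapley` is illustrative only.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    n = len(x)
    totals = [0.0] * n
    for order in itertools.permutations(range(n)):
        current = list(baseline)  # start with every feature "absent"
        prev = f(current)
        for i in order:
            current[i] = x[i]  # add feature i to the coalition
            value = f(current)
            totals[i] += value - prev  # marginal contribution of feature i
            prev = value
    n_orders = math.factorial(n)
    return [t / n_orders for t in totals]


# Toy model: a linear term plus an interaction shared by features 1 and 2.
def model(z: Sequence[float]) -> float:
    return 2 * z[0] + z[1] * z[2]


phi = exact_shapley(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# phi == [2.0, 3.0, 3.0]
```

The values sum to f(x) - f(baseline), and the interaction term (2 x 3 = 6) is split evenly between features 1 and 2.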
Intervention Strategies
- Optimal Design: Maximize information gain
- Budget Constraints: Resource-aware planning
- Active Learning: Iterative experiment selection
- Multi-objective Optimization: Balance multiple goals
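Budget-constrained design can be framed as a 0/1 knapsack problem when each candidate experiment has an estimated information gain and an integer cost. The sketch below solves that framing exactly with dynamic programming; the gains and costs are made-up inputs, and `select_experiments` is a hypothetical helper rather than OpenPerturbation's optimizer, which may use a different objective.

```python
from typing import List, Sequence, Tuple


def select_experiments(
    gains: Sequence[float], costs: Sequence[int], budget: int
) -> Tuple[float, List[int]]:
    """0/1 knapsack: pick experiments maximizing total gain with total cost <= budget."""
    n = len(gains)
    # best[k][b]: best total gain using the first k experiments with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        g, c = gains[k - 1], costs[k - 1]
        for b in range(budget + 1):
            best[k][b] = best[k - 1][b]  # option 1: skip experiment k-1
            if c <= b and best[k - 1][b - c] + g > best[k][b]:
                best[k][b] = best[k - 1][b - c] + g  # option 2: run it
    # Backtrack: an entry differs from the row above only if that experiment was taken.
    chosen: List[int] = []
    b = budget
    for k in range(n, 0, -1):
        if best[k][b] != best[k - 1][b]:
            chosen.append(k - 1)
            b -= costs[k - 1]
    return best[n][budget], sorted(chosen)


total_gain, chosen = select_experiments(
    gains=[3.0, 4.0, 5.0, 6.0], costs=[2, 3, 4, 5], budget=5
)
```

Here the two cheapest experiments together (gain 7.0) beat the single most informative one (gain 6.0). An active-learning loop would repeat this selection after each round, re-estimating the gains from the updated model.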
Development
Local Development Setup
```bash
# Clone repository
git clone https://github.com/llamasearchai/OpenPerturbation.git
cd OpenPerturbation

# Create virtual environment
python -m venv openperturbation-env
source openperturbation-env/bin/activate  # On Windows: openperturbation-env\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest tests/ -v

# Start development server
python src/api/main.py
```
Testing
```bash
# Run all tests
make test

# Run specific test suites
pytest tests/test_api.py -v
pytest tests/test_comprehensive.py -v
pytest tests/test_openai_agents.py -v

# Run with coverage
pytest --cov=src tests/
```
Code Quality
```bash
# Type checking
pyright src/

# Linting
ruff check src/

# Formatting
black src/

# Security scanning
bandit -r src/
```
Documentation
Comprehensive documentation is available:
- API Reference: Complete endpoint documentation
- User Guide: Step-by-step tutorials
- Developer Guide: Architecture and contribution guidelines
- Cookbooks: Example analyses and use cases
Access documentation at: https://openperturbation.readthedocs.io
Performance Benchmarks
OpenPerturbation has been benchmarked on standard datasets:
- Single-cell Analysis: 100K+ cells processed in <5 minutes
- Causal Discovery: 1000-variable networks in <30 seconds
- Model Training: GPU acceleration with mixed precision
- API Response Time: <100ms for most endpoints
Production Deployment
Kubernetes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openperturbation
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openperturbation
  template:
    metadata:
      labels:
        app: openperturbation
    spec:
      containers:
        - name: openperturbation
          image: ghcr.io/llamasearchai/openperturbation:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-secret
                  key: api-key
```
Environment Variables
```bash
# Required
OPENAI_API_KEY=your-openai-api-key

# Optional
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
WANDB_API_KEY=your-wandb-key
NEPTUNE_API_TOKEN=your-neptune-token
```
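In Python code, these variables might be read once at startup and validated. The sketch below is an assumption about how a deployment could fail fast when the required key is missing; `Settings` and `load_settings` are hypothetical and not part of the package. `PYTORCH_CUDA_ALLOC_CONF` is read by PyTorch's CUDA allocator itself, so it is only recorded here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables listed above."""

    openai_api_key: str
    pytorch_cuda_alloc_conf: Optional[str] = None
    wandb_api_key: Optional[str] = None
    neptune_api_token: Optional[str] = None


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ); fail fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return Settings(
        openai_api_key=api_key,
        # Recorded for reference only; PyTorch reads this variable directly.
        pytorch_cuda_alloc_conf=env.get("PYTORCH_CUDA_ALLOC_CONF"),
        wandb_api_key=env.get("WANDB_API_KEY"),
        neptune_api_token=env.get("NEPTUNE_API_TOKEN"),
    )
```

Passing an explicit mapping, as in `load_settings({"OPENAI_API_KEY": "test-key"})`, keeps the function easy to test without touching the real environment.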
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Workflow
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes with tests
- Run quality checks: make check
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
Code Standards
- Type Safety: All code must include type annotations
- Testing: Minimum 90% test coverage required
- Documentation: All public APIs must be documented
- Performance: No regressions in benchmark tests
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use OpenPerturbation in your research, please cite:
```bibtex
@software{openperturbation2024,
  title={OpenPerturbation: Advanced Perturbation Biology Analysis Platform},
  author={Jois, Nik},
  year={2024},
  url={https://github.com/llamasearchai/OpenPerturbation},
  version={1.1.1}
}
```
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: nikjois@llamasearch.ai
Acknowledgments
OpenPerturbation builds upon excellent open-source projects:
- PyTorch Lightning: Scalable deep learning framework
- Scanpy: Single-cell analysis toolkit
- NetworkX: Graph analysis library
- FastAPI: Modern web framework
- OpenAI: Advanced language models
Roadmap
Version 1.2 (Q2 2024)
- Multi-GPU distributed training
- Advanced visualization dashboard
- Real-time experiment monitoring
- Enhanced pathway databases
Version 1.3 (Q3 2024)
- Federated learning capabilities
- Cloud deployment templates
- Advanced statistical methods
- Mobile-responsive interface
Version 2.0 (Q4 2024)
- Foundation model integration
- Automated report generation
- Advanced optimization algorithms
- Enterprise security features
Built with precision for the scientific community by Nik Jois
File details
Details for the file openperturbation-1.1.2.tar.gz.
File metadata
- Download URL: openperturbation-1.1.2.tar.gz
- Size: 10.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15552e8d666993fc4e0b08c5e6dfc38757ea555fa9f63fa82dbe183c022b64a9 |
| MD5 | b09078365544c48ea18a50405636531a |
| BLAKE2b-256 | a4472dc8e03a92ac70f1d001e43629fb1a4d3717e69c71e23c7f364f591689e3 |
File details
Details for the file openperturbation-1.1.2-py3-none-any.whl.
File metadata
- Download URL: openperturbation-1.1.2-py3-none-any.whl
- Size: 225.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b4f020253d7d121d785da995e06009d0037b1df17548296d005f53b85df91c0 |
| MD5 | b3e5023b8dc7adfa825aa037291acddf |
| BLAKE2b-256 | ce352915414943362191fb036f9704d9a23296b12b907ceb24d1634b65500a45 |