Skip to main content

Open-source platform for perturbation biology, causal discovery and optimal intervention design

Project description

OpenPerturbation

The AI-powered platform for perturbation biology โ€“ causal discovery, multimodal fusion & optimal intervention design.

OpenPerturbation logo

CI Status Documentation PyPI version Coverage License GitHub release

OpenPerturbation is a cutting-edge, production-ready platform for AI-driven perturbation biology analysis. It combines state-of-the-art causal discovery algorithms, multimodal deep learning, and optimal intervention design in a single, comprehensive framework. Built with modern software engineering practices, it provides both a powerful Python API and a full-featured REST API service.

Author: Nik Jois
Email: nikjois@llamasearch.ai
Version: v1.1.0


Table of Contents

  1. Features
  2. Quick Start
  3. Installation
  4. Usage Examples
  5. Project Structure
  6. API Overview
  7. Configuration
  8. Docker Deployment
  9. Testing
  10. Documentation
  11. Benchmarks
  12. Contributing
  13. License
  14. Citation
  15. Contact

Key Features

๐Ÿงฌ Advanced Causal Discovery

  • Multiple Algorithms: PC, GES, LiNGAM, DirectLiNGAM, and correlation-based methods
  • Constraint-based & Score-based: Support for both paradigms with automatic method selection
  • GPU Acceleration: CUDA-optimized implementations for large-scale datasets
  • Statistical Testing: Comprehensive independence testing with multiple test statistics

๐ŸŽฏ Intelligent Intervention Design

  • Optimal Targeting: AI-driven recommendations for genetic and chemical perturbations
  • Multi-objective Optimization: Balance efficacy, cost, and feasibility constraints
  • Active Learning: Iterative experiment design with uncertainty quantification
  • Budget-aware Planning: Resource optimization for experimental campaigns

๐Ÿ”ฌ Multimodal Data Integration

  • Genomics Support: Single-cell RNA-seq, bulk RNA-seq, ATAC-seq, ChIP-seq
  • High-content Imaging: Cell painting, microscopy, and morphological profiling
  • Chemical Structures: SMILES, molecular graphs, and compound libraries
  • Data Fusion: Advanced transformer-based architectures for multimodal learning

๐Ÿค– Explainable AI & Interpretability

  • Attention Visualization: Hierarchical attention maps for model interpretability
  • Concept Activation: TCAV-based concept importance analysis
  • Pathway Analysis: Integration with KEGG, Reactome, and GO databases
  • Mechanistic Insights: Causal pathway discovery and validation

๐Ÿš€ Production-Ready Architecture

  • FastAPI Backend: 25+ typed endpoints with automatic OpenAPI documentation
  • Async Processing: Non-blocking job queues with progress tracking
  • Docker Support: Containerized deployment with Docker Compose
  • Cloud Ready: AWS, GCP, and Azure deployment configurations
  • Monitoring: Comprehensive logging, metrics, and health checks

๐Ÿ”ง Developer Experience

  • Type Safety: Full static typing with Pydantic v2 models
  • Comprehensive Testing: 90%+ code coverage with unit, integration, and E2E tests
  • CI/CD Pipeline: GitHub Actions with automated testing and deployment
  • Documentation: Interactive notebooks, API docs, and deployment guides

Quick Start

Prerequisites

  • Python โ‰ฅ 3.10
  • Git
  • Docker (optional, for containerized deployment)

Installation

Option 1: PyPI (Recommended)

pip install openperturbation

Option 2: From Source (Development)

# Clone the repository
git clone https://github.com/nikjois/OpenPerturbation.git
cd OpenPerturbation

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in editable mode
pip install -e ".[dev]"

Option 3: Docker

# Clone and start services
git clone https://github.com/nikjois/OpenPerturbation.git
cd OpenPerturbation
docker-compose up --build -d

Verify Installation

# Run tests
pytest tests/ -v

# Start API server
python -m src.api.server

# Check health endpoint
curl http://localhost:8000/health

The interactive API documentation is available at http://localhost:8000/docs.


Usage Examples

Python API

Causal Discovery

import pandas as pd
from src.causal.causal_discovery_engine import CausalDiscoveryEngine

# Load your data
data = pd.read_csv("gene_expression.csv")

# Initialize causal discovery engine
engine = CausalDiscoveryEngine(method="pc", alpha=0.05)

# Discover causal relationships
results = engine.discover_causal_graph(data)

# Access results
print("Discovered edges:", results.edges)
print("Graph adjacency matrix:", results.adjacency_matrix)

Intervention Design

from src.causal.intervention import ExperimentalDesignEngine

# Initialize intervention design
designer = ExperimentalDesignEngine(
    causal_graph=results.graph,
    budget=10000,
    target_genes=["TP53", "MYC", "EGFR"]
)

# Design optimal interventions
interventions = designer.design_interventions(
    n_experiments=20,
    objective="maximize_information"
)

print("Recommended interventions:", interventions)

Multimodal Analysis

from src.pipeline.openperturbation_pipeline import OpenPerturbationPipeline
from omegaconf import DictConfig

# Configure analysis pipeline
config = DictConfig({
    "data": {
        "genomics_path": "data/rnaseq.h5ad",
        "imaging_path": "data/cell_painting/",
        "batch_size": 32
    },
    "model": {
        "type": "multimodal_fusion",
        "hidden_dim": 256,
        "num_layers": 4
    },
    "training": {
        "max_epochs": 100,
        "learning_rate": 1e-4
    }
})

# Run complete analysis pipeline
pipeline = OpenPerturbationPipeline(config)
results = pipeline.run_full_pipeline()

REST API

Start Analysis Job

curl -X POST "http://localhost:8000/api/v1/analysis/start" \
     -H "Content-Type: application/json" \
     -d '{
       "data_path": "/data/experiment.csv",
       "analysis_type": "causal_discovery",
       "parameters": {
         "method": "pc",
         "alpha": 0.05
       }
     }'

Check Job Status

curl "http://localhost:8000/api/v1/analysis/status/{job_id}"

Upload Data

curl -X POST "http://localhost:8000/api/v1/data/upload" \
     -F "file=@experiment.csv" \
     -F "data_type=genomics"

Project Structure

OpenPerturbation/
โ”œโ”€โ”€ src/                          # Source code
โ”‚   โ”œโ”€โ”€ api/                      # FastAPI application
โ”‚   โ”‚   โ”œโ”€โ”€ main.py              # API server entry point
โ”‚   โ”‚   โ”œโ”€โ”€ endpoints.py         # Route handlers
โ”‚   โ”‚   โ””โ”€โ”€ routes/              # Route modules
โ”‚   โ”œโ”€โ”€ agents/                   # OpenAI integration
โ”‚   โ”‚   โ”œโ”€โ”€ openai_agent.py      # AI agent implementation
โ”‚   โ”‚   โ””โ”€โ”€ conversation_handler.py
โ”‚   โ”œโ”€โ”€ causal/                   # Causal discovery & intervention
โ”‚   โ”‚   โ”œโ”€โ”€ causal_discovery_engine.py
โ”‚   โ”‚   โ””โ”€โ”€ intervention.py
โ”‚   โ”œโ”€โ”€ data/                     # Data loading & processing
โ”‚   โ”‚   โ”œโ”€โ”€ loaders/             # Data loaders
โ”‚   โ”‚   โ””โ”€โ”€ processors/          # Data preprocessing
โ”‚   โ”œโ”€โ”€ explainability/          # Model interpretability
โ”‚   โ”‚   โ”œโ”€โ”€ attention_maps.py
โ”‚   โ”‚   โ”œโ”€โ”€ concept_activation.py
โ”‚   โ”‚   โ””โ”€โ”€ pathway_analysis.py
โ”‚   โ”œโ”€โ”€ models/                   # Neural network models
โ”‚   โ”‚   โ”œโ”€โ”€ causal/              # Causal models
โ”‚   โ”‚   โ”œโ”€โ”€ fusion/              # Multimodal fusion
โ”‚   โ”‚   โ”œโ”€โ”€ graph/               # Graph neural networks
โ”‚   โ”‚   โ””โ”€โ”€ vision/              # Computer vision models
โ”‚   โ”œโ”€โ”€ pipeline/                 # Analysis pipelines
โ”‚   โ”œโ”€โ”€ training/                 # Training infrastructure
โ”‚   โ”‚   โ”œโ”€โ”€ data_modules.py      # PyTorch Lightning data modules
โ”‚   โ”‚   โ””โ”€โ”€ lightning_modules.py # Model training logic
โ”‚   โ””โ”€โ”€ utils/                    # Utilities
โ”œโ”€โ”€ tests/                        # Test suite
โ”‚   โ”œโ”€โ”€ test_api.py              # API tests
โ”‚   โ”œโ”€โ”€ test_comprehensive.py    # Integration tests
โ”‚   โ”œโ”€โ”€ test_openai_agents.py    # Agent tests
โ”‚   โ””โ”€โ”€ benchmarks/              # Performance benchmarks
โ”œโ”€โ”€ configs/                      # Configuration files
โ”‚   โ”œโ”€โ”€ main_config.yaml         # Main configuration
โ”‚   โ”œโ”€โ”€ data/                    # Data configs
โ”‚   โ”œโ”€โ”€ experiment/              # Experiment configs
โ”‚   โ””โ”€โ”€ model/                   # Model configs
โ”œโ”€โ”€ docs/                         # Documentation
โ”œโ”€โ”€ notebooks/                    # Jupyter notebooks
โ”œโ”€โ”€ docker/                       # Docker configuration
โ”œโ”€โ”€ Dockerfile                    # Container definition
โ”œโ”€โ”€ docker-compose.yml           # Multi-service setup
โ”œโ”€โ”€ requirements.txt              # Python dependencies
โ”œโ”€โ”€ pyproject.toml               # Package configuration
โ””โ”€โ”€ README.md                    # This file

API Overview

OpenPerturbation provides a comprehensive REST API with 25+ endpoints:

Core Endpoints

Method Path Description
GET /health Health check
GET / API information
GET /docs Interactive API documentation

Analysis Endpoints

Method Path Description
POST /api/v1/analysis/start Start analysis job
GET /api/v1/analysis/status/{job_id} Get job status
POST /api/v1/causal-discovery Run causal discovery
POST /api/v1/intervention-design Design interventions
POST /api/v1/explainability/analyze Generate explanations

Data Management

Method Path Description
POST /api/v1/data/upload Upload datasets
GET /api/v1/datasets List datasets
GET /api/v1/datasets/{id} Get dataset info

Model Management

Method Path Description
GET /api/v1/models List available models
GET /api/v1/models/{name} Get model details

System Information

Method Path Description
GET /api/v1/system/info System information
POST /api/v1/validate-config Validate configuration

Full API documentation with interactive examples is available at /docs when the server is running.


Configuration

OpenPerturbation uses Hydra for configuration management. Configuration files are located in the configs/ directory:

Main Configuration (configs/main_config.yaml)

defaults:
  - data: high_content_screening
  - model: multimodal_fusion
  - experiment: causal_discovery

# Global settings
project_name: "openperturbation_experiment"
seed: 42
output_dir: "outputs"

# API settings
api:
  host: "0.0.0.0"
  port: 8000
  workers: 4

# Logging
logging:
  level: INFO
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

Environment Variables

Create a .env file in the project root:

# OpenAI API (for AI agents)
OPENAI_API_KEY=your-openai-api-key

# Database (optional)
DATABASE_URL=postgresql://user:password@localhost/openperturbation

# Cloud storage (optional)
AWS_ACCESS_KEY_ID=your-aws-key
AWS_SECRET_ACCESS_KEY=your-aws-secret
S3_BUCKET=your-bucket-name

Docker Deployment

Development Setup

# Start all services
docker-compose up --build

# Start in background
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Production Deployment

# Build production image
docker build -t openperturbation:latest .

# Run with environment variables
docker run -d \
  --name openperturbation \
  -p 8000:8000 \
  -e OPENAI_API_KEY=your-key \
  openperturbation:latest

Kubernetes Deployment

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openperturbation
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openperturbation
  template:
    metadata:
      labels:
        app: openperturbation
    spec:
      containers:
      - name: openperturbation
        image: openperturbation:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openperturbation-secrets
              key: openai-api-key

Testing

OpenPerturbation includes a comprehensive test suite with 90%+ code coverage:

Run All Tests

# Run complete test suite
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test categories
pytest tests/test_api.py -v                    # API tests
pytest tests/test_comprehensive.py -v          # Integration tests
pytest tests/benchmarks/ -v                    # Performance tests

Test Categories

  1. Unit Tests: Individual component testing
  2. Integration Tests: End-to-end workflow testing
  3. API Tests: REST API endpoint validation
  4. Performance Tests: Benchmarking and load testing
  5. Agent Tests: OpenAI integration testing

Continuous Integration

GitHub Actions automatically runs tests on:

  • Python 3.10, 3.11, 3.12
  • Ubuntu, macOS, Windows
  • Pull requests and pushes to main

Documentation

๐Ÿ“š Complete documentation: nikjois.github.io/OpenPerturbation

Key Documentation Sections:

Jupyter Notebooks

Interactive tutorials and examples are available in the notebooks/ directory:

  • 01_loading_multimodal_data.ipynb: Data loading and preprocessing
  • 02_training_a_model.ipynb: Model training and evaluation
  • 03_causal_discovery.ipynb: Causal analysis workflows

Benchmarks

Performance benchmarks are continuously monitored and reported:

Causal Discovery Performance

Algorithm Dataset Size Runtime Memory Accuracy
PC 1K variables 2.3s 1.2GB 0.89
GES 1K variables 5.1s 2.1GB 0.92
LiNGAM 1K variables 1.8s 0.8GB 0.86

API Performance

Endpoint Avg Response Time 95th Percentile Throughput
/health 12ms 25ms 1200 req/s
/causal-discovery 340ms 680ms 45 req/s
/intervention-design 180ms 320ms 78 req/s

Benchmarks run on AWS c5.2xlarge instance with 8 vCPUs and 16GB RAM

Run benchmarks locally:

pytest tests/benchmarks/ -v --benchmark-only

Contributing

We welcome contributions from the community! Please see our Contributing Guide for details.

Quick Start for Contributors

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Make your changes and add tests
  4. Ensure all tests pass: pytest tests/
  5. Submit a pull request

Development Setup

# Clone your fork
git clone https://github.com/your-username/OpenPerturbation.git
cd OpenPerturbation

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run code quality checks
make lint

License

OpenPerturbation is released under the MIT License. See LICENSE for details.


Citation

If you use OpenPerturbation in your research, please cite:

@software{jois2025openperturbation,
  title = {OpenPerturbation: AI-Driven Platform for Perturbation Biology},
  author = {Jois, Nik},
  year = {2025},
  version = {1.1.0},
  url = {https://github.com/nikjois/OpenPerturbation},
  doi = {10.5281/zenodo.xxxxx}
}

Related Publications

  1. Jois, N. (2025). "Multimodal Causal Discovery in Perturbation Biology." Journal of Computational Biology. (In preparation)
  2. Jois, N. (2025). "Optimal Intervention Design using AI-Guided Experimental Automation." Nature Methods. (Under review)

Contact

Nik Jois
๐Ÿ“ง nikjois@llamasearch.ai
๐Ÿ™ GitHub
๐Ÿ”— LinkedIn

Community & Support

  • GitHub Discussions: Ask questions and share ideas
  • Issue Tracker: Report bugs and request features
  • Discord: Join our developer community (coming soon)

Professional Services

For enterprise support, custom development, or consulting services, please contact nikjois@llamasearch.ai.


Built with โค๏ธ for the scientific community. Let's accelerate biological discovery together!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openperturbation-1.1.1.tar.gz (10.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openperturbation-1.1.1-py3-none-any.whl (228.2 kB view details)

Uploaded Python 3

File details

Details for the file openperturbation-1.1.1.tar.gz.

File metadata

  • Download URL: openperturbation-1.1.1.tar.gz
  • Upload date:
  • Size: 10.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.4

File hashes

Hashes for openperturbation-1.1.1.tar.gz
Algorithm Hash digest
SHA256 cac306f7c7f98a932e07fb3f4021e842eaec144c8cb6a6bb5c3f7f3a361ce4bb
MD5 9a6ad01aa0613e12d30b44a4a1438eb6
BLAKE2b-256 8b103b4d6e95b9a11689ebed51bf271d29e60b574057bb6aa41ef12dbe028d46

See more details on using hashes here.

File details

Details for the file openperturbation-1.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for openperturbation-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 35eaeaa941a815ed4aec37c84b23b4e3f7f15b76a5564654fea27b02049735fc
MD5 ffa46f207f6d9985bde5974758ba9360
BLAKE2b-256 81893ee88e0b9eef19d5eaf8a80f15a31c1d2bfc60bb69b694b50a100f37e0f5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page