Skip to main content

synthetic-models is a virtual cell generation library for generating biological synthetic data that can be used for benchmarking predictive modelling workflows. The synthetic data are generated using ODE models formulated in biochemical laws common in cancer cell signalling networks. Synthetic can create datasets in the format of scikit-learn's make_regression method.

Project description

Synthetic

Synthetic (aka ModelBuilder) is a virtual cell generation library for generating biological synthetic data that can be used for benchmarking predictive modelling workflows. The synthetic data are generated using ODE models formulated in biochemical laws common in cancer cell signalling networks. Synthetic can create datasets in the format of scikit-learn's make_regression method.

Installation

pip install synthetic-models

For development:

git clone https://github.com/AnEvilBurrito/Synthetic.git
cd Synthetic
uv sync

Alpha Testing

For lab members participating in alpha testing, install directly from the lab's GitHub repository:

pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git

To install a specific version or commit:

pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git@v0.1.0
# or
pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git@<commit-hash>

For lab members with write access, you can use SSH:

pip install git+ssh://git@github.com/IntegratedNetworkModellingLab/Synthetic.git

Quick Start

The high-level API provides a simple interface for creating virtual cell models and generating sklearn-compatible datasets:

from synthetic import Builder, make_dataset_drug_response

# Create a virtual cell model (auto-compiles with default settings)
vc = Builder.specify(degree_cascades=[1, 2, 5], random_seed=42)

# Generate a sklearn-compatible dataset
X, y = make_dataset_drug_response(n=1000, cell_model=vc, target_specie='Oa')

print(f"Feature matrix shape: {X.shape}")  # (1000, n_features)
print(f"Target vector shape: {y.shape}")    # (1000,)

Examples

The examples/ directory contains practical examples demonstrating various features of Synthetic:

create_dataset.py

Basic example showing how to generate a synthetic drug response dataset and analyze feature correlations.

python examples/create_dataset.py

Demonstrates:

  • Creating a virtual cell model with custom network topology
  • Generating sklearn-compatible datasets
  • Calculating Pearson correlations between features and targets
  • Identifying predictive features in the network

export_model_sbml.py

Shows how to export synthetic models to standard SBML and Antimony formats for interoperability with other tools.

python examples/export_model_sbml.py

Demonstrates:

  • Creating a virtual cell model
  • Accessing the underlying ModelBuilder
  • Exporting models to SBML format (for COPASI, libRoadRunner, etc.)
  • Exporting models to Antimony format (human-readable)
  • Retrieving model strings for programmatic use

detailed_examples.md

Comprehensive guide with detailed workflows and advanced use cases, including:

  • Feature correlation analysis - Complete workflow with CSV export
  • Combination therapy - Testing multi-drug interactions
  • Step-by-step explanations of each workflow

View the guide:

cat examples/detailed_examples.md

or open it in your preferred markdown viewer.

API Overview

VirtualCell

The VirtualCell class represents a virtual cell model with a hierarchical network topology.

from synthetic import VirtualCell

# Create a virtual cell without auto-compiling
vc = VirtualCell(
    degree_cascades=[1, 2, 5],      # Network topology: 1 cascade degree 1, 2 cascades degree 2, 5 cascades degree 3
    name="MyCell",
    random_seed=42,
    feedback_density=0.5,           # 50% of possible feedback connections
    auto_compile=False,             # Disable auto-compile for custom configuration
    auto_drug=True,                 # Auto-generate a drug targeting degree 1 species
    drug_name="D",
    drug_start_time=5000.0,          # Drug becomes active at t=5000
    drug_value=100.0,                # Drug active concentration
    drug_regulation_type="down",    # Drug inhibits targets
    simulation_end=10000.0,          # Default simulation end time
)

# Compile the model (required before simulation)
vc.compile(
    mean_range_species=(50, 150),          # Initial concentration range
    rangeScale_params=(0.8, 1.2),          # Parameter scaling range
    use_kinetic_tuner=True,                # Use KineticParameterTuner for biologically plausible parameters
    active_percentage_range=(0.3, 0.7),     # Target 30-70% active species
)

Methods

Method Description
add_drug(...) Add a drug to the model
list_drugs() List all drugs in the system
compile(...) Compile the model (lazy initialization)
get_species_names(exclude_drugs=True) Get list of species names
get_initial_values(exclude_drugs=True) Get initial species values
get_target_concentrations() Get target active concentrations from kinetic tuner

Properties

Property Description
spec Underlying DegreeInteractionSpec object
model Underlying ModelBuilder object
tuner Underlying KineticParameterTuner object (if used)

Builder

The Builder class provides a factory method for creating VirtualCell instances with a clean, fluent API.

from synthetic import Builder

# Basic usage (auto-compiles)
vc = Builder.specify(degree_cascades=[1, 2, 5])

# With custom parameters
vc = Builder.specify(
    degree_cascades=[3, 6, 15, 25],
    name="ComplexCell",
    random_seed=42,
    feedback_density=0.5,
    auto_compile=True,
    auto_drug=True,
    drug_name="Inhibitor",
    drug_start_time=5000.0,
    drug_value=100.0,
    drug_regulation_type="down",
    simulation_end=10000.0,
)

Adding Drugs

Drugs can be added manually to target degree 1 R species:

# Create a virtual cell without auto-drug
vc = Builder.specify(degree_cascades=[1, 2, 5], auto_drug=False)

# Add a drug manually (returns self for chaining)
vc.add_drug(
    name="DrugX",
    start_time=500.0,               # Time at which drug becomes active
    default_value=0.0,              # Default value when not active
    regulation=["R1_1", "R1_2"],     # Target species (must be degree 1 R species)
    regulation_type=["down", "down"], # Regulation types: "up" or "down"
    value=100.0,                    # Optional override for active concentration
)

# Compile to apply drugs
vc.compile()

List all drugs in the system:

drugs = vc.list_drugs()
for drug in drugs:
    print(f"Drug: {drug['name']}, Targets: {drug['targets']}, Types: {drug['types']}")

make_dataset_drug_response

Generate synthetic drug response datasets compatible with scikit-learn's make_regression format.

from synthetic import Builder, make_dataset_drug_response

vc = Builder.specify(degree_cascades=[1, 2, 5], random_seed=42)

# Basic usage
X, y = make_dataset_drug_response(
    n=1000,                          # Number of samples
    cell_model=vc,                   # Compiled VirtualCell instance
    target_specie='Oa',              # Outcome species to use as target
)

# With custom parameters
X, y = make_dataset_drug_response(
    n=500,
    cell_model=vc,
    target_specie='Oa',
    perturbation_type='gaussian',    # 'uniform', 'gaussian', 'lognormal', 'lhs'
    perturbation_params={'rsd': 0.2}, # Relative standard deviation
    simulation_params={'start': 0, 'end': 10000, 'points': 101},
    solver_type='scipy',            # 'scipy' or 'roadrunner'
    jit=True,                        # Enable JIT compilation (scipy only)
    verbose=True,                    # Show progress bar
    seed=42,                         # Random seed for reproducibility
)

Extended Return Format

For more detailed output, use return_details=True:

result = make_dataset_drug_response(
    n=100,
    cell_model=vc,
    return_details=True,
)

# Access extended data structure
X = result['X']                      # Feature matrix
y = result['y']                      # Target vector
features = result['features']       # Feature dataframe (initial values)
targets = result['targets']         # Target dataframe (outcome values)
timecourse = result['timecourse']   # Timecourse simulation data
metadata = result['metadata']        # Metadata about generation

Perturbation Types

Type Description Parameters
uniform Uniform distribution min, max
gaussian Gaussian (normal) distribution rsd (relative standard deviation)
lognormal Log-normal distribution rsd
lhs Latin Hypercube Sampling rsd

Kinetic Parameter Perturbation

You can also perturb kinetic parameters:

X, y = make_dataset_drug_response(
    n=100,
    cell_model=vc,
    parameter_values={'Km_J0': 100.0},      # Base parameters to perturb
    param_perturbation_type='gaussian',
    param_perturbation_params={'rsd': 0.2},
    param_seed=42,                          # Separate seed for parameters
)

Simulation Robustness

The API includes robust simulation handling with automatic resampling:

X, y = make_dataset_drug_response(
    n=100,
    cell_model=vc,
    resample_size=10,           # Number of alternative samples on failure
    max_retries=3,              # Maximum retries per failed index
    require_all_successful=False, # If False, returns partial results
)

Solver Options

Two solver backends are available:

Solver Description Features
scipy Uses scipy.integrate.odeint with JIT compilation Fast for batch simulations
roadrunner Uses libRoadRunner for SBML simulation Mature, robust
# Scipy solver (default)
X, y = make_dataset_drug_response(
    n=100, cell_model=vc, solver_type='scipy', jit=True
)

# Roadrunner solver
X, y = make_dataset_drug_response(
    n=100, cell_model=vc, solver_type='roadrunner'
)

Example: Complete Workflow

from synthetic import Builder, make_dataset_drug_response
import numpy as np

# 1. Create a virtual cell with custom network topology
vc = Builder.specify(
    degree_cascades=[3, 6, 15],   # 3 degree-1, 6 degree-2, 15 degree-3 cascades
    random_seed=42,
    feedback_density=0.5,
)

# 4. Generate dataset
X, y = make_dataset_drug_response(
    n=1000,
    cell_model=vc,
    target_specie='Oa',
    perturbation_type='gaussian',
    perturbation_params={'rsd': 0.2},
    simulation_params={'start': 0, 'end': 8000, 'points': 101},
    seed=42,
    verbose=True,
)

# 5. Use with sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)
print(f"Model R^2 score: {score:.3f}")

Network Topology

The degree_cascades parameter defines the hierarchical network structure:

# Simple network: 1 cascade each across 3 degrees
vc = Builder.specify(degree_cascades=[1, 1, 1])
# Creates: R1_1 -> I1_1, R2_1 -> I2_1, R3_1 -> I3_1, with hierarchical connections

# More complex network
vc = Builder.specify(degree_cascades=[3, 6, 15, 25])
# Creates many parallel cascades with increasing complexity per degree

Each cascade creates:

  • A Receptor (R) species at each degree
  • An Intermediate (I) species at each degree
  • The first degree also connects to an Outcome (O) species

Kinetic Parameter Tuning

The library includes KineticParameterTuner for generating biologically plausible kinetic parameters:

from synthetic import KineticParameterTuner

vc = Builder.specify(degree_cascades=[3, 6, 15], random_seed=42)
vc.compile(
    use_kinetic_tuner=True,
    active_percentage_range=(0.3, 0.7),  # Target 30-70% active states
    X_total_multiplier=5.0,               # Km_b = X_total * 5.0
    ki_val=100.0,                        # Constant Ki for inhibitors
    v_max_f_random_range=(5.0, 10.0),    # Forward Vmax range
)

# Access target concentrations
targets = vc.get_target_concentrations()
for species, concentration in targets.items():
    print(f"{species}: {concentration:.2f}")

Advanced Usage

Accessing Underlying Components

vc = Builder.specify(degree_cascades=[1, 2, 5])

# Access the specification
spec = vc.spec
print(spec.species_list)

# Access the model builder
model = vc.model
print(model.get_antimony_model())

# Access the tuner (if used)
tuner = vc.tuner
print(tuner.get_target_concentrations())

Direct Model Building

For more control, you can use the lower-level API:

from synthetic.Specs.DegreeInteractionSpec import DegreeInteractionSpec
from synthetic.Specs.Drug import Drug
from synthetic.utils.kinetic_tuner import KineticParameterTuner

# Create specification
spec = DegreeInteractionSpec(degree_cascades=[1, 2, 5])
spec.generate_specifications(random_seed=42, feedback_density=0.5)

# Add drug
drug = Drug(
    name="D",
    start_time=5000.0,
    default_value=0.0,
    regulation=["R1_1", "R1_2", "R1_3"],
    regulation_type=["down", "down", "down"],
)
spec.add_drug(drug, value=100.0)

# Generate model
model = spec.generate_network(
    network_name="MyNetwork",
    random_seed=42,
)
model.precompile()

# Apply kinetic tuning
tuner = KineticParameterTuner(model, random_seed=42)
updated_params = tuner.generate_parameters(
    active_percentage_range=(0.3, 0.7),
    X_total_multiplier=5.0,
    ki_val=100.0,
)
for param_name, value in updated_params.items():
    model.set_parameter(param_name, value)

Development

Running Tests

pytest

Run specific tests:

pytest tests/test_api.py::TestVirtualCell::test_compile_with_kinetic_tuner

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

synthetic_models-0.1.3.tar.gz (695.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

synthetic_models-0.1.3-py3-none-any.whl (97.9 kB view details)

Uploaded Python 3

File details

Details for the file synthetic_models-0.1.3.tar.gz.

File metadata

  • Download URL: synthetic_models-0.1.3.tar.gz
  • Upload date:
  • Size: 695.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for synthetic_models-0.1.3.tar.gz
Algorithm Hash digest
SHA256 bd3cce1032264998e43a678c43ddcb287ea496ecb05e75955e08a04f4325ae81
MD5 5ee1640999e0c229894c73013fcf7bde
BLAKE2b-256 2f53c6d211bb213c4b53f1a8d9143e752436cdee9fb3d44a19f8c54aaec79a6f

See more details on using hashes here.

File details

Details for the file synthetic_models-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: synthetic_models-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 97.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for synthetic_models-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9168175e06dcf7a1164133a42e9b0b05a9b25f82861f227cebcf67a490e3cb7a
MD5 c584d500d207e4311e3c95fd0255732f
BLAKE2b-256 b7dd5752fbfb409852596919194533aa3f3f92ecec0a0e65a4abd1c91ef6264c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page