Skip to main content

synthetic-models is a virtual cell generation library for generating biological synthetic data that can be used for benchmarking predictive modelling workflows. The synthetic data are generated using ODE models formulated in biochemical laws common in cancer cell signalling networks. Synthetic can create datasets in the format of scikit-learn's make_regression method.

Project description

Synthetic

Synthetic (aka ModelBuilder) is a virtual cell generation library for generating biological synthetic data that can be used for benchmarking predictive modelling workflows. The synthetic data are generated using ODE models formulated in biochemical laws common in cancer cell signalling networks. Synthetic can create datasets in the format of scikit-learn's make_regression method.

Installation

pip install synthetic-models

For development:

git clone https://github.com/AnEvilBurrito/Synthetic.git
cd Synthetic
uv sync

Alpha Testing

For lab members participating in alpha testing, install directly from the lab's GitHub repository:

pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git

To install a specific version or commit:

pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git@v0.1.0
# or
pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git@<commit-hash>

For lab members with write access, you can use SSH:

pip install git+ssh://git@github.com/IntegratedNetworkModellingLab/Synthetic.git

Quick Start

The high-level API provides a simple interface for creating virtual cell models and generating sklearn-compatible datasets:

from synthetic import Builder, make_dataset_drug_response

# Create a virtual cell model (auto-compiles with default settings)
vc = Builder.specify(degree_cascades=[1, 2, 5], random_seed=42)

# Generate a sklearn-compatible dataset
X, y = make_dataset_drug_response(n=1000, cell_model=vc, target_specie='Oa')

print(f"Feature matrix shape: {X.shape}")  # (1000, n_features)
print(f"Target vector shape: {y.shape}")    # (1000,)

Examples

The examples/ directory contains practical examples demonstrating various features of Synthetic:

create_dataset.py

Basic example showing how to generate a synthetic drug response dataset and analyze feature correlations.

python examples/create_dataset.py

Demonstrates:

  • Creating a virtual cell model with custom network topology
  • Generating sklearn-compatible datasets
  • Calculating Pearson correlations between features and targets
  • Identifying predictive features in the network

export_model_sbml.py

Shows how to export synthetic models to standard SBML and Antimony formats for interoperability with other tools.

python examples/export_model_sbml.py

Demonstrates:

  • Creating a virtual cell model
  • Accessing the underlying ModelBuilder
  • Exporting models to SBML format (for COPASI, libRoadRunner, etc.)
  • Exporting models to Antimony format (human-readable)
  • Retrieving model strings for programmatic use

detailed_examples.md

Comprehensive guide with detailed workflows and advanced use cases, including:

  • Feature correlation analysis - Complete workflow with CSV export
  • Combination therapy - Testing multi-drug interactions
  • Step-by-step explanations of each workflow

View the guide:

cat examples/detailed_examples.md

or open it in your preferred markdown viewer.

API Overview

VirtualCell

The VirtualCell class represents a virtual cell model with a hierarchical network topology.

from synthetic import VirtualCell

# Create a virtual cell without auto-compiling
vc = VirtualCell(
    degree_cascades=[1, 2, 5],      # Network topology: 1 cascade degree 1, 2 cascades degree 2, 5 cascades degree 3
    name="MyCell",
    random_seed=42,
    feedback_density=0.5,           # 50% of possible feedback connections
    auto_compile=False,             # Disable auto-compile for custom configuration
    auto_drug=True,                 # Auto-generate a drug targeting degree 1 species
    drug_name="D",
    drug_start_time=5000.0,          # Drug becomes active at t=5000
    drug_value=100.0,                # Drug active concentration
    drug_regulation_type="down",    # Drug inhibits targets
    simulation_end=10000.0,          # Default simulation end time
)

# Compile the model (required before simulation)
vc.compile(
    mean_range_species=(50, 150),          # Initial concentration range
    rangeScale_params=(0.8, 1.2),          # Parameter scaling range
    use_kinetic_tuner=True,                # Use KineticParameterTuner for biologically plausible parameters
    active_percentage_range=(0.3, 0.7),     # Target 30-70% active species
)

Methods

Method Description
add_drug(...) Add a drug to the model
list_drugs() List all drugs in the system
compile(...) Compile the model (lazy initialization)
get_species_names(exclude_drugs=True) Get list of species names
get_initial_values(exclude_drugs=True) Get initial species values
get_target_concentrations() Get target active concentrations from kinetic tuner

Properties

Property Description
spec Underlying DegreeInteractionSpec object
model Underlying ModelBuilder object
tuner Underlying KineticParameterTuner object (if used)

Builder

The Builder class provides a factory method for creating VirtualCell instances with a clean, fluent API.

from synthetic import Builder

# Basic usage (auto-compiles)
vc = Builder.specify(degree_cascades=[1, 2, 5])

# With custom parameters
vc = Builder.specify(
    degree_cascades=[3, 6, 15, 25],
    name="ComplexCell",
    random_seed=42,
    feedback_density=0.5,
    auto_compile=True,
    auto_drug=True,
    drug_name="Inhibitor",
    drug_start_time=5000.0,
    drug_value=100.0,
    drug_regulation_type="down",
    simulation_end=10000.0,
)

Adding Drugs

Drugs can be added manually to target degree 1 R species:

# Create a virtual cell without auto-drug
vc = Builder.specify(degree_cascades=[1, 2, 5], auto_drug=False)

# Add a drug manually (returns self for chaining)
vc.add_drug(
    name="DrugX",
    start_time=500.0,               # Time at which drug becomes active
    default_value=0.0,              # Default value when not active
    regulation=["R1_1", "R1_2"],     # Target species (must be degree 1 R species)
    regulation_type=["down", "down"], # Regulation types: "up" or "down"
    value=100.0,                    # Optional override for active concentration
)

# Compile to apply drugs
vc.compile()

List all drugs in the system:

drugs = vc.list_drugs()
for drug in drugs:
    print(f"Drug: {drug['name']}, Targets: {drug['targets']}, Types: {drug['types']}")

make_dataset_drug_response

Generate synthetic drug response datasets compatible with scikit-learn's make_regression format.

from synthetic import Builder, make_dataset_drug_response

vc = Builder.specify(degree_cascades=[1, 2, 5], random_seed=42)

# Basic usage
X, y = make_dataset_drug_response(
    n=1000,                          # Number of samples
    cell_model=vc,                   # Compiled VirtualCell instance
    target_specie='Oa',              # Outcome species to use as target
)

# With custom parameters
X, y = make_dataset_drug_response(
    n=500,
    cell_model=vc,
    target_specie='Oa',
    perturbation_type='gaussian',    # 'uniform', 'gaussian', 'lognormal', 'lhs'
    perturbation_params={'rsd': 0.2}, # Relative standard deviation
    simulation_params={'start': 0, 'end': 10000, 'points': 101},
    solver_type='scipy',            # 'scipy' or 'roadrunner'
    jit=True,                        # Enable JIT compilation (scipy only)
    verbose=True,                    # Show progress bar
    seed=42,                         # Random seed for reproducibility
)

Extended Return Format

For more detailed output, use return_details=True:

result = make_dataset_drug_response(
    n=100,
    cell_model=vc,
    return_details=True,
)

# Access extended data structure
X = result['X']                      # Feature matrix
y = result['y']                      # Target vector
features = result['features']       # Feature dataframe (initial values)
targets = result['targets']         # Target dataframe (outcome values)
timecourse = result['timecourse']   # Timecourse simulation data
metadata = result['metadata']        # Metadata about generation

Perturbation Types

Type Description Parameters
uniform Uniform distribution min, max
gaussian Gaussian (normal) distribution rsd (relative standard deviation)
lognormal Log-normal distribution rsd
lhs Latin Hypercube Sampling rsd

Kinetic Parameter Perturbation

You can also perturb kinetic parameters:

X, y = make_dataset_drug_response(
    n=100,
    cell_model=vc,
    parameter_values={'Km_J0': 100.0},      # Base parameters to perturb
    param_perturbation_type='gaussian',
    param_perturbation_params={'rsd': 0.2},
    param_seed=42,                          # Separate seed for parameters
)

Simulation Robustness

The API includes robust simulation handling with automatic resampling:

X, y = make_dataset_drug_response(
    n=100,
    cell_model=vc,
    resample_size=10,           # Number of alternative samples on failure
    max_retries=3,              # Maximum retries per failed index
    require_all_successful=False, # If False, returns partial results
)

Solver Options

Two solver backends are available:

Solver Description Features
scipy Uses scipy.integrate.odeint with JIT compilation Fast for batch simulations
roadrunner Uses libRoadRunner for SBML simulation Mature, robust
# Scipy solver (default)
X, y = make_dataset_drug_response(
    n=100, cell_model=vc, solver_type='scipy', jit=True
)

# Roadrunner solver
X, y = make_dataset_drug_response(
    n=100, cell_model=vc, solver_type='roadrunner'
)

Example: Complete Workflow

from synthetic import Builder, make_dataset_drug_response
import numpy as np

# 1. Create a virtual cell with custom network topology
vc = Builder.specify(
    degree_cascades=[3, 6, 15],   # 3 degree-1, 6 degree-2, 15 degree-3 cascades
    random_seed=42,
    feedback_density=0.5,
)

# 4. Generate dataset
X, y = make_dataset_drug_response(
    n=1000,
    cell_model=vc,
    target_specie='Oa',
    perturbation_type='gaussian',
    perturbation_params={'rsd': 0.2},
    simulation_params={'start': 0, 'end': 8000, 'points': 101},
    seed=42,
    verbose=True,
)

# 5. Use with sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)
print(f"Model R^2 score: {score:.3f}")

Network Topology

The degree_cascades parameter defines the hierarchical network structure:

# Simple network: 1 cascade each across 3 degrees
vc = Builder.specify(degree_cascades=[1, 1, 1])
# Creates: R1_1 -> I1_1, R2_1 -> I2_1, R3_1 -> I3_1, with hierarchical connections

# More complex network
vc = Builder.specify(degree_cascades=[3, 6, 15, 25])
# Creates many parallel cascades with increasing complexity per degree

Each cascade creates:

  • A Receptor (R) species at each degree
  • An Intermediate (I) species at each degree
  • The first degree also connects to an Outcome (O) species

Kinetic Parameter Tuning

The library includes KineticParameterTuner for generating biologically plausible kinetic parameters:

from synthetic import KineticParameterTuner

vc = Builder.specify(degree_cascades=[3, 6, 15], random_seed=42)
vc.compile(
    use_kinetic_tuner=True,
    active_percentage_range=(0.3, 0.7),  # Target 30-70% active states
    X_total_multiplier=5.0,               # Km_b = X_total * 5.0
    ki_val=100.0,                        # Constant Ki for inhibitors
    v_max_f_random_range=(5.0, 10.0),    # Forward Vmax range
)

# Access target concentrations
targets = vc.get_target_concentrations()
for species, concentration in targets.items():
    print(f"{species}: {concentration:.2f}")

Advanced Usage

Accessing Underlying Components

vc = Builder.specify(degree_cascades=[1, 2, 5])

# Access the specification
spec = vc.spec
print(spec.species_list)

# Access the model builder
model = vc.model
print(model.get_antimony_model())

# Access the tuner (if used)
tuner = vc.tuner
print(tuner.get_target_concentrations())

Direct Model Building

For more control, you can use the lower-level API:

from synthetic.Specs.DegreeInteractionSpec import DegreeInteractionSpec
from synthetic.Specs.Drug import Drug
from synthetic.utils.kinetic_tuner import KineticParameterTuner

# Create specification
spec = DegreeInteractionSpec(degree_cascades=[1, 2, 5])
spec.generate_specifications(random_seed=42, feedback_density=0.5)

# Add drug
drug = Drug(
    name="D",
    start_time=5000.0,
    default_value=0.0,
    regulation=["R1_1", "R1_2", "R1_3"],
    regulation_type=["down", "down", "down"],
)
spec.add_drug(drug, value=100.0)

# Generate model
model = spec.generate_network(
    network_name="MyNetwork",
    random_seed=42,
)
model.precompile()

# Apply kinetic tuning
tuner = KineticParameterTuner(model, random_seed=42)
updated_params = tuner.generate_parameters(
    active_percentage_range=(0.3, 0.7),
    X_total_multiplier=5.0,
    ki_val=100.0,
)
for param_name, value in updated_params.items():
    model.set_parameter(param_name, value)

Development

Running Tests

pytest

Run specific tests:

pytest tests/test_api.py::TestVirtualCell::test_compile_with_kinetic_tuner

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

synthetic_models-0.1.2.tar.gz (695.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

synthetic_models-0.1.2-py3-none-any.whl (97.9 kB view details)

Uploaded Python 3

File details

Details for the file synthetic_models-0.1.2.tar.gz.

File metadata

  • Download URL: synthetic_models-0.1.2.tar.gz
  • Upload date:
  • Size: 695.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for synthetic_models-0.1.2.tar.gz
Algorithm Hash digest
SHA256 efabc94f35b19373cfd19353b55fa7e708adf402d0e85ce9fc70639710387801
MD5 0a8e579150ac4cbfc3939c1fd6faf2be
BLAKE2b-256 1d2d3022b7d18ba7138dffb2fdc439a22db8ea8e60bfc048e6569e44267f18d3

See more details on using hashes here.

File details

Details for the file synthetic_models-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: synthetic_models-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 97.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for synthetic_models-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d4382e84f586b582329d0309cbf5392600ae58a019dbde9fbd24ffe5ab7cbc01
MD5 663392d59a853ef2b26a3c7b00260e46
BLAKE2b-256 6b055fd38da7db08ab993a75d358d34b1b02873565051ea172221672c63b9e0f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page