synthetic-models is a virtual cell generation library for generating biological synthetic data that can be used for benchmarking predictive modelling workflows. The synthetic data are generated using ODE models formulated in biochemical laws common in cancer cell signalling networks. Synthetic can create datasets in the format of scikit-learn's make_regression method.
Project description
Synthetic
Synthetic (aka ModelBuilder) is a virtual cell generation library for generating biological synthetic data that can be used for benchmarking predictive modelling workflows. The synthetic data are generated using ODE models formulated in biochemical laws common in cancer cell signalling networks. Synthetic can create datasets in the format of scikit-learn's make_regression method.
Installation
pip install synthetic-models
For development:
git clone https://github.com/AnEvilBurrito/Synthetic.git
cd Synthetic
uv sync
Alpha Testing
For lab members participating in alpha testing, install directly from the lab's GitHub repository:
pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git
To install a specific version or commit:
pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git@v0.1.0
# or
pip install git+https://github.com/IntegratedNetworkModellingLab/Synthetic.git@<commit-hash>
For lab members with write access, you can use SSH:
pip install git+ssh://git@github.com/IntegratedNetworkModellingLab/Synthetic.git
Quick Start
The high-level API provides a simple interface for creating virtual cell models and generating sklearn-compatible datasets:
from synthetic import Builder, make_dataset_drug_response
# Create a virtual cell model (auto-compiles with default settings)
vc = Builder.specify(degree_cascades=[1, 2, 5], random_seed=42)
# Generate a sklearn-compatible dataset
X, y = make_dataset_drug_response(n=1000, cell_model=vc, target_specie='Oa')
print(f"Feature matrix shape: {X.shape}") # (1000, n_features)
print(f"Target vector shape: {y.shape}") # (1000,)
Examples
The examples/ directory contains practical examples demonstrating various features of Synthetic:
create_dataset.py
Basic example showing how to generate a synthetic drug response dataset and analyze feature correlations.
python examples/create_dataset.py
Demonstrates:
- Creating a virtual cell model with custom network topology
- Generating sklearn-compatible datasets
- Calculating Pearson correlations between features and targets
- Identifying predictive features in the network
export_model_sbml.py
Shows how to export synthetic models to standard SBML and Antimony formats for interoperability with other tools.
python examples/export_model_sbml.py
Demonstrates:
- Creating a virtual cell model
- Accessing the underlying ModelBuilder
- Exporting models to SBML format (for COPASI, libRoadRunner, etc.)
- Exporting models to Antimony format (human-readable)
- Retrieving model strings for programmatic use
detailed_examples.md
Comprehensive guide with detailed workflows and advanced use cases, including:
- Feature correlation analysis - Complete workflow with CSV export
- Combination therapy - Testing multi-drug interactions
- Step-by-step explanations of each workflow
View the guide:
cat examples/detailed_examples.md
or open it in your preferred markdown viewer.
API Overview
VirtualCell
The VirtualCell class represents a virtual cell model with a hierarchical network topology.
from synthetic import VirtualCell
# Create a virtual cell without auto-compiling
vc = VirtualCell(
degree_cascades=[1, 2, 5], # Network topology: 1 cascade degree 1, 2 cascades degree 2, 5 cascades degree 3
name="MyCell",
random_seed=42,
feedback_density=0.5, # 50% of possible feedback connections
auto_compile=False, # Disable auto-compile for custom configuration
auto_drug=True, # Auto-generate a drug targeting degree 1 species
drug_name="D",
drug_start_time=5000.0, # Drug becomes active at t=5000
drug_value=100.0, # Drug active concentration
drug_regulation_type="down", # Drug inhibits targets
simulation_end=10000.0, # Default simulation end time
)
# Compile the model (required before simulation)
vc.compile(
mean_range_species=(50, 150), # Initial concentration range
rangeScale_params=(0.8, 1.2), # Parameter scaling range
use_kinetic_tuner=True, # Use KineticParameterTuner for biologically plausible parameters
active_percentage_range=(0.3, 0.7), # Target 30-70% active species
)
Methods
| Method | Description |
|---|---|
add_drug(...) |
Add a drug to the model |
list_drugs() |
List all drugs in the system |
compile(...) |
Compile the model (lazy initialization) |
get_species_names(exclude_drugs=True) |
Get list of species names |
get_initial_values(exclude_drugs=True) |
Get initial species values |
get_target_concentrations() |
Get target active concentrations from kinetic tuner |
Properties
| Property | Description |
|---|---|
spec |
Underlying DegreeInteractionSpec object |
model |
Underlying ModelBuilder object |
tuner |
Underlying KineticParameterTuner object (if used) |
Builder
The Builder class provides a factory method for creating VirtualCell instances with a clean, fluent API.
from synthetic import Builder
# Basic usage (auto-compiles)
vc = Builder.specify(degree_cascades=[1, 2, 5])
# With custom parameters
vc = Builder.specify(
degree_cascades=[3, 6, 15, 25],
name="ComplexCell",
random_seed=42,
feedback_density=0.5,
auto_compile=True,
auto_drug=True,
drug_name="Inhibitor",
drug_start_time=5000.0,
drug_value=100.0,
drug_regulation_type="down",
simulation_end=10000.0,
)
Adding Drugs
Drugs can be added manually to target degree 1 R species:
# Create a virtual cell without auto-drug
vc = Builder.specify(degree_cascades=[1, 2, 5], auto_drug=False)
# Add a drug manually (returns self for chaining)
vc.add_drug(
name="DrugX",
start_time=500.0, # Time at which drug becomes active
default_value=0.0, # Default value when not active
regulation=["R1_1", "R1_2"], # Target species (must be degree 1 R species)
regulation_type=["down", "down"], # Regulation types: "up" or "down"
value=100.0, # Optional override for active concentration
)
# Compile to apply drugs
vc.compile()
List all drugs in the system:
drugs = vc.list_drugs()
for drug in drugs:
print(f"Drug: {drug['name']}, Targets: {drug['targets']}, Types: {drug['types']}")
make_dataset_drug_response
Generate synthetic drug response datasets compatible with scikit-learn's make_regression format.
from synthetic import Builder, make_dataset_drug_response
vc = Builder.specify(degree_cascades=[1, 2, 5], random_seed=42)
# Basic usage
X, y = make_dataset_drug_response(
n=1000, # Number of samples
cell_model=vc, # Compiled VirtualCell instance
target_specie='Oa', # Outcome species to use as target
)
# With custom parameters
X, y = make_dataset_drug_response(
n=500,
cell_model=vc,
target_specie='Oa',
perturbation_type='gaussian', # 'uniform', 'gaussian', 'lognormal', 'lhs'
perturbation_params={'rsd': 0.2}, # Relative standard deviation
simulation_params={'start': 0, 'end': 10000, 'points': 101},
solver_type='scipy', # 'scipy' or 'roadrunner'
jit=True, # Enable JIT compilation (scipy only)
verbose=True, # Show progress bar
seed=42, # Random seed for reproducibility
)
Extended Return Format
For more detailed output, use return_details=True:
result = make_dataset_drug_response(
n=100,
cell_model=vc,
return_details=True,
)
# Access extended data structure
X = result['X'] # Feature matrix
y = result['y'] # Target vector
features = result['features'] # Feature dataframe (initial values)
targets = result['targets'] # Target dataframe (outcome values)
timecourse = result['timecourse'] # Timecourse simulation data
metadata = result['metadata'] # Metadata about generation
Perturbation Types
| Type | Description | Parameters |
|---|---|---|
uniform |
Uniform distribution | min, max |
gaussian |
Gaussian (normal) distribution | rsd (relative standard deviation) |
lognormal |
Log-normal distribution | rsd |
lhs |
Latin Hypercube Sampling | rsd |
Kinetic Parameter Perturbation
You can also perturb kinetic parameters:
X, y = make_dataset_drug_response(
n=100,
cell_model=vc,
parameter_values={'Km_J0': 100.0}, # Base parameters to perturb
param_perturbation_type='gaussian',
param_perturbation_params={'rsd': 0.2},
param_seed=42, # Separate seed for parameters
)
Simulation Robustness
The API includes robust simulation handling with automatic resampling:
X, y = make_dataset_drug_response(
n=100,
cell_model=vc,
resample_size=10, # Number of alternative samples on failure
max_retries=3, # Maximum retries per failed index
require_all_successful=False, # If False, returns partial results
)
Solver Options
Two solver backends are available:
| Solver | Description | Features |
|---|---|---|
scipy |
Uses scipy.integrate.odeint with JIT compilation |
Fast for batch simulations |
roadrunner |
Uses libRoadRunner for SBML simulation | Mature, robust |
# Scipy solver (default)
X, y = make_dataset_drug_response(
n=100, cell_model=vc, solver_type='scipy', jit=True
)
# Roadrunner solver
X, y = make_dataset_drug_response(
n=100, cell_model=vc, solver_type='roadrunner'
)
Example: Complete Workflow
from synthetic import Builder, make_dataset_drug_response
import numpy as np
# 1. Create a virtual cell with custom network topology
vc = Builder.specify(
degree_cascades=[3, 6, 15], # 3 degree-1, 6 degree-2, 15 degree-3 cascades
random_seed=42,
feedback_density=0.5,
)
# 4. Generate dataset
X, y = make_dataset_drug_response(
n=1000,
cell_model=vc,
target_specie='Oa',
perturbation_type='gaussian',
perturbation_params={'rsd': 0.2},
simulation_params={'start': 0, 'end': 8000, 'points': 101},
seed=42,
verbose=True,
)
# 5. Use with sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(f"Model R^2 score: {score:.3f}")
Network Topology
The degree_cascades parameter defines the hierarchical network structure:
# Simple network: 1 cascade each across 3 degrees
vc = Builder.specify(degree_cascades=[1, 1, 1])
# Creates: R1_1 -> I1_1, R2_1 -> I2_1, R3_1 -> I3_1, with hierarchical connections
# More complex network
vc = Builder.specify(degree_cascades=[3, 6, 15, 25])
# Creates many parallel cascades with increasing complexity per degree
Each cascade creates:
- A Receptor (R) species at each degree
- An Intermediate (I) species at each degree
- The first degree also connects to an Outcome (O) species
Kinetic Parameter Tuning
The library includes KineticParameterTuner for generating biologically plausible kinetic parameters:
from synthetic import KineticParameterTuner
vc = Builder.specify(degree_cascades=[3, 6, 15], random_seed=42)
vc.compile(
use_kinetic_tuner=True,
active_percentage_range=(0.3, 0.7), # Target 30-70% active states
X_total_multiplier=5.0, # Km_b = X_total * 5.0
ki_val=100.0, # Constant Ki for inhibitors
v_max_f_random_range=(5.0, 10.0), # Forward Vmax range
)
# Access target concentrations
targets = vc.get_target_concentrations()
for species, concentration in targets.items():
print(f"{species}: {concentration:.2f}")
Advanced Usage
Accessing Underlying Components
vc = Builder.specify(degree_cascades=[1, 2, 5])
# Access the specification
spec = vc.spec
print(spec.species_list)
# Access the model builder
model = vc.model
print(model.get_antimony_model())
# Access the tuner (if used)
tuner = vc.tuner
print(tuner.get_target_concentrations())
Direct Model Building
For more control, you can use the lower-level API:
from synthetic.Specs.DegreeInteractionSpec import DegreeInteractionSpec
from synthetic.Specs.Drug import Drug
from synthetic.utils.kinetic_tuner import KineticParameterTuner
# Create specification
spec = DegreeInteractionSpec(degree_cascades=[1, 2, 5])
spec.generate_specifications(random_seed=42, feedback_density=0.5)
# Add drug
drug = Drug(
name="D",
start_time=5000.0,
default_value=0.0,
regulation=["R1_1", "R1_2", "R1_3"],
regulation_type=["down", "down", "down"],
)
spec.add_drug(drug, value=100.0)
# Generate model
model = spec.generate_network(
network_name="MyNetwork",
random_seed=42,
)
model.precompile()
# Apply kinetic tuning
tuner = KineticParameterTuner(model, random_seed=42)
updated_params = tuner.generate_parameters(
active_percentage_range=(0.3, 0.7),
X_total_multiplier=5.0,
ki_val=100.0,
)
for param_name, value in updated_params.items():
model.set_parameter(param_name, value)
Development
Running Tests
pytest
Run specific tests:
pytest tests/test_api.py::TestVirtualCell::test_compile_with_kinetic_tuner
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file synthetic_models-0.1.2.tar.gz.
File metadata
- Download URL: synthetic_models-0.1.2.tar.gz
- Upload date:
- Size: 695.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
efabc94f35b19373cfd19353b55fa7e708adf402d0e85ce9fc70639710387801
|
|
| MD5 |
0a8e579150ac4cbfc3939c1fd6faf2be
|
|
| BLAKE2b-256 |
1d2d3022b7d18ba7138dffb2fdc439a22db8ea8e60bfc048e6569e44267f18d3
|
File details
Details for the file synthetic_models-0.1.2-py3-none-any.whl.
File metadata
- Download URL: synthetic_models-0.1.2-py3-none-any.whl
- Upload date:
- Size: 97.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d4382e84f586b582329d0309cbf5392600ae58a019dbde9fbd24ffe5ab7cbc01
|
|
| MD5 |
663392d59a853ef2b26a3c7b00260e46
|
|
| BLAKE2b-256 |
6b055fd38da7db08ab993a75d358d34b1b02873565051ea172221672c63b9e0f
|