Markov-modulated ODE simulator for synthetic single-cell splicing datasets.

markovmodus

Markov-modulated splicing simulator for single-cell U/S counts.

Generate snapshot datasets where a hidden state graph (the support of a continuous-time generator) modulates unspliced->spliced RNA dynamics.

Why this exists

Single-cell RNA sequencing captures each cell only once, so trajectory and velocity methods must infer temporal structure from static snapshots. Without datasets where the true lineage graph is known, it is hard to validate the assumptions those methods make.

markovmodus fills that gap by generating synthetic unspliced/spliced counts with an explicit hidden-state lineage. Cells hop between phenotypic states according to a continuous-time Markov process, and each state drives its own transcriptional kinetics.

For biologists, think of the hidden states as cellular programs—progenitors, intermediates, terminal fates—with transition rates describing how readily a cell exits one program and commits to another. Within each program, genes produce pre-mRNA that is spliced and degraded, yielding both nascent and mature counts like those used in RNA velocity analyses. Because the simulator records the exact state graph, you can stress-test algorithms that aim to recover branching, cyclic, or linear progressions from single snapshots.

Model

Latent dynamics (state process)

  • States are indexed z = 1, ..., n, with an adjacency mask M (a symmetric n-by-n binary matrix) describing the undirected support.
  • The generator Q lives on this support: for i != j, Q[i, j] > 0 iff M[i, j] = 1, and each row sums to zero via Q[i, i] = -sum_{j != i} Q[i, j].
  • Directionality arises from asymmetric off-diagonal rates (Q[i, j] != Q[j, i]).
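
The constraints above can be checked with a few lines of NumPy. This is a minimal sketch, not the package's own code: the helper name build_generator and the example rates are hypothetical, chosen only to show a symmetric mask carrying asymmetric rates.

```python
import numpy as np


def build_generator(mask: np.ndarray, rates: np.ndarray) -> np.ndarray:
    """Return a CTMC generator whose off-diagonal support equals `mask`.

    Hypothetical helper for illustration; it is not part of markovmodus.
    """
    mask = np.asarray(mask, dtype=bool)
    rates = np.asarray(rates, dtype=float)
    if not np.array_equal(mask, mask.T):
        raise ValueError("mask must be symmetric (undirected support)")
    off_diag = ~np.eye(mask.shape[0], dtype=bool)
    if np.any(rates[mask & off_diag] <= 0.0):
        raise ValueError("rates must be positive wherever mask is 1")
    # Keep rates only on the support, then set the diagonal so rows sum to zero.
    q = np.where(mask & off_diag, rates, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


# Linear chain 0 - 1 - 2: the support is undirected, the rates favour 0 -> 1 -> 2.
mask = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
rates = np.array([[0.00, 0.20, 0.00],
                  [0.02, 0.00, 0.20],
                  [0.00, 0.02, 0.00]])
Q = build_generator(mask, rates)
```

Here Q[0, 1] = 0.2 and Q[1, 0] = 0.02 share the same edge of the mask, so the chain is undirected in support but strongly biased in direction.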

Emissions (per gene, in state z)

  • Linear splicing dynamics with state-specific transcription rates alpha_z (set via each state's steady-state profile) and global rates beta (splicing) and gamma (decay):
    • dU/dt = alpha_z - beta * U
    • dS/dt = beta * U - gamma * S
  • Snapshots at time t* can be perturbed by negative-binomial noise (enable by setting SimulationParameters.dispersion) so counts remain discrete and overdispersed.
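
For a cell that stays in one state, the two equations above have a closed-form solution with steady state U* = alpha_z / beta and S* = alpha_z / gamma. The sketch below evaluates that solution and then draws negative-binomial counts around it. The function names are hypothetical, and the noise parameterization (variance = mean + dispersion * mean^2) is an assumption made for illustration; the package may define SimulationParameters.dispersion differently.

```python
import numpy as np


def splicing_solution(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Closed-form U(t), S(t) for constant alpha (requires beta != gamma)."""
    u_ss = alpha / beta    # steady state of dU/dt = alpha - beta * U
    s_ss = alpha / gamma   # steady state of dS/dt = beta * U - gamma * S
    a = beta * (u0 - u_ss) / (gamma - beta)
    u = u_ss + (u0 - u_ss) * np.exp(-beta * t)
    s = s_ss + a * np.exp(-beta * t) + (s0 - s_ss - a) * np.exp(-gamma * t)
    return u, s


def nb_counts(mean, dispersion, rng):
    """Negative-binomial counts with the given mean.

    ASSUMPTION: variance = mean + dispersion * mean**2. This convention is
    chosen for the sketch and is not confirmed by the package documentation.
    """
    r = 1.0 / dispersion
    p = r / (r + mean)
    return rng.negative_binomial(r, p)


# Long after a state switch the cell sits at its steady state.
u_late, s_late = splicing_solution(50.0, alpha=4.0, beta=1.0, gamma=0.5)

# Discrete, overdispersed spliced counts for 20000 cells at that steady state.
rng = np.random.default_rng(0)
counts = nb_counts(np.full(20000, s_late), dispersion=0.1, rng=rng)
```

With alpha = 4, beta = 1 and gamma = 0.5, the steady state is U* = 4 and S* = 8, which is where u_late and s_late converge.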

Topology encoding via gene sets

  • Each state i receives markers_per_state markers. Genes are either unique to one state (reuse cap = 1) or shared by exactly two states (reuse cap = 2).
  • When sharing is enabled, a balanced random sampler selects how many genes each pair of states shares, so overlaps stay comparable while never exceeding two states per gene.
  • This yields overlap-induced continuity in gene space without introducing higher-order simplices.
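
The reuse cap can be illustrated with a deterministic stand-in for the sampler. The sketch below gives every pair of states the same number of shared genes instead of sampling a balanced allocation, so it is a simplification rather than the package's algorithm; assign_markers and shared_per_pair are hypothetical names.

```python
from itertools import combinations

import numpy as np


def assign_markers(num_states, markers_per_state, shared_per_pair):
    """Boolean (num_states x num_genes) membership, each gene in at most two states.

    Simplified stand-in: every pair shares exactly `shared_per_pair` genes,
    whereas the simulator is described as sampling a balanced allocation.
    """
    unique_per_state = markers_per_state - (num_states - 1) * shared_per_pair
    if unique_per_state < 0:
        raise ValueError("shared_per_pair is too large for markers_per_state")
    pairs = list(combinations(range(num_states), 2))
    num_genes = num_states * unique_per_state + len(pairs) * shared_per_pair
    membership = np.zeros((num_states, num_genes), dtype=bool)
    gene = 0
    for state in range(num_states):      # genes unique to one state (reuse = 1)
        membership[state, gene:gene + unique_per_state] = True
        gene += unique_per_state
    for i, j in pairs:                   # genes shared by exactly two states (reuse = 2)
        membership[[i, j], gene:gene + shared_per_pair] = True
        gene += shared_per_pair
    return membership


membership = assign_markers(num_states=4, markers_per_state=9, shared_per_pair=2)
# overlap[i, j] counts the marker genes that states i and j have in common.
overlap = membership.astype(int) @ membership.astype(int).T
```

Each column of membership sums to 1 or 2, so no gene is shared by three or more states.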

Transition Graph Configuration

  • Default behaviour uses a fully connected graph with a uniform jump rate set via SimulationParameters.default_transition_rate (falls back to 0.05 if omitted).
  • Provide explicit static transition_rates (shape n-by-n, zero diagonal) for arbitrary directed rates; the simulator samples next states proportional to the row's off-diagonal rates.
  • Example:
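    import numpy as np
    from markovmodus import SimulationParameters
    n = 3  # example number of hidden states
    custom = np.full((n, n), 0.05, dtype=float)  # uniform rate between every pair
    np.fill_diagonal(custom, 0.0)  # diagonal must be zero
    params = SimulationParameters(..., num_states=n, transition_rates=custom)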
    
  • transition_matrix remains available as a deprecated alias for transition_rates.
  • For time- or state-dependent rates, pass a callable that receives a SimulationState:
    import numpy as np
    from markovmodus import SimulationParameters, SimulationState
    
    def rates(state: SimulationState) -> np.ndarray:
        custom = np.zeros((3, 3), dtype=float)
        # Time-dependent: the 0 -> 1 rate increases tenfold at time 10.
        custom[0, 1] = 0.02 if state.time < 10.0 else 0.2
    
        # State-dependent: the 1 -> 2 rate grows with the number of cells in state 1.
        counts = np.bincount(state.cell_states, minlength=3)
        custom[1, 2] = 0.01 + 0.001 * counts[1]
        return custom
    
    params = SimulationParameters(..., num_states=3, transition_rates=rates)
    
  • The latent-state process uses a fixed-dt discretized CTMC approximation rather than exact Gillespie simulation. Dynamic rates are evaluated at the start of each time step and held constant for that step.
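
One way to picture the fixed-dt scheme is the loop below. It is a sketch under stated assumptions, not the package's implementation: a cell in state i jumps during a step with probability 1 - exp(-q_i * dt), where q_i is the row's total off-diagonal rate, and makes at most one jump per step. The helpers step_states and rates_at are hypothetical, and rates_at takes a plain time value rather than a SimulationState.

```python
import numpy as np


def step_states(states, rates, dt, rng):
    """Advance every cell by one fixed step of length dt (at most one jump each)."""
    rates = np.array(rates, dtype=float)     # copy so the caller's array is untouched
    np.fill_diagonal(rates, 0.0)
    exit_rate = rates.sum(axis=1)            # total rate of leaving each state
    # ASSUMPTION: jump probability 1 - exp(-q * dt); the package may differ.
    p_jump = 1.0 - np.exp(-exit_rate * dt)
    new_states = states.copy()
    jumping = rng.random(states.size) < p_jump[states]
    for cell in np.flatnonzero(jumping):
        row = rates[states[cell]]
        # Next state is drawn in proportion to the row's off-diagonal rates.
        new_states[cell] = rng.choice(row.size, p=row / row.sum())
    return new_states


def rates_at(time):
    """Hypothetical time-dependent rates on a 0 -> 1 -> 2 chain."""
    custom = np.zeros((3, 3), dtype=float)
    custom[0, 1] = 0.02 if time < 10.0 else 0.2
    custom[1, 2] = 0.05
    return custom


rng = np.random.default_rng(0)
states = np.zeros(500, dtype=int)            # every cell starts in state 0
dt, t_final = 1.0, 30.0
time = 0.0
while time < t_final:
    # Rates are evaluated at the start of the step and held constant within it.
    states = step_states(states, rates_at(time), dt, rng)
    time += dt
occupancy = np.bincount(states, minlength=3)
```

Because at most one jump can occur per step in this sketch, dt should be small relative to 1 / max(q_i); otherwise rapid successive transitions are undercounted compared with an exact Gillespie simulation.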

Getting Started

  • Install from PyPI:
    pip install markovmodus
    
  • Or, after cloning this repository, install locally:
    pip install .
    
  • Define your simulation settings and run the generator:
    from markovmodus import SimulationParameters, simulate_dataset
    
    params = SimulationParameters(
        num_states=5,
        num_genes=300,
        num_cells=2000,
        t_final=30.0,
        dt=1.0,
        markers_per_state=120,
        default_transition_rate=0.08,
        rng_seed=42,
    )
    
    adata = simulate_dataset(params)  # AnnData with spliced/unspliced layers
    
  • Produce a pandas DataFrame (and optionally persist to CSV):
    df = simulate_dataset(params, output="dataframe", save_path="counts.csv")
    
  • Write an AnnData file for Scanpy workflows:
    simulate_dataset(params, save_path="snapshot.h5ad", file_format="h5ad")
    
  • Request both views when integrating with pipelines:
    adata, df = simulate_dataset(params, output="both")
    

Documentation

Start with the Introduction for a primer on the biological motivation and simulator design, then dive into the usage guide and API reference.

Example notebooks

Interactive walkthroughs live in notebooks/. Open them locally in Jupyter or your favourite notebook environment to explore model configuration and downstream analysis patterns. The time_dependent_transition_rates.ipynb notebook demonstrates dynamic transition rates driven by SimulationState.time and the current latent cell-state distribution.

License

MIT licensed. See CITATION.cff for citation details.
