Dynamic MDF synthetic market data generator

market-wave

Fast, lightweight synthetic market data from a Dynamic Market Distribution Function.

market-wave is a Python library for generating synthetic market paths from market-wide entry and exit intent. It does not create individual participants. Instead, it models aggregate buy/sell pressure, position exits, order-book depth, cancellations, taker flow, and execution-driven price movement from probability mass over relative ticks.

It is not a forecasting model. It is a lightweight simulation primitive for experiments, visualization, teaching, and strategy-environment prototyping.

Why market-wave?

  • Aggregate intent, not agents: market participants are represented by probability mass over relative ticks, not by individual objects.
  • Dynamic MDF: entry and exit pressure live in four stateful MDF(relative_tick) fields that evolve from the previous step.
  • Pluggable score model: swap the MDF score function with DynamicMDFModel or a custom MDFModel.
  • Separated shape and size: MDFs decide where intent sits; intensity decides how much order flow appears.
  • Execution-driven prices: prices stay flat unless trades execute.
  • Batch generation: generate many reproducible synthetic paths without keeping every path in market.history.
  • Inspectable state: every step returns a StepInfo snapshot with MDFs, volumes, order book state, position mass, VWAP, spread, and imbalance.
  • Built-in plotting: matplotlib is included, with a clean light chart style by default.

Install

pip install market-wave

For dataframe export:

pip install "market-wave[dataframe]"

For local development:

git clone https://github.com/smturtle2/market-wave.git
cd market-wave
uv sync --extra dev

Python >=3.10 is supported.

Quickstart

from market_wave import Market

market = Market(
    initial_price=10_000,
    gap=10,
    popularity=1.0,
    seed=42,
    regime="auto",
    augmentation_strength=0.25,
)
steps = market.step(500)

last = steps[-1]
print(last.price_before, "->", last.price_after)
print("executed:", round(last.total_executed_volume, 3))
print("imbalance:", round(last.order_flow_imbalance, 3))
print("crossed flow:", round(last.crossed_market_volume, 3))
print("residual flow:", round(last.residual_market_buy_volume, 3), round(last.residual_market_sell_volume, 3))

Market.step(n) always returns list[StepInfo]. By default it also appends the same objects to market.history.

For high-volume generation, skip in-memory history:

steps = market.step(512, keep_history=False)

for step in market.stream(512, keep_history=False):
    consume(step)

For simple export workflows, use step.to_dict(), step.to_json(), or market.history_records().

Example output with seed=42:

10020.0 -> 10010.0
executed: 1.586
imbalance: 0.059
crossed flow: 1.057
residual flow: 0.343 0.187

Smoke Matrix

The simulator is deterministic for a fixed seed, so it is easy to run the same invariants across different market conditions:

from market_wave import Market

cases = [
    ("baseline", dict(initial_price=10_000, gap=10, popularity=1.0, seed=42, grid_radius=20), 500),
    ("busy", dict(initial_price=10_000, gap=10, popularity=2.5, seed=7, grid_radius=24), 500),
    ("thin", dict(initial_price=500, gap=5, popularity=0.25, seed=123, grid_radius=12), 500),
    ("low_price", dict(initial_price=1, gap=1, popularity=3.0, seed=17, grid_radius=8), 500),
    ("inactive", dict(initial_price=100, gap=1, popularity=0.0, seed=9, grid_radius=10), 100),
]

for name, kwargs, steps_count in cases:
    market = Market(**kwargs)
    steps = market.step(steps_count)
    prices = [step.price_after for step in steps]
    move_steps = sum(step.price_change != 0 for step in steps)
    exec_steps = sum(step.total_executed_volume > 0 for step in steps)
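    # Fixed-width columns so the rows line up like the verification table below.
    print(
        f"{name:<10} range={min(prices):8.1f}-{max(prices):8.1f} "
        f"moves={move_steps:3d} exec_steps={exec_steps:3d} final={market.state.price:8.1f}"
    )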

Recent verification on the current implementation:

baseline   range= 10000.0- 10010.0 moves=228 exec_steps=500 final= 10000.0
busy       range=  9940.0- 10010.0 moves=214 exec_steps=500 final=  9940.0
thin       range=   495.0-   505.0 moves=229 exec_steps=500 final=   505.0
low_price  range=     1.0-     3.0 moves=223 exec_steps=500 final=     2.0
inactive   range=   100.0-   100.0 moves=  0 exec_steps=  0 final=   100.0

Those runs also checked that current-state MDF projections stay aligned with state.price_grid, MDFs remain normalized, prices never fall below one tick, order book and position mass stay non-negative, and price changes only occur on steps with executed volume. Dynamic MDF acceptance also runs seeds 10..19 at mdf_temperature=1.0 and checks that every MDF remains finite, non-negative, normalized, and broad enough not to collapse to a single price.
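The MDF invariants listed above (finite, non-negative, normalized, not collapsed to a single price) can be sketched as a small standalone checker. This is an illustration only: the function name `mdf_problems`, the tolerance, and the `max_single_mass` threshold are assumptions, not the library's actual acceptance test.

```python
import math

def mdf_problems(mdf, *, tol=1e-9, max_single_mass=0.99):
    """Return a list of invariant violations for one MDF (hypothetical helper)."""
    if not all(math.isfinite(p) for p in mdf):
        return ["non-finite"]  # the remaining checks are meaningless with NaN/inf
    problems = []
    if any(p < 0 for p in mdf):
        problems.append("negative")
    if abs(sum(mdf) - 1.0) > tol:
        problems.append("not normalized")
    if max(mdf) > max_single_mass:
        problems.append("collapsed")  # almost all mass sits on one tick
    return problems

print(mdf_problems([0.25, 0.25, 0.25, 0.25]))
print(mdf_problems([0.0, 0.0, 1.0, 0.0]))
```

In a real run, the same function could be applied to each of the four MDF fields on every StepInfo.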

Diagnostic note for 0.2.4: the current MDF update is numerically stable under the smoke metrics above, but it is not behaviorally calibrated. Treat these ranges, move counts, and execution counts as regression diagnostics, not claims that the generated paths match any real market.

Performance note for 0.2.4: live order-book and position totals are cached by price/side while preserving individual lot and cohort age semantics. Long runs avoid repeatedly summing all live lots for best-price lookup, snapshots, and near-touch imbalance, regardless of keep_history.
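To make the caching idea concrete, here is a minimal sketch of one book side that keeps individual lots per price while maintaining a running total per price. The class `BookSide`, its methods, and the `[volume, age]` lot layout are hypothetical and are not taken from the market-wave source; they only illustrate why totals and best-price lookups no longer need to sum every live lot.

```python
from collections import defaultdict, deque

class BookSide:
    """Hypothetical bid side: per-price lot queues plus cached per-price totals."""

    def __init__(self):
        self.lots = defaultdict(deque)    # price -> deque of [volume, age]
        self.totals = defaultdict(float)  # price -> cached sum of live volume

    def add(self, price, volume):
        self.lots[price].append([volume, 0])
        self.totals[price] += volume      # update the cache incrementally

    def consume(self, price, volume):
        """Fill oldest lots first and return the volume actually filled."""
        filled = 0.0
        queue = self.lots[price]
        while queue and filled < volume:
            lot = queue[0]
            take = min(lot[0], volume - filled)
            lot[0] -= take
            filled += take
            if lot[0] <= 0:
                queue.popleft()
        self.totals[price] -= filled
        if not queue:                     # drop empty price levels entirely
            del self.lots[price]
            self.totals.pop(price, None)
        return filled

    def best_bid(self):
        # Scans price levels, not individual lots.
        return max(self.totals) if self.totals else None

side = BookSide()
side.add(99.0, 1.0)
side.add(99.0, 2.0)
side.add(98.0, 5.0)
filled = side.consume(99.0, 1.5)
print(filled, side.totals[99.0], side.best_bid())
```

The lots keep their own volume and age, so age-dependent logic such as cancellations can still inspect them, while snapshots read the cached totals.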

Visualization

from market_wave import Market

market = Market(initial_price=10_000, gap=10, popularity=1.0, seed=42)
market.step(260)

fig, ax = market.plot(last=180)

[Figure: market-wave light pyplot chart showing price, orderbook depth heatmaps, volume, and imbalance]

The default market_wave style uses a light multi-panel chart: price/VWAP, bid and ask orderbook depth heatmaps by simple level, executed volume, and order-flow imbalance. To keep the legacy three-panel view, pass orderbook=False.

Dark overlay mode is still available:

fig, ax = market.plot(layout="overlay", style="market_wave_dark")

Synthetic Data

from market_wave import compute_metrics, generate_paths

paths = generate_paths(
    n_paths=100,
    horizon=512,
    config_sampler=lambda path_id: {
        "initial_price": 10_000,
        "gap": 10,
        "popularity": 1.0,
        "seed": 10_000 + path_id,
        "regime": "auto",
        "augmentation_strength": 0.35,
    },
)

metrics = compute_metrics(paths)
print(metrics.return_std, metrics.volume_mean, metrics.max_drawdown)
print(paths[0].metadata.config_hash)

GeneratedPath.metadata stores seed, config_hash, package version, regime, and augmentation_strength so synthetic runs can be traced. Pandas is optional: install market-wave[dataframe] to use to_dataframe(). ValidationMetrics.volatility_clustering_score is computed within each generated path and aggregated, so independent path boundaries do not affect the diagnostic.
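The point about path boundaries can be illustrated with a standalone sketch. The formula below (lag-1 autocorrelation of absolute log returns, averaged over paths) is an assumption chosen for illustration; the actual definition of volatility_clustering_score in market-wave may differ. What matters is the structure: returns are computed inside each path, so the jump from the end of one path to the start of the next never enters the statistic.

```python
import math

def lag1_autocorr(xs):
    """Lag-1 autocorrelation; returns 0.0 for short or constant series."""
    n = len(xs)
    if n < 3:
        return 0.0
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

def clustering_score(paths):
    """Average the per-path scores (hypothetical stand-in for the library metric)."""
    scores = []
    for prices in paths:
        # Absolute log returns are taken within this path only.
        abs_returns = [abs(math.log(b / a)) for a, b in zip(prices, prices[1:])]
        scores.append(lag1_autocorr(abs_returns))
    return sum(scores) / len(scores)

path_a = [100.0, 101.0, 103.0, 102.0, 100.0, 100.5, 100.4, 100.45]
path_b = [10_000.0, 10_010.0, 10_000.0, 10_050.0, 9_990.0, 9_995.0, 10_000.0, 10_001.0]
score = clustering_score([path_a, path_b])
print(round(score, 4))
```

Concatenating the two price lists before computing returns would instead add a spurious return from 100.45 to 10,000, which is exactly what per-path aggregation avoids.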

Pluggable MDF

from market_wave import Market

class CenterSeekingMDF:
    def scores(self, side, intent, relative_ticks, context, signals=None):
        del side, intent, context, signals
        return [-abs(tick) for tick in relative_ticks]

market = Market(initial_price=100, gap=1, mdf_model=CenterSeekingMDF(), seed=7)

step = market.step(1)[0]
print(step.relative_ticks)
print(step.buy_entry_mdf)

Custom MDF models return scores, not probabilities. Treat each score as log-growth evidence: additive score differences become multiplicative changes to the previous MDF. Market applies those scores through the stabilized MDF update described below.
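A small worked example may help with the "log-growth evidence" reading. The sketch below applies scores to a previous MDF with no stabilization at all (no persistence, temperature, diffusion, or floor), which is a deliberate simplification and not what Market does internally. The helper name `apply_scores` is hypothetical.

```python
import math

def apply_scores(prev_mdf, scores):
    """Multiply each probability by exp(score), then renormalize."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    weights = [p * math.exp(s - shift) for p, s in zip(prev_mdf, scores)]
    total = sum(weights)
    return [w / total for w in weights]

relative_ticks = [-2, -1, 0, 1, 2]
prev_mdf = [0.2] * 5
scores = [-abs(t) for t in relative_ticks]  # same scores as CenterSeekingMDF
next_mdf = apply_scores(prev_mdf, scores)

# Tick 0 scores 1.0 higher than tick +1, so their mass ratio grows by a factor of e.
ratio_before = prev_mdf[2] / prev_mdf[3]
ratio_after = next_mdf[2] / next_mdf[3]
print(ratio_after / ratio_before)
```

Because only score differences matter after normalization, adding a constant to every score leaves the resulting MDF unchanged in this simplified form.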

Core Concepts

At every step, the market builds relative ticks around the current price:

relative_tick = (price - current_price) / gap
relative_ticks = [-grid_radius, ..., 0, ..., +grid_radius]

The simulator maintains four Market Distribution Functions on that relative grid:

  • buy_entry_mdf
  • sell_entry_mdf
  • long_exit_mdf
  • short_exit_mdf

Each MDF is normalized. It is not recreated from scratch each step; it evolves from the previous MDF:

logits = persistence * log(MDF_prev(tick) + eps)
       + score(tick) / effective_temperature
proposal = softmax(clamp(logits - max(logits), -50, 0))
MDF_next = Normalize((1 - floor_mix) * Diffuse(proposal) + floor_mix * Uniform)

score(tick) can include value, trend, liquidity attraction, memory, risk, and order-book pressure. mdf_temperature controls how sharply scores reshape the distribution. The effective temperature also includes current volatility, so high-volatility regimes soften score updates instead of letting one tick absorb all mass. Persistence, diffusion, and uniform floor mixing prevent repeated small score advantages from collapsing the MDF into a single tick.
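The update rule above can be sketched in plain Python. Several pieces are assumptions made for illustration: the default values of persistence, floor_mix, and eps, the `diffusion` parameter, and the nearest-neighbour kernel used for Diffuse. The `temperature` argument stands in for the effective temperature; how the library folds volatility into it is not shown here.

```python
import math

def update_mdf(prev, scores, *, persistence=0.9, temperature=1.0,
               floor_mix=0.02, diffusion=0.1, eps=1e-12):
    """One stabilized MDF step; all default values are illustrative guesses."""
    n = len(prev)
    logits = [persistence * math.log(p + eps) + s / temperature
              for p, s in zip(prev, scores)]
    top = max(logits)
    # Shift by the max and clamp to [-50, 0] before exponentiating.
    exps = [math.exp(min(0.0, max(-50.0, x - top))) for x in logits]
    z = sum(exps)
    proposal = [e / z for e in exps]
    # Assumed Diffuse(): each tick sends diffusion/2 of its mass to each
    # neighbour, reflecting at the grid edges so total mass is conserved.
    diffused = []
    for i, p in enumerate(proposal):
        left = proposal[i - 1] if i > 0 else p
        right = proposal[i + 1] if i < n - 1 else p
        diffused.append((1 - diffusion) * p + 0.5 * diffusion * (left + right))
    # Mix with a uniform floor, then renormalize.
    mixed = [(1 - floor_mix) * d + floor_mix / n for d in diffused]
    total = sum(mixed)
    return [m / total for m in mixed]

relative_ticks = list(range(-5, 6))
scores = [-abs(t) for t in relative_ticks]

def run(temperature, steps=50):
    mdf = [1 / len(relative_ticks)] * len(relative_ticks)
    for _ in range(steps):
        mdf = update_mdf(mdf, scores, temperature=temperature)
    return mdf

sharp = run(temperature=0.5)
smooth = run(temperature=5.0)
print(round(max(sharp), 3), round(max(smooth), 3))
```

Even after many repeated steps with the same scores, the floor and diffusion terms keep every tick's mass strictly positive, and the lower temperature produces the more concentrated distribution.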

Those relative MDFs are projected onto the pre-trade grid price_grid = price_before +/- k * gap for order-book formation. StepInfo.mdf_price_basis records that pre-trade price basis.
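As a rough sketch of that projection, the snippet below maps a relative-tick MDF onto absolute prices. The function name `project_to_prices` is hypothetical, and the sketch ignores any handling of prices near the one-tick floor, which the library may treat differently.

```python
def project_to_prices(mdf, price_before, gap, grid_radius):
    """Key each MDF entry by price_before + k * gap for k in [-radius, +radius]."""
    relative_ticks = list(range(-grid_radius, grid_radius + 1))
    if len(mdf) != len(relative_ticks):
        raise ValueError("MDF length must equal 2 * grid_radius + 1")
    price_grid = [price_before + k * gap for k in relative_ticks]
    return dict(zip(price_grid, mdf))

mdf = [0.1, 0.2, 0.4, 0.2, 0.1]
by_price = project_to_prices(mdf, price_before=10_000, gap=10, grid_radius=2)
print(by_price)
```

The keys run from 9980 to 10020 in steps of 10, with the largest mass on the pre-trade price of 10000.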

low temperature  -> sharper, concentrated MDF
high temperature -> wider, smoother MDF

MDFs generate aggregate intent. Intensity controls total size. The order book and execution layer then turn that intent into limit flow, taker flow, cancellations, exits, matched volume, and price changes.
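The split between shape and size amounts to simple arithmetic, sketched below under the assumption that intent volume per tick is just intensity times MDF mass. The helper `intent_volume` is hypothetical; the real simulator applies further order-book and execution logic on top.

```python
def intent_volume(mdf, intensity):
    """Scale a normalized MDF (shape) by a total intensity (size)."""
    return [intensity * p for p in mdf]

mdf = [0.1, 0.2, 0.4, 0.2, 0.1]
quiet = intent_volume(mdf, 2.0)
busy = intent_volume(mdf, 10.0)

# Same shape, different totals.
print(sum(quiet), sum(busy))
```

Changing intensity rescales every tick by the same factor, so where intent sits is unaffected.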

Execution Guarantee

Price movement is execution-driven:

  • If a step has no executed volume, price_after == price_before.
  • If trades execute, price_after is derived from that step's execution statistics. Random quote jitter is bounded and cannot move the price by itself when executions print at the previous price.
  • seed makes the simulation reproducible for the same version and inputs.
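The first guarantee is easy to check on recorded steps. The sketch below uses SimpleNamespace objects as stand-ins for StepInfo, carrying only the three fields named in this README; the helper `violations` is hypothetical and the third record is deliberately invalid to show what a failure looks like.

```python
from types import SimpleNamespace

def violations(steps):
    """Return indices of steps whose price moved without any executed volume."""
    return [
        i for i, s in enumerate(steps)
        if s.total_executed_volume == 0 and s.price_after != s.price_before
    ]

steps = [
    SimpleNamespace(price_before=100.0, price_after=100.0, total_executed_volume=0.0),
    SimpleNamespace(price_before=100.0, price_after=101.0, total_executed_volume=1.5),
    # Deliberately broken record: the price moved with no executions.
    SimpleNamespace(price_before=101.0, price_after=102.0, total_executed_volume=0.0),
]
print(violations(steps))
```

Applied to the output of a real simulation run, the function should return an empty list if the guarantee holds.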

This is a simulator, not a market data replay engine and not financial advice.

API Overview

from market_wave import (
    Market,
    DynamicMDFModel,
    generate_paths,
    compute_metrics,
    MarketState,
    IntensityState,
    LatentState,
    MDFContext,
    MDFSignals,
    MDFModel,
    RelativeMDFComponent,
    MDFState,
    OrderBookState,
    PositionMassState,
    StepInfo,
)

Useful StepInfo fields include:

  • price_before, price_after, price_change
  • tick_before, tick_after, tick_change, relative_ticks
  • mdf_price_basis, price_grid
  • buy_entry_mdf, sell_entry_mdf, long_exit_mdf, short_exit_mdf
  • buy_entry_mdf_by_price, sell_entry_mdf_by_price
  • buy_volume_by_price, sell_volume_by_price
  • executed_volume_by_price, total_executed_volume, trade_count
  • market_buy_volume, market_sell_volume, crossed_market_volume
  • residual_market_buy_volume, residual_market_sell_volume
  • vwap_price, best_bid_before, best_ask_before, spread_after
  • orderbook_before, orderbook_after
  • position_mass_before, position_mass_after

The *_mdf_by_price fields are pre-trade MDF projections keyed by mdf_price_basis; current Market.state.mdf.*_by_price is reprojected to the post-trade state price. Examples and public APIs use MDF names only; stale PMF examples from earlier prototypes should be considered obsolete.

Public Contract and Snapshot Policy

The public import surface is the package __all__: Market, generate_paths, compute_metrics, generated-path metadata, MDF model/protocol types, metrics, and the state dataclasses shown above. The entrypoints are intentionally small, but the observation contract is broad because StepInfo and MarketState expose detailed simulator diagnostics.

During the current alpha line, existing public names and existing StepInfo / state fields are kept compatible where practical. New diagnostic fields may be added in alpha releases. MDF names are the supported public distribution names; stale PMF names from earlier prototypes are obsolete.

Snapshot mutability: state dataclasses are frozen=True at the attribute level, but nested dict and list fields are plain mutable containers so to_dict() and JSON export remain simple. Treat Market.state, StepInfo, and GeneratedPath.hidden_states as read-only observations. Use Market.snapshot() when downstream code needs a mutation-safe deep copy of the current state.
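The mutability caveat is standard Python dataclass behavior and can be demonstrated without the library. `BookSnapshot` below is a hypothetical stand-in for a state dataclass, and copy.deepcopy plays the role that Market.snapshot() is described as playing above.

```python
import copy
from dataclasses import FrozenInstanceError, dataclass, field

@dataclass(frozen=True)
class BookSnapshot:
    """Hypothetical frozen state object with a plain mutable dict inside."""
    price: float
    bids: dict = field(default_factory=dict)

snap = BookSnapshot(price=100.0, bids={99.0: 2.0})

# Rebinding an attribute is blocked by frozen=True.
try:
    snap.price = 101.0
except FrozenInstanceError:
    rebinding_blocked = True
else:
    rebinding_blocked = False

# A deep copy is isolated from later in-place changes to nested containers.
safe = copy.deepcopy(snap)
snap.bids[99.0] = 0.0  # allowed: frozen does not protect nested containers

print(rebinding_blocked, snap.bids, safe.bids)
```

This is why observations should be treated as read-only: an accidental write to a nested dict succeeds silently and changes what every other holder of that object sees.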

Compatibility note: Market.state remains available as the live current-state attribute for the alpha line. Future releases may add a more explicit read-model API or deprecation path for code that mutates state containers in place.

Development

uv sync --extra dev --extra dataframe
uv run ruff check .
uv run pytest
uv build

License

MIT
