Dynamic MDF synthetic market data generator

market-wave

Fast, lightweight synthetic market data from a Dynamic Market Distribution Function.

market-wave is a Python library for generating synthetic market paths from market-wide entry and exit intent. It does not create individual participants. Instead, it models aggregate buy/sell entry intent, position exits, order-book depth, cancellations, taker flow, and execution-driven price movement from probability mass over relative ticks.

It is not a forecasting model. It is a lightweight simulation primitive for experiments, visualization, teaching, and strategy-environment prototyping.

Why market-wave?

  • Aggregate intent, not agents: market participants are represented by probability mass over relative ticks, not by individual objects.
  • Dynamic MDF: entry and exit intent live in four stateful MDF(relative_tick) fields that evolve from the previous step.
  • Pluggable score model: swap the MDF score function with DynamicMDFModel or a custom MDFModel.
  • Separated shape and size: MDFs decide where intent sits; intensity decides how much order flow appears.
  • Execution-driven prices: prices stay flat unless trades execute.
  • Batch generation: generate many reproducible synthetic paths without keeping every path in market.history.
  • Inspectable state: every step returns a StepInfo snapshot with MDFs, volumes, order book state, position mass, VWAP, spread, and imbalance.
  • Built-in plotting: matplotlib is included, with a clean light chart style by default.

Install

pip install market-wave

For dataframe export:

pip install "market-wave[dataframe]"

For local development:

git clone https://github.com/smturtle2/market-wave.git
cd market-wave
uv sync --extra dev

Python 3.10 or newer is supported.

Quickstart

from market_wave import Market

market = Market(
    initial_price=10_000,
    gap=10,
    popularity=1.0,
    seed=42,
)
steps = market.step(500)

last = steps[-1]
print(last.price_before, "->", last.price_after)
print("entry:", round(sum(last.entry_volume_by_price.values()), 3))
print("executed:", round(last.total_executed_volume, 3))
resting_bid = sum(last.orderbook_after.bid_volume_by_price.values())
resting_ask = sum(last.orderbook_after.ask_volume_by_price.values())
print("resting bid/ask:", round(resting_bid, 3), round(resting_ask, 3))
print("imbalance:", round(last.order_flow_imbalance, 3))

Market.step(n) always returns list[StepInfo]. By default it also appends the same objects to market.history.

For high-volume generation, skip in-memory history:

steps = market.step(512, keep_history=False)

for step in market.stream(512, keep_history=False):
    consume(step)  # placeholder for your own per-step handler

For simple export workflows, use step.to_dict(), step.to_json(), or market.history_records().

Example output with seed=42:

10020.0 -> 10010.0
entry: 2.955
executed: 0.976
resting bid/ask: 24.662 25.83
imbalance: 0.484

Smoke Matrix

The simulator is deterministic for a fixed seed, so it is easy to run the same invariants across different market conditions:

from market_wave import Market

cases = [
    ("baseline", dict(initial_price=10_000, gap=10, popularity=1.0, seed=42), 500),
    ("busy", dict(initial_price=10_000, gap=10, popularity=2.5, seed=7), 500),
    ("thin", dict(initial_price=500, gap=5, popularity=0.25, seed=123), 500),
    ("low_price", dict(initial_price=1, gap=1, popularity=3.0, seed=17), 500),
    ("trend_up", dict(initial_price=10_000, gap=10, popularity=1.0, seed=42, regime="trend_up"), 500),
    ("high_vol", dict(initial_price=10_000, gap=10, popularity=1.0, seed=7, regime="high_vol"), 500),
    ("inactive", dict(initial_price=100, gap=1, popularity=0.0, seed=9), 100),
]

for name, kwargs, steps_count in cases:
    market = Market(**kwargs)
    steps = market.step(steps_count)
    prices = [step.price_after for step in steps]
    unique = len(set(prices))  # distinct post-trade prices visited
    move_steps = sum(step.price_change != 0 for step in steps)
    exec_steps = sum(step.total_executed_volume > 0 for step in steps)
    print(
        f"{name:<9} range={min(prices):8.1f}-{max(prices):8.1f} "
        f"unique={unique:3d} moves={move_steps:3d} "
        f"exec_steps={exec_steps:3d} final={market.state.price:8.1f}"
    )

Recent verification on the current implementation:

baseline  range=  9900.0- 10080.0 unique= 19 moves=369 exec_steps=500 final= 10010.0
busy      range=  9890.0- 10030.0 unique= 15 moves=388 exec_steps=500 final=  9910.0
thin      range=   455.0-   535.0 unique= 17 moves=318 exec_steps=500 final=   500.0
low_price range=     2.0-    24.0 unique= 23 moves=380 exec_steps=500 final=    19.0
trend_up  range= 10000.0- 10320.0 unique= 33 moves=388 exec_steps=500 final= 10320.0
high_vol  range=  9960.0- 10040.0 unique=  9 moves=405 exec_steps=500 final=  9970.0
inactive  range=   100.0-   100.0 unique=  1 moves=  0 exec_steps=  0 final=   100.0

Those runs also checked that current-state MDF projections stay aligned with state.price_grid, MDFs remain normalized, prices never fall below one tick, order book and position mass stay non-negative, and price changes only occur on steps with executed volume. Dynamic MDF acceptance also runs seeds 10..19 at mdf_temperature=1.0 and checks that every MDF remains finite, non-negative, normalized, and broad enough not to collapse to a single price.
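The per-MDF checks listed above are simple to reproduce on any list of masses. The sketch below is an illustration only: check_mdf, its tolerance, and the rule used for "collapsed" are assumptions, not the project's test code.

```python
import math


def check_mdf(mdf, tol=1e-9):
    """Return a list of invariant violations for one MDF (empty means OK)."""
    problems = []
    if not all(math.isfinite(p) for p in mdf):
        problems.append("non-finite mass")
    if any(p < 0 for p in mdf):
        problems.append("negative mass")
    if abs(sum(mdf) - 1.0) > tol:
        problems.append("not normalized")
    # Assumed collapse rule: one tick holds (numerically) all of the mass.
    if max(mdf) > 1.0 - tol:
        problems.append("collapsed to a single tick")
    return problems


print(check_mdf([0.25, 0.5, 0.25]))  # []
print(check_mdf([0.0, 1.0, 0.0]))    # ['collapsed to a single tick']
```

The same function can be applied to each of the four MDF lists on a StepInfo after every step.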

Diagnostic note for 0.4.1: the simulator still has no anchor price or stored target that pulls paths back to the initial price. Seeded mood, trend, volatility, microstructure activity, cancellation pressure, and event pressure evolve each step and reshape the MDFs and visible book. Prices remain execution-driven, with a small flow-implied price-discovery component when executed flow reveals one-sided pressure. Treat these ranges, move counts, and execution counts as regression diagnostics, not claims that generated paths match any specific real market.

Entry MDF prices are treated as incoming order prices. Buy entry orders arrive as bids, sell entry orders arrive as asks, and they execute only when they overlap existing opposite-side quotes. Executions print at the resting quote price. Unfilled volume remains in the book at the sampled MDF price. Exit flow is cohort-conditioned, so exit orders carry the originating cohort id and still route through visible order-book liquidity.
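The matching rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not market-wave's engine: match_incoming_bid and the dict-based ask book are hypothetical, and only the buy side is shown.

```python
def match_incoming_bid(asks, limit_price, volume):
    """Match a buy order against resting asks (hypothetical helper).

    asks maps ask price -> resting volume and is modified in place.
    Returns (fills, resting_volume); each fill is (price, volume) and
    prints at the resting quote price, not at the incoming limit price.
    """
    fills = []
    remaining = volume
    # Walk asks from the best (lowest) price while they overlap the bid.
    for price in sorted(asks):
        if price > limit_price or remaining <= 0:
            break
        traded = min(remaining, asks[price])
        fills.append((price, traded))
        remaining -= traded
        asks[price] -= traded
        if asks[price] == 0:
            del asks[price]
    # Whatever is left would rest in the book as a bid at limit_price.
    return fills, remaining


asks = {101: 2.0, 102: 3.0, 104: 5.0}
fills, resting = match_incoming_bid(asks, limit_price=102, volume=4.0)
print(fills)    # [(101, 2.0), (102, 2.0)]
print(resting)  # 0.0
print(asks)     # {102: 1.0, 104: 5.0}
```

A bid priced below every resting ask produces no fills and rests in full, which is the case where the price stays flat.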

MDF note for 0.4.1: the default entry MDF now uses a side-relative reservation-price mixture. Buy entry intent is spread across deep value, passive bid, arrival, and small chase zones; sell entry intent mirrors that on the ask side. This keeps passive limit interest in the MDF itself instead of adding synthetic fixed walls, while reducing excessive buy-ask and sell-bid tail mass.

Microstructure note for 0.4.0: order-book replenishment now includes regime-specific depth shape, resiliency, wall memory by absolute tick, event-driven volume bursts, dry-up after cancellation pressure, trend exhaustion, and squeeze pressure derived from short crowding plus recent one-sided flow. Live order-book and position totals remain cached by price/side, lots are coalesced by price/kind, and position inventory is kept in bounded entry-price cohort buckets.

Visualization

from market_wave import Market

market = Market(
    initial_price=10_000,
    gap=10,
    popularity=1.0,
    seed=42,
)
market.step(500)

fig, ax = market.plot(last=220, orderbook_depth=12)

[Figure: market-wave light pyplot chart showing price, order-book depth heatmaps, volume, and imbalance]

The default market_wave style uses a light multi-panel chart: price/VWAP, bid and ask orderbook depth heatmaps by simple level, executed volume, and order-flow imbalance. To keep the legacy three-panel view, pass orderbook=False.

Dark overlay mode is still available:

fig, ax = market.plot(layout="overlay", style="market_wave_dark")

Synthetic Data

from market_wave import compute_metrics, generate_paths

paths = generate_paths(
    n_paths=100,
    horizon=512,
    config_sampler=lambda path_id: {
        "initial_price": 10_000,
        "gap": 10,
        "popularity": 1.0,
        "seed": 10_000 + path_id,
    },
)

metrics = compute_metrics(paths)
print(metrics.return_std, metrics.volume_mean, metrics.max_drawdown)
print(paths[0].metadata.config_hash)

GeneratedPath.metadata stores seed, config_hash, package version, regime, and augmentation_strength so synthetic runs can be traced. Pandas is optional: install market-wave[dataframe] to use to_dataframe(). ValidationMetrics.volatility_clustering_score is computed within each generated path and aggregated, so independent path boundaries do not affect the diagnostic.
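To see why the per-path aggregation matters, here is a small sketch. The statistic is an assumption: it uses the lag-1 autocorrelation of absolute returns as a stand-in clustering measure, which may differ from what ValidationMetrics actually computes. The helper names are hypothetical.

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a sequence (0.0 when it has no variance)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
    return cov / var


def clustering_score(paths):
    """Average the per-path statistic so no return spans two paths."""
    scores = []
    for prices in paths:
        abs_returns = [abs(b / a - 1.0) for a, b in zip(prices, prices[1:])]
        if len(abs_returns) >= 2:
            scores.append(lag1_autocorr(abs_returns))
    return sum(scores) / len(scores) if scores else 0.0


paths = [
    [100, 101, 102, 103, 110, 118, 127],
    [50, 50.5, 51, 51.5, 52, 52.5, 53],
]
score = clustering_score(paths)
print(round(score, 3))
```

Concatenating the two price lists before computing returns would add a spurious jump from 127 to 50. Computing within each path and then averaging avoids that.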

Pluggable MDF

from market_wave import Market

class CenterSeekingMDF:
    def scores(self, side, intent, relative_ticks, context, signals=None):
        del side, intent, context, signals
        return [-abs(tick) for tick in relative_ticks]

market = Market(initial_price=100, gap=1, mdf_model=CenterSeekingMDF(), seed=7)

step = market.step(1)[0]
print(step.relative_ticks)
print(step.buy_entry_mdf)

Custom MDF models return scores, not probabilities. Treat each score as log-growth evidence: additive score differences become multiplicative changes to the previous MDF. Market applies those scores through the stabilized MDF update described below.
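The "additive differences become multiplicative changes" statement can be checked with a few lines of plain Python. This sketch deliberately leaves out persistence, temperature, diffusion, and floor mixing; apply_scores is a hypothetical helper, not part of the package.

```python
import math


def apply_scores(prev_mdf, scores):
    """Reweight prev_mdf by exp(score) and renormalize (simplified sketch)."""
    weights = [p * math.exp(s) for p, s in zip(prev_mdf, scores)]
    total = sum(weights)
    return [w / total for w in weights]


relative_ticks = [-2, -1, 0, 1, 2]
prev_mdf = [0.2] * 5
scores = [-abs(tick) for tick in relative_ticks]  # same shape as CenterSeekingMDF

next_mdf = apply_scores(prev_mdf, scores)
# A score gap of 1 between tick 0 and tick +1 multiplies their mass ratio by e.
ratio = (next_mdf[2] / next_mdf[3]) / (prev_mdf[2] / prev_mdf[3])
print(round(ratio, 6), round(math.e, 6))
```

Because only score differences matter after normalization, adding a constant to every score leaves the result unchanged.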

Core Concepts

At every step, the market builds relative ticks around the current price, where tick_size is the gap passed to Market:

relative_tick = (price - current_price) / tick_size
relative_ticks = [-grid_radius, ..., 0, ..., +grid_radius]

The simulator maintains four Market Distribution Functions on that relative grid:

  • buy_entry_mdf
  • sell_entry_mdf
  • long_exit_mdf
  • short_exit_mdf

Each MDF is normalized. It is not recreated from scratch each step; it evolves from the previous MDF:

logits = persistence * log(MDF_prev(tick) + eps)
       + score(tick) / effective_temperature
proposal = softmax(clamp(logits - max(logits), -50, 0))
MDF_next = Normalize((1 - floor_mix) * Diffuse(proposal) + floor_mix * Uniform)

score(tick) can include placement shape, trend, liquidity attraction, memory, risk, and order-book imbalance. mdf_temperature controls how sharply scores reshape the distribution. The effective temperature also includes current volatility, so high-volatility regimes soften score updates instead of letting one tick absorb all mass. Persistence, diffusion, and uniform floor mixing prevent repeated small score advantages from collapsing the MDF into a single tick.
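A pure-Python sketch of one update step may help make the formula concrete. Everything numeric here is an assumption: the default values, the three-point diffusion kernel, and the single temperature argument (which stands in for the volatility-adjusted effective temperature) are illustrative choices, not the library's.

```python
import math


def update_mdf(prev, scores, persistence=0.9, temperature=1.0,
               floor_mix=0.02, diffusion=0.1, eps=1e-12):
    """One stabilized MDF update (illustrative defaults, not the library's)."""
    logits = [
        persistence * math.log(p + eps) + s / temperature
        for p, s in zip(prev, scores)
    ]
    top = max(logits)
    # Shift by the max and clamp to [-50, 0] before exponentiating.
    clamped = [min(0.0, max(-50.0, x - top)) for x in logits]
    weights = [math.exp(x) for x in clamped]
    total = sum(weights)
    proposal = [w / total for w in weights]

    # Assumed diffusion: each tick leaks a share of its mass to both
    # neighbours; edge ticks keep the share that would leave the grid.
    n = len(proposal)
    diffused = [0.0] * n
    for i, p in enumerate(proposal):
        diffused[i] += (1 - diffusion) * p
        for j in (i - 1, i + 1):
            target = j if 0 <= j < n else i
            diffused[target] += 0.5 * diffusion * p

    uniform = 1.0 / n
    mixed = [(1 - floor_mix) * d + floor_mix * uniform for d in diffused]
    norm = sum(mixed)
    return [m / norm for m in mixed]


relative_ticks = list(range(-3, 4))
mdf = [1 / 7] * 7
scores = [-abs(t) for t in relative_ticks]
for _ in range(200):
    mdf = update_mdf(mdf, scores)
print(round(sum(mdf), 9), min(mdf) > 0, max(mdf) < 1)
```

Even after 200 steps with the same center-seeking scores, the floor mix keeps every tick strictly positive, so the distribution concentrates around tick 0 without collapsing onto it.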

Those relative MDFs are projected onto the pre-trade grid price_grid = price_before +/- k * gap for order-book formation. StepInfo.mdf_price_basis records that pre-trade price basis.
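The projection is a direct relabeling of relative ticks as absolute prices. In this sketch, project_mdf is a hypothetical helper and the MDF values are made up; it also ignores the rule that prices never fall below one tick.

```python
def project_mdf(relative_ticks, mdf, price_before, gap):
    """Map a relative-tick MDF onto absolute prices (hypothetical helper)."""
    return {
        price_before + tick * gap: mass
        for tick, mass in zip(relative_ticks, mdf)
    }


grid_radius = 2
relative_ticks = list(range(-grid_radius, grid_radius + 1))
mdf = [0.1, 0.2, 0.4, 0.2, 0.1]

mdf_by_price = project_mdf(relative_ticks, mdf, price_before=10_000, gap=10)
print(mdf_by_price)
# {9980: 0.1, 9990: 0.2, 10000: 0.4, 10010: 0.2, 10020: 0.1}
```

Going the other way, (9990 - 10_000) / 10 recovers relative tick -1, matching the relative_tick definition above.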

low temperature  -> sharper, concentrated MDF
high temperature -> wider, smoother MDF
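The temperature effect can be seen with a plain softmax over the same scores. This sketch isolates the temperature term only; the values 0.5 and 2.0 are arbitrary examples.

```python
import math


def softmax(scores, temperature):
    """Numerically stable softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


scores = [-abs(t) for t in range(-3, 4)]
sharp = softmax(scores, temperature=0.5)   # low temperature
smooth = softmax(scores, temperature=2.0)  # high temperature

# The peak mass is larger at low temperature than at high temperature.
print(round(max(sharp), 3), round(max(smooth), 3))
```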

MDFs generate aggregate intent. Intensity controls total size. The order book and execution layer then turn that intent into limit flow, taker flow, cancellations, exits, matched volume, and price changes.
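The split between shape and size can be illustrated by scaling one normalized MDF with two different intensities. This is a deterministic sketch with a hypothetical helper; the simulator itself samples order prices from the MDF, so real per-price volumes will vary around this allocation.

```python
def intent_volume_by_price(mdf_by_price, intensity):
    """Scale a normalized MDF by a total size (hypothetical helper)."""
    return {price: intensity * mass for price, mass in mdf_by_price.items()}


buy_entry_mdf_by_price = {9990: 0.25, 10000: 0.5, 10010: 0.25}
quiet = intent_volume_by_price(buy_entry_mdf_by_price, intensity=2.0)
busy = intent_volume_by_price(buy_entry_mdf_by_price, intensity=8.0)
print(quiet)  # {9990: 0.5, 10000: 1.0, 10010: 0.5}
print(busy)   # {9990: 2.0, 10000: 4.0, 10010: 2.0}
```

Both results have the same relative shape across prices; only the total changes.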

Execution Guarantee

Price movement is execution-driven:

  • If a step has no executed volume, price_after == price_before.
  • If trades execute, price_after is derived from that step's execution statistics. Random quote jitter is bounded and cannot move the price by itself when executions print at the previous price.
  • seed makes the simulation reproducible for the same version and inputs.
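The first guarantee is easy to assert over any sequence of step records. The sketch below uses a namedtuple stand-in that carries only the three StepInfo fields it reads; the checker name is hypothetical.

```python
from collections import namedtuple

# Stand-in for StepInfo with only the fields this check reads.
Step = namedtuple("Step", "price_before price_after total_executed_volume")


def violates_execution_guarantee(steps):
    """Return indices of steps whose price moved without executed volume."""
    return [
        i for i, step in enumerate(steps)
        if step.total_executed_volume == 0
        and step.price_after != step.price_before
    ]


steps = [
    Step(100.0, 100.0, 0.0),  # no trades, flat price: fine
    Step(100.0, 101.0, 1.5),  # trades executed, price moved: fine
    Step(101.0, 102.0, 0.0),  # no trades but price moved: violation
]
print(violates_execution_guarantee(steps))  # [2]
```

Because the function only uses attribute access, the same check should work on real StepInfo objects, for example the list returned by Market.step.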

This is a simulator, not a market data replay engine and not financial advice.

API Overview

from market_wave import (
    Market,
    DynamicMDFModel,
    generate_paths,
    compute_metrics,
    MarketState,
    IntensityState,
    LatentState,
    MDFContext,
    MDFSignals,
    MDFModel,
    RelativeMDFComponent,
    MDFState,
    OrderBookState,
    PositionMassState,
    StepInfo,
)

Useful StepInfo fields include:

  • price_before, price_after, price_change
  • tick_before, tick_after, tick_change, relative_ticks
  • mdf_price_basis, price_grid
  • buy_entry_mdf, sell_entry_mdf, long_exit_mdf, short_exit_mdf
  • buy_entry_mdf_by_price, sell_entry_mdf_by_price
  • entry_volume_by_price, exit_volume_by_price
  • buy_volume_by_price, sell_volume_by_price
  • executed_volume_by_price, total_executed_volume, trade_count
  • market_buy_volume, market_sell_volume
  • vwap_price, best_bid_before, best_ask_before, spread_after
  • orderbook_before, orderbook_after
  • position_mass_before, position_mass_after

buy_volume_by_price and sell_volume_by_price are submitted side-intent maps keyed by sampled order price, not executed or resting liquidity. market_* volume fields report the executed incoming buy/sell volume. Unfilled incoming volume rests in orderbook_after; legacy residual_market_* and crossed_market_volume fields remain compatibility zeroes in the current order-book-first engine.

The *_mdf_by_price fields are pre-trade MDF projections keyed by mdf_price_basis; current Market.state.mdf.*_by_price is reprojected to the post-trade state price. Examples and public APIs use MDF names only; stale PMF examples from earlier prototypes should be considered obsolete.

Public Contract and Snapshot Policy

The public import surface is the package __all__: Market, generate_paths, compute_metrics, generated-path metadata, MDF model/protocol types, metrics, and the state dataclasses shown above. The entrypoints are intentionally small, but the observation contract is broad because StepInfo and MarketState expose detailed simulator diagnostics.

During the current alpha line, existing public names and existing StepInfo / state fields are kept compatible where practical. New diagnostic fields may be added in alpha releases. MDF names are the supported public distribution names; stale PMF names from earlier prototypes are obsolete.

Snapshot mutability: state dataclasses are frozen=True at the attribute level, but nested dict and list fields are plain mutable containers so to_dict() and JSON export remain simple. Treat Market.state, StepInfo, and GeneratedPath.hidden_states as read-only observations. Use Market.snapshot() when downstream code needs a mutation-safe deep copy of the current state.
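The difference between attribute-level freezing and container mutability is standard dataclass behaviour and can be shown without the library. BookSnapshot below is a hypothetical stand-in, and copy.deepcopy is used only as an analogue of a mutation-safe copy; it is not a claim about how Market.snapshot() is implemented.

```python
import copy
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class BookSnapshot:  # hypothetical stand-in, not a market-wave class
    best_bid: float
    bid_volume_by_price: dict = field(default_factory=dict)


live = BookSnapshot(best_bid=99.0, bid_volume_by_price={99.0: 5.0})

try:
    live.best_bid = 98.0  # attribute assignment is blocked
except FrozenInstanceError:
    print("attribute is frozen")

safe = copy.deepcopy(live)  # independent copy of the nested dict
safe.bid_volume_by_price[99.0] = 0.0
print(live.bid_volume_by_price)  # {99.0: 5.0}

live.bid_volume_by_price[99.0] = 0.0  # the nested dict itself is mutable
print(live.bid_volume_by_price)  # {99.0: 0.0}
```

The last two lines show the hazard: writing into a nested container of a live observation changes it for every other reader, which is why the copies are the safe place to mutate.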

Compatibility note: Market.state remains available as the live current-state attribute for the alpha line. Future releases may add a more explicit read-model API or deprecation path for code that mutates state containers in place.

Development

uv sync --extra dev --extra dataframe
uv run ruff check .
uv run pytest
uv build

License

MIT
