Multi-agent RL framework for adversarial market microstructure with endogenous information leakage


Adversarial Market Microstructure — MARL Framework

Modeling financial markets as adversarial information-processing systems

Overview

This repository implements a multi-agent reinforcement learning (MARL) framework that treats financial markets not as price-discovery mechanisms, but as adversarial information ecosystems. Three co-evolving agents contest informational advantage through a realistic limit order book (LOB): an institutional execution trader attempting to conceal its intent, a market maker performing real-time Bayesian inference on order flow, and a latency arbitrageur exploiting belief-update lags.

The central research question driving this framework is:

To what extent can an informed trader optimally execute large orders while minimising the statistical detectability of their informational advantage, when facing adaptive market makers performing real-time inference on order flow?

The framework integrates three traditionally separate domains into a single system:

Domain	Role in the Framework
Market Microstructure Theory	Kyle-type informed trading, adverse selection, implementation shortfall
Adversarial Reinforcement Learning	MAPPO with centralised training / decentralised execution (CTDE)
Information Theory	MINE-based mutual information estimation, KL divergence, detectability phase transitions

Key Contributions

1. Information leakage as an explicit optimisation variable. The execution agent's reward directly penalises the mutual information between its latent type z and its observable order flow F_t, estimated in real time by a MINE neural network. This forces the agent to trade off execution efficiency against statistical detectability — a tension that does not appear in classical execution models.

2. Detectability phase transition. The framework conjectures a critical obfuscation threshold λ* beyond which informed order flow becomes statistically indistinguishable from uninformed noise, i.e. D_KL(P_informed ‖ P_noise) → 0, and provides tooling to locate this threshold empirically by sweeping the leakage penalty weight λ.

3. Endogenous price formation. No exogenous price path is imposed. All price discovery emerges from agent interaction through a realistic price-time priority matching engine with Poisson noise trader arrivals and an Ornstein-Uhlenbeck fundamental value process.

4. Alternating optimisation for adversarial stability. Simultaneous gradient updates in adversarial multi-agent systems can cause non-stationarity and oscillation. Phase A therefore trains the execution agent with MM + Arb frozen, and Phase B trains the market maker and arbitrageur with Exec frozen. This alternation is intended to produce stable co-evolution of strategies.
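The exogenous inputs named in contribution 3 (an Ornstein-Uhlenbeck fundamental value and Poisson noise-trader arrivals) can be sketched with the standard library alone. This is a minimal illustration, not the package's environment code: the function names, the Euler-Maruyama discretisation and every parameter value (`kappa`, `sigma`, `arrival_rate`) are assumptions chosen for readability.

```python
import math
import random


def ou_step(v, mu, kappa, sigma, dt, rng):
    """One Euler-Maruyama step of dV = kappa * (mu - V) dt + sigma dW."""
    return v + kappa * (mu - v) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)


def poisson_count(rate, dt, rng):
    """Number of noise-trader arrivals in an interval of length dt (Knuth's method)."""
    threshold = math.exp(-rate * dt)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(horizon=390, mu=100.0, kappa=0.05, sigma=0.1, arrival_rate=2.0, seed=0):
    """Fundamental value path and per-step noise-trader arrival counts.

    All parameter values are illustrative, not the package defaults.
    """
    rng = random.Random(seed)
    v, values, arrivals = mu, [], []
    for _ in range(horizon):
        v = ou_step(v, mu, kappa, sigma, dt=1.0, rng=rng)
        values.append(v)
        arrivals.append(poisson_count(arrival_rate, dt=1.0, rng=rng))
    return values, arrivals
```

Knuth's sampler is simple and exact but its cost grows with `rate * dt`, so it only suits low per-step arrival rates; the OU term pulls the fundamental back towards `mu` at speed `kappa`, which keeps the simulated price anchored over a 390-step day.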

Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                          Limit Order Book (LOB)                            │
│    Price-time priority matching · Endogenous price formation               │
│    Poisson noise trader arrivals · OU fundamental value process            │
└──────────────────┬─────────────────────┬──────────────────────────┬───────┘
                   │                     │                          │
            submit orders          quote updates             event stream
                   │                     │                          │
   ┌───────────────▼─────────┐  ┌────────▼─────────────┐  ┌─────────▼──────────┐
   │   Execution Agent       │  │   Market Maker       │  │ Latency Arbitrageur│
   │                         │  │                      │  │                    │
   │  Minimise:              │  │  Maximise:           │  │  Maximise:         │
   │  · IS (execution cost)  │  │  · Spread capture    │  │  · Snipe profit    │
   │  · MI(z ; F_t)          │  │  · Belief accuracy   │  │  · Belief-lag rents│
   │  · Flow predictability  │  │                      │  │                    │
   │                         │  │  Infer:              │  │  Acts fastest of   │
   │  Signal encoder         │  │  b_t = P(z=1 | F_t)  │  │  all three agents  │
   │  (conceals intent)      │  │  via Transformer     │  │  (1 ms latency     │
   │                         │  │  Signal decoder      │  │   advantage)       │
   └─────────────────────────┘  └──────────────────────┘  └────────────────────┘
                   │                     │                          │
                   └─────────────────────┴──────────────────────────┘
                                         │
                          ┌──────────────▼─────────────────┐
                          │   Shared Centralised Critic    │
                          │   V(s, a₁, a₂, a₃)             │
                          │   Trained on full global state │
                          │   MAPPO · CTDE paradigm        │
                          └────────────────────────────────┘
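The "price-time priority matching" at the top of the diagram means that resting orders at a better price fill first, and orders at the same price fill in arrival order. A toy version of that rule is sketched below. It is not the package's `order_book.py`: the `ToyOrderBook` class and its `submit_limit` method are hypothetical, and cancellations, market orders and VWAP tracking are left out.

```python
import heapq
import itertools


class ToyOrderBook:
    """Minimal price-time priority book: better price first, then earlier arrival."""

    def __init__(self):
        self.bids = []  # heap of (-price, seq, [remaining]) so the highest bid pops first
        self.asks = []  # heap of (price, seq, [remaining]) so the lowest ask pops first
        self._seq = itertools.count()  # arrival counter breaks ties at equal prices

    def submit_limit(self, side, price, size):
        """Match a limit order against the opposite side, then rest any remainder.

        Returns a list of (execution_price, quantity) fills.
        """
        fills = []
        opposite, own = (self.asks, self.bids) if side == "buy" else (self.bids, self.asks)
        opp_sign = 1 if side == "buy" else -1  # converts an opposite-side heap key to a price
        while size > 0 and opposite:
            key, _, resting = opposite[0]
            best = opp_sign * key
            if (side == "buy" and best > price) or (side == "sell" and best < price):
                break  # the incoming order is no longer marketable
            qty = min(size, resting[0])
            fills.append((best, qty))  # trade at the resting order's price
            resting[0] -= qty
            size -= qty
            if resting[0] == 0:
                heapq.heappop(opposite)
        if size > 0:
            heapq.heappush(own, (-opp_sign * price, next(self._seq), [size]))
        return fills
```

Keying each heap on (signed price, arrival sequence) encodes both priority rules in one comparison; the remaining size is kept in a one-element list so partial fills can update it in place without disturbing the heap order.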

Agent Reward Functions

Agent	Reward	Notes
Execution	`r_E = −IS − λ·I(z;F_t) − μ·Pred(F_t)`	IS = implementation shortfall vs arrival price; MI estimated via MINE-f
Market Maker	`r_M = PnL − α·AdvSel − β·H(b_t) + γ·Acc(b_t)`	Entropy reg prevents overconfident beliefs; accuracy bonus incentivises inference
Arbitrageur	`r_A = PnL_snipe + δ·BeliefLag`	Rewards acting before MM quote adjustment; prevents trivial wide-spread equilibria
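The three reward expressions translate directly into per-step functions. The sketch below assumes every term (IS, the MINE estimate, the predictability penalty, PnL, adverse selection, belief accuracy, belief lag) has already been computed as a scalar; the function names and the basis-point convention for IS are illustrative assumptions, not the package's API. Default weights mirror the values in the Configuration Reference; δ has no default because it does not appear there.

```python
import math


def implementation_shortfall_bps(avg_fill_price, arrival_price, side="buy"):
    """Execution cost relative to the arrival price, in basis points (positive = cost)."""
    sign = 1.0 if side == "buy" else -1.0
    return sign * (avg_fill_price - arrival_price) / arrival_price * 1e4


def binary_entropy(b, eps=1e-12):
    """H(b) in nats for the Bernoulli belief b = P(z = 1 | F_t)."""
    b = min(max(b, eps), 1.0 - eps)
    return -(b * math.log(b) + (1.0 - b) * math.log(1.0 - b))


def exec_reward(is_cost, mi_estimate, predictability, lam=0.5, mu=0.1):
    # r_E = -IS - λ·I(z; F_t) - μ·Pred(F_t)
    return -is_cost - lam * mi_estimate - mu * predictability


def mm_reward(pnl, adv_sel, belief, belief_accuracy, alpha=1.0, beta=0.05, gamma=0.2):
    # r_M = PnL - α·AdvSel - β·H(b_t) + γ·Acc(b_t)
    return pnl - alpha * adv_sel - beta * binary_entropy(belief) + gamma * belief_accuracy


def arb_reward(snipe_pnl, belief_lag, delta):
    # r_A = PnL_snipe + δ·BeliefLag
    return snipe_pnl + delta * belief_lag
```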

Training Pipeline

┌─────────────────────────────────┐    ┌─────────────────────────────────────┐
│  Phase A — Execution trains     │    │  Phase B — MM + Arb train           │
│  (MM + Arb frozen)              │    │  (Execution frozen)                 │
│                                 │    │                                     │
│  1. Collect rollout (all act)   │    │  1. Collect rollout (all act)       │
│  2. Update MINE estimator       │    │  2. Update Belief Transformer       │
│  3. PPO → Execution Actor       │    │  3. PPO → MM Actor                  │
│  4. PPO → Shared Critic         │    │  4. PPO → Arb Actor                 │
│                                 │    │  5. PPO → Shared Critic             │
└─────────────────────────────────┘    └─────────────────────────────────────┘
         ◄──── alternates every 1 000 environment steps ────►
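The alternation rule in the pipeline diagram reduces to modular arithmetic on the environment step counter. The scheduler below is a hypothetical sketch, not `alternating_opt.py`: the class name and the module labels returned by `trainable()` are assumptions, while the two step counts correspond to the `training.alternating` keys in `configs/default.yaml`.

```python
from dataclasses import dataclass


@dataclass
class AlternatingSchedule:
    """Phase A for exec_phase_steps env steps, then Phase B, repeating."""

    exec_phase_steps: int = 1000    # training.alternating.exec_phase_steps
    mm_arb_phase_steps: int = 1000  # training.alternating.mm_arb_phase_steps

    def phase(self, env_step: int) -> str:
        cycle = self.exec_phase_steps + self.mm_arb_phase_steps
        return "A" if env_step % cycle < self.exec_phase_steps else "B"

    def trainable(self, env_step: int) -> set:
        """Modules that receive gradient updates at this step; all others stay frozen.

        The shared critic is updated in both phases, as in the pipeline diagram.
        """
        if self.phase(env_step) == "A":
            return {"execution", "mine", "critic"}
        return {"market_maker", "arbitrageur", "belief_transformer", "critic"}
```

Keeping the two phase lengths separate allows asymmetric schedules, for example giving the adversaries longer to adapt after each execution-agent update.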

Project Structure

adversarial-market-marl/
│
├── adversarial_market/                  # Main package (~5 100 lines of source)
│   │
│   ├── agents/                          # Standalone agent wrapper classes
│   │   ├── base_agent.py                # Abstract interface: act(), freeze/unfreeze,
│   │   │                                #   save/load state, parameter counting
│   │   ├── execution_agent.py           # Institutional trader — signal encoder
│   │   │                                #   Tracks inventory, fills, IS, flow buffer
│   │   ├── market_maker.py              # Adaptive liquidity provider — signal decoder
│   │   │                                #   Rolling flow window, belief update cycle
│   │   └── arbitrageur.py               # Latency arbitrageur — temporal exploiter
│   │                                    #   Quote staleness detection, snipe recording
│   │
│   ├── environment/                     # Market simulation (price-time priority LOB)
│   │   ├── lob_env.py                   # Gymnasium-compatible multi-agent environment
│   │   │                                #   Action/obs spaces, noise traders, step logic
│   │   ├── order_book.py                # Full price-time priority matching engine
│   │   │                                #   Partial fills, VWAP, normalised snapshots
│   │   ├── order.py                     # Order, Fill, Side, OrderType dataclasses
│   │   └── market_state.py              # MarketState and per-agent AgentState containers
│   │
│   ├── networks/                        # Neural network modules
│   │   ├── actor_critic.py              # ExecutionActor (Beta+Bernoulli distributions)
│   │   │                                # MMActor (Normal), ArbActor (tanh-Normal)
│   │   │                                # SharedCritic (LayerNorm MLP)
│   │   ├── belief_transformer.py        # Pre-LN Transformer encoder → belief scalar b_t
│   │   │                                # Sinusoidal PE, masked attention, entropy-reg BCE
│   │   ├── mine_estimator.py            # MINE-f: DV bound + EMA baseline (Belghazi 2018)
│   │   │                                # PredictabilityPenalty: GRU flow predictor
│   │   └── execution_policy.py          # Deterministic baselines for benchmarking:
│   │                                    # TWAP, VWAP, LearnedPolicy, CamouflagePolicy
│   │
│   ├── training/                        # MARL training infrastructure
│   │   ├── trainer.py                   # Full MAPPO loop — rollout → update → log
│   │   │                                # Orchestrates all networks and optimisers
│   │   ├── rollout_buffer.py            # GAE-λ multi-agent rollout buffer
│   │   │                                # Minibatch generator for PPO epochs
│   │   ├── ppo_update.py                # Clipped surrogate objective, VF loss, entropy bonus
│   │   └── alternating_opt.py           # Phase A / B scheduler with state-dict serialisation
│   │
│   ├── evaluation/                      # Analysis and result generation
│   │   ├── evaluator.py                 # Policy evaluation harness
│   │   │                                # Separate informed + noise-only episode runs
│   │   ├── metrics.py                   # Implementation shortfall · KL divergence
│   │   │                                # Belief accuracy · Brier score · Flow entropy
│   │   │                                # Adverse selection · Spread-toxicity correlation
│   │   └── visualizer.py                # Phase transition plot · Episode belief trajectory
│   │                                    # Training curves · Flow distribution heatmap
│   │
│   └── utils/                           # Shared utilities
│       ├── config.py                    # YAML deep merge, validation, dot-path overrides
│       ├── logger.py                    # TensorBoard + W&B + Rich console
│       ├── math_utils.py                # KL, JSD, entropy, GAE, realised volatility, IS bps
│       └── replay_buffer.py             # Prioritised experience replay (SumTree, O(log N))
│
├── tests/                               # 222 tests — all passing
│   ├── conftest.py                      # Shared fixtures: configs, env, all 5 networks,
│   │                                    # random obs/flow data, auto-seed, pytest markers
│   ├── unit/                            # Pure unit tests (~170 tests, ~3 s on CPU)
│   │   ├── test_order_book.py           # Price-time priority, partial fills, cancellation,
│   │   │                                # VWAP, snapshot shape and normalisation
│   │   ├── test_mine_estimator.py       # DV bound convergence, EMA update, backprop
│   │   ├── test_belief_transformer.py   # Forward shapes, masked attention, loss/entropy
│   │   ├── test_actor_critic.py         # Distribution properties, evaluate_actions clamping
│   │   ├── test_agents.py               # Episode tracking, belief updates, snipe recording
│   │   ├── test_execution_policies.py   # TWAP slicing, VWAP U-shape, camouflage rate
│   │   ├── test_rollout_buffer.py       # GAE computation, minibatches, PPO NaN checks
│   │   ├── test_metrics.py              # IS, KL, belief accuracy, flow entropy
│   │   └── test_config_and_replay.py    # Config merge/validate, SumTree, priority sampling
│   └── integration/                     # End-to-end integration tests (~50 tests, ~20 s)
│       ├── test_env_step.py             # Reset/step cycle, obs shapes, inventory, flow buffer
│       ├── test_training_loop.py        # MAPPO smoke test, checkpoint save/load
│       ├── test_evaluator.py            # Full evaluation pass, finite metrics, bounds
│       └── test_checkpoint_and_utils.py # Weight fidelity, alternating phase cycling
│
├── configs/
│   ├── default.yaml                     # Full production config with documented parameters
│   ├── fast_debug.yaml                  # Smoke-test config — CPU only, runs in ~2 minutes
│   └── ablation_lambda.yaml             # λ sweep config for phase transition analysis
│
├── scripts/
│   ├── train.py                         # Training entry point with full CLI
│   ├── evaluate.py                      # Evaluation entry point — metrics + optional plots
│   └── sweep_lambda.py                  # Automated λ* phase transition sweep and plotting
│
├── docs/
│   ├── theory.md                        # Full mathematical derivations
│   │                                    # Kyle model, MINE bound, GAE, detectability conjecture
│   ├── agents.md                        # Obs/action spaces, reward decomposition, design notes
│   └── results.md                       # Expected emergent phenomena and benchmark comparisons
│
├── .github/
│   ├── workflows/
│   │   ├── ci.yml                       # Lint (black·isort·flake8·mypy) + tests (py3.10, 3.11)
│   │   └── release.yml                  # Tag-triggered release build and publish
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── pull_request_template.md
│
├── pyproject.toml                       # Build system (setuptools.build_meta), tool config
├── requirements.txt                     # Runtime + development dependencies
├── setup.py
├── CONTRIBUTING.md
├── CHANGELOG.md
├── LICENSE                              # MIT
├── .gitignore
└── .pre-commit-config.yaml             # black · isort · flake8 pre-commit hooks

Installation

Prerequisites

Requirement	Minimum version	Notes
Python	3.10	3.11 also supported and tested
PyTorch	2.1	CPU-only works; GPU strongly recommended for full runs
RAM	16 GB	32 GB recommended for `n_envs=8`
CUDA (optional)	11.8	For GPU training

Step-by-step setup

# 1. Clone
git clone https://github.com/yourusername/adversarial-market-marl.git
cd adversarial-market-marl

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate        # Linux / macOS
# venv\Scripts\activate         # Windows PowerShell

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install package in editable mode
pip install -e .

# 5. Verify
python -c "import adversarial_market; print('OK — v' + adversarial_market.__version__)"

GPU setup (optional)

# CUDA 11.8
pip install torch==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

Pre-commit hooks (contributors)

pip install pre-commit
pre-commit install
# Hooks run automatically on git commit: black · isort · flake8 · merge-conflict check

Quick Start

Command-line training

# Full training run — GPU recommended, ~24 h for 10M steps
python scripts/train.py \
    --config configs/default.yaml \
    --exp-name my_run

# Fast smoke test on CPU — completes in ~2 minutes
python scripts/train.py \
    --config configs/fast_debug.yaml \
    --exp-name debug

# Override specific hyperparameters inline
python scripts/train.py \
    --config configs/default.yaml \
    --exp-name lambda_search \
    --override agents.execution.lambda_leakage=0.75 \
    --override training.total_timesteps=5000000

# Resume from a checkpoint
python scripts/train.py \
    --config configs/default.yaml \
    --resume checkpoints/my_run/checkpoint_step_5000000.pt

Python API

from adversarial_market.training.trainer import MARLTrainer
from adversarial_market.utils.config import load_config

# Load and optionally override config
config = load_config(
    "configs/default.yaml",
    overrides={"agents.execution.lambda_leakage": 0.5}
)

# Build and run trainer
trainer = MARLTrainer(config)
trainer.train()
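Both the `--override key=value` CLI flag and the `overrides={...}` argument address nested YAML keys with dot paths. One way such overrides could be applied is sketched below. `parse_cli_override` and `apply_overrides` are hypothetical helpers, not the package's `config.py`, and treating unknown keys as errors is a design assumption.

```python
import ast
import copy


def parse_cli_override(arg):
    """Split "a.b.c=value" and turn value into a Python literal where possible."""
    key, sep, raw = arg.partition("=")
    if not sep:
        raise ValueError(f"expected KEY=VALUE, got {arg!r}")
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw  # non-literals such as "cuda" stay strings
    return key, value


def apply_overrides(config, overrides):
    """Return a deep copy of `config` with dot-path keys replaced.

    Unknown keys raise KeyError so that typos fail loudly instead of being ignored.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node[part]
        if leaf not in node:
            raise KeyError(dotted_key)
        node[leaf] = value
    return result
```

Failing on an unknown key matters in sweeps. Without that check, a misspelt `lambda_leakage` would silently train every run at the default λ.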

Evaluation

python scripts/evaluate.py \
    --checkpoint checkpoints/my_run/checkpoint_final.pt \
    --n-episodes 20 \
    --noise-episodes 10 \
    --output results/my_run/ \
    --plot

# Output:
#   results/my_run/eval_metrics.json   — all scalar metrics
#   results/my_run/phase_transition.pdf — KL / IS / belief accuracy curves (if --plot)

Monitor with TensorBoard

tensorboard --logdir runs/
# Navigate to http://localhost:6006

Configuration Reference

All parameters are documented inline in configs/default.yaml. The most important ones are:

# ── The key experimental variable ────────────────────────────────────────────
agents:
  execution:
    lambda_leakage: 0.5      # λ — weight on MI penalty I(z; F_t)
                             #     0.0 = pure IS minimisation (no camouflage)
                             #     > λ* = flow indistinguishable from noise
    mu_predictability: 0.1   # μ — weight on flow pattern predictability penalty
    inventory_lots: 100      # Total position to execute per episode
    horizon: 390             # Steps per episode (one trading day in minutes)

  market_maker:
    beta_entropy_reg: 0.05   # Prevents overconfident beliefs under covariate shift
    alpha_adverse_selection: 1.0
    gamma_belief_accuracy: 0.2

# ── Network architecture ──────────────────────────────────────────────────────
networks:
  belief_transformer:
    d_model: 128             # Transformer hidden dimension
    n_heads: 4               # Self-attention heads
    n_layers: 4              # Encoder layers (Pre-LN for stability)
    d_ff: 256                # Feed-forward sublayer width
    max_seq_len: 200         # Maximum order flow history length

  shared_critic:
    hidden_dims: [512, 512, 256]
    use_layer_norm: true

  mine:
    hidden_dims: [256, 256]
    ema_decay: 0.99          # EMA decay for the baseline used in bias-corrected MINE gradients

# ── Training ──────────────────────────────────────────────────────────────────
training:
  total_timesteps: 10_000_000
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2            # PPO clip epsilon
  entropy_coef: 0.01         # Policy entropy bonus
  alternating:
    exec_phase_steps: 1000
    mm_arb_phase_steps: 1000
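The `gamma` and `gae_lambda` keys parameterise generalised advantage estimation (Schulman et al., 2016). A single-agent, stdlib-only sketch of the backward recursion follows; the package's `rollout_buffer.py` handles several agents and batched environments, so treat the function name and signature as illustrative assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalised advantage estimation over one rollout of length T.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap the final step;
    dones[t] is True if the episode ended at step t.
    Returns (advantages, returns) where returns = advantages + values.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD error: δ_t = r_t + γ·V(s_{t+1})·(1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = δ_t + γλ·(1 - done_t)·A_{t+1}; episode boundaries cut the recursion
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting `gae_lambda=0` recovers one-step TD advantages, while `gae_lambda=1` gives Monte Carlo returns minus the baseline; 0.95 sits between the two on the bias-variance trade-off.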

Core Concepts

The MINE Mutual Information Estimator

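Direct computation of I(z; F_t) is intractable when the order-flow distribution is continuous and high-dimensional. The framework instead uses MINE-f, which maximises the Donsker-Varadhan (DV) lower bound from Belghazi et al. (2018) and corrects the bias of its minibatch gradient with an exponential moving average (EMA):

I(z; F_t) ≥ E_{p(z,f)}[T_θ(z,f)] − log E_{p(z)p(f)}[exp(T_θ(z,f))]

The first expectation is taken over joint samples (z, f). The second is taken over samples from the product of marginals, typically obtained by shuffling z within a minibatch. T_θ : Z × F → ℝ is a learned statistics network updated concurrently with the execution actor during Phase A training. Estimating the log-denominator from a single minibatch yields biased gradients, so in the gradient the denominator is replaced by an EMA ē_t of E_{p(z)p(f)}[exp(T_θ)] with decay 0.99 (`networks.mine.ema_decay`), following the bias correction proposed by Belghazi et al.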
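The bound and the EMA correction can be shown end to end with a stdlib-only toy. This is not the package's `mine_estimator.py`. It swaps the neural statistics network for a one-parameter T_θ(z, f) = θ·z·f so the gradient can be written by hand. The data generators `dependent` and `independent`, and all numeric settings other than the 0.99 decay, are hypothetical.

```python
import math
import random


def t_stat(theta, z, f):
    """Toy statistics network T_θ(z, f) = θ·z·f (stands in for the neural T_θ)."""
    return theta * z * f


def dv_bound(joint, marginal, theta):
    """DV bound: mean of T on joint pairs minus log-mean-exp of T on marginal pairs."""
    mean_joint = sum(t_stat(theta, z, f) for z, f in joint) / len(joint)
    t_marg = [t_stat(theta, z, f) for z, f in marginal]
    m = max(t_marg)  # shift by the max so exp() cannot overflow
    log_mean_exp = m + math.log(sum(math.exp(t - m) for t in t_marg) / len(t_marg))
    return mean_joint - log_mean_exp


def shuffle_z(joint, rng):
    """Approximate samples from p(z)p(f): keep each f, permute z across the batch."""
    zs = [z for z, _ in joint]
    rng.shuffle(zs)
    return [(z, f) for z, (_, f) in zip(zs, joint)]


def train_toy_mine(sample, steps=400, batch=256, lr=0.05, ema_decay=0.99, seed=0):
    """Gradient ascent on the DV bound with the EMA-corrected denominator."""
    rng = random.Random(seed)
    theta, ema = 0.0, None
    for _ in range(steps):
        joint = [sample(rng) for _ in range(batch)]
        marginal = shuffle_z(joint, rng)
        exp_t = [math.exp(t_stat(theta, z, f)) for z, f in marginal]
        batch_mean = sum(exp_t) / batch
        # ē_t: EMA of E_{p(z)p(f)}[exp T], replacing the noisy minibatch denominator
        ema = batch_mean if ema is None else ema_decay * ema + (1 - ema_decay) * batch_mean
        # dT/dθ = z·f, so the bound's gradient ≈ mean_joint(z·f) - mean_marg(z·f·exp T) / ē_t
        grad_joint = sum(z * f for z, f in joint) / batch
        grad_marg = sum(z * f * e for (z, f), e in zip(marginal, exp_t)) / batch
        theta += lr * (grad_joint - grad_marg / ema)
    return theta


def dependent(rng):
    """Hypothetical data: binary type z and a flow feature that leaks it."""
    z = rng.choice((-1.0, 1.0))
    return z, z + rng.gauss(0.0, 0.5)


def independent(rng):
    """Same marginals, but the flow feature carries no information about z."""
    return rng.choice((-1.0, 1.0)), rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.5)
```

Evaluated with `dv_bound` on fresh samples, the trained bound should be clearly positive on the dependent data, though never above ln 2 ≈ 0.69 nats because z is binary. On the independent data it should stay near zero. That gap is the signal the λ-weighted leakage penalty pushes the execution agent to close.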

Sequential Belief Inference

The market maker passes the order flow history (up to `max_seq_len` events) through a Pre-LayerNorm Transformer encoder to produce a belief:

F_t = {(side_s, price_rel_s, size_rel_s, order_type_s, time_frac_s)}_{s≤t}

b_t = sigmoid( BeliefHead( TransformerEncoder( Embed(F_t) ) ) )
    = P(z = informed | F_t) ∈ [0, 1]

The belief is trained with an entropy-regularised binary cross-entropy loss:

L_φ = BCE(b_t, z) − β · H(b_t)

Because the loss subtracts β·H(b_t), minimising it favours higher-entropy beliefs. This penalises overconfidence and helps the market maker stay responsive when the execution agent shifts strategy mid-training.
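A scalar version of the entropy-regularised loss shows how the two terms combine. This sketch is not `belief_transformer.py`: it scores beliefs that are already computed instead of running a Transformer, the helper names are hypothetical, and β = 0.05 is only an illustrative default.

```python
import math


def sigmoid(x):
    """Map the belief head's logit to b_t = P(z = informed | F_t); numerically stable."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def entropy_regularised_bce(b, z, beta=0.05, eps=1e-7):
    """L_φ = BCE(b, z) - β·H(b) for one belief b and label z ∈ {0, 1} (1 = informed)."""
    b = min(max(b, eps), 1.0 - eps)  # clamp so log() stays finite
    bce = -(z * math.log(b) + (1 - z) * math.log(1.0 - b))
    entropy = -(b * math.log(b) + (1.0 - b) * math.log(1.0 - b))
    return bce - beta * entropy


def batch_loss(logits, labels, beta=0.05):
    """Mean loss over a batch; labels[i] is the true type z behind sequence i."""
    losses = [entropy_regularised_bce(sigmoid(x), z, beta) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)
```

A confidently wrong belief (b = 0.99 when z = 0) costs far more than a hedged one. The −β·H term lowers the loss most at b = 0.5, which is what discourages beliefs from saturating at 0 or 1.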

The Detectability Phase Transition

The central empirical prediction:

       D_KL(P_informed ‖ P_noise)

 High ┤
      │████████████████╲
      │                 ╲
      │                  ╲___________________________
  Low ┤
      └──────────────────┬────────────────────────── λ
      0                  λ*

      ├─── detectable ───┤───── camouflaged ─────────┤

 MM belief accuracy:
 High ┤
      │████████████╲
      │             ╲
   50%┤              ─────────────── (random guessing)
      └──────────────┬────────────────────────────── λ
      0              λ*

The sharpness of the predicted transition, rather than a gradual decay, is what makes it theoretically interesting. It would imply a critical camouflage threshold: below it, increasing the leakage penalty buys little additional concealment; above it, the agent achieves near-perfect camouflage, and further increases in λ mainly raise execution cost.
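The y-axis of the first plot, D_KL(P_informed ‖ P_noise), can be estimated from histograms of a flow feature such as order size. The sketch below is hypothetical and not taken from `metrics.py`; the bin edges and the ε smoothing are assumptions that affect the numeric value, so KL values are only comparable across runs that share the same binning.

```python
import bisect
import math


def histogram_kl(informed, noise, edges, eps=1e-6):
    """D_KL(P_informed || P_noise) between histograms of one scalar flow feature.

    `edges` are sorted bin edges; values outside the range are clipped into the end bins.
    A small eps is added to every bin so empty noise bins do not make the KL infinite.
    """
    n_bins = len(edges) - 1

    def hist(xs):
        counts = [0] * n_bins
        for x in xs:
            # search only the interior edges, giving a bin index in [0, n_bins - 1]
            counts[bisect.bisect_right(edges, x, 1, n_bins) - 1] += 1
        total = len(xs) + eps * n_bins
        return [(c + eps) / total for c in counts]

    p, q = hist(informed), hist(noise)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```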

The Arbitrageur as Equilibrium Stabiliser

Without the latency arbitrageur, the market maker's best response to the execution agent is a corner solution: set permanently wide spreads and avoid adverse selection entirely. The arbitrageur breaks this by creating a sniping cost for stale quotes:

Cost_MM(spread) = AdvSel(spread) + SnipeRisk(spread)
                  └─decreasing─┘   └──increasing───┘

This forces an interior optimum that requires genuine Bayesian inference. Without the arbitrageur, the three-agent system collapses to a degenerate two-agent game with a trivial market-maker strategy.
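To see why an increasing snipe-risk term produces an interior optimum, take purely illustrative functional forms: AdvSel(s) = a/s and SnipeRisk(s) = k·s. These shapes and scales are assumptions, not the framework's model. Setting the derivative −a/s² + k to zero gives s* = √(a/k), and a grid search confirms it. With k = 0 (no arbitrageur) the minimum moves to the widest spread available.

```python
def mm_cost(spread, adv_sel_scale=1.0, snipe_scale=0.25):
    """Hypothetical cost curve: adverse selection falls with spread, snipe risk rises."""
    adverse_selection = adv_sel_scale / spread  # decreasing in spread
    snipe_risk = snipe_scale * spread           # increasing in spread
    return adverse_selection + snipe_risk


def best_spread(grid, **kwargs):
    return min(grid, key=lambda s: mm_cost(s, **kwargs))


grid = [0.05 * i for i in range(1, 201)]         # spreads 0.05 .. 10.0
with_arb = best_spread(grid)                      # interior: sqrt(1.0 / 0.25) = 2.0
without_arb = best_spread(grid, snipe_scale=0.0)  # corner: the widest spread on the grid
```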

Reproducing the Phase Transition

# Sweep λ from 0 to 2.0 — trains 20 × 3 = 60 models
python scripts/sweep_lambda.py \
    --config configs/ablation_lambda.yaml \
    --lambda-min 0.0 \
    --lambda-max 2.0 \
    --n-steps 20 \
    --seeds 3 \
    --n-eval-episodes 10 \
    --output results/phase_transition/ \
    --plot

# Outputs:
#   results/phase_transition/sweep_results.json  — all metrics per (λ, seed)
#   results/phase_transition/phase_transition.pdf — Figure 1: KL + IS + belief accuracy vs λ

Expected results at full training (10M steps per model):

λ	`D_KL` (↓ = more camouflaged)	MM belief accuracy	IS (↑ = costlier)
0.0	> 1.5	> 72%	Low
0.25	> 0.8	> 65%	Low
~0.5 (λ*)	→ 0	→ 50%	Rising
1.0	≈ 0	≈ 50%	High
2.0	≈ 0	≈ 50%	Very high
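Reading λ* off a sweep requires a concrete rule. One simple option is sketched below: average KL over seeds, then take the smallest swept λ whose mean KL falls below a threshold. The `(λ, seed, kl)` record layout and the 0.05 threshold are assumptions; the actual fields in `sweep_results.json` may differ. The estimate is also only as fine as the λ grid, so 20 sweep points resolve λ* to roughly one grid step.

```python
def mean_over_seeds(records):
    """records: iterable of (λ, seed, kl) tuples -> {λ: mean KL across seeds}."""
    sums, counts = {}, {}
    for lam, _seed, kl in records:
        sums[lam] = sums.get(lam, 0.0) + kl
        counts[lam] = counts.get(lam, 0) + 1
    return {lam: sums[lam] / counts[lam] for lam in sums}


def locate_lambda_star(kl_by_lambda, threshold=0.05):
    """Smallest swept λ whose mean KL is at or below `threshold`, or None if none is."""
    for lam in sorted(kl_by_lambda):
        if kl_by_lambda[lam] <= threshold:
            return lam
    return None
```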

Benchmarks

The execution_policy module provides scripted (non-learned) baselines to benchmark against the learned policy:

Policy	Strategy	KL divergence	Implementation shortfall
`TWAPPolicy`	Equal slices, market orders, every step	Highest (maximally predictable)	Lowest in liquid markets
`VWAPPolicy`	Follows U-shaped intraday volume profile	Medium	Low–medium
`AdaptiveCamouflagePolicy`	TWAP + random noise order injection at rate `p`	Medium (decreases with `p`)	Increases with `p`
Learned (λ > λ*)	Trained end-to-end with MI penalty	Near zero	Higher than TWAP

from adversarial_market.networks.execution_policy import make_benchmark_policies

policies = make_benchmark_policies(horizon=390, max_order_size=10)

twap = policies["twap"]
twap.reset(initial_inventory=100, arrival_price=100.0)
# obs: the execution agent's current observation vector from the environment
action, _, _ = twap.act(obs, remaining=80, time_remaining=0.5, mid_price=100.1)
# action = [size_frac, limit_offset, order_type_logit]
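The idea behind the camouflage baseline (TWAP plus random noise orders injected at rate `p`) can be illustrated independently of the package. This sketch does not reproduce `AdaptiveCamouflagePolicy` or its `act()` interface: the function names, the `(step, side, size, is_decoy)` output and `max_decoy_size` are hypothetical. Decoys are flagged separately so the real slices still sum to the parent order.

```python
import random


def twap_schedule(total, horizon):
    """Equal child-order sizes; any remainder goes one extra lot to the earliest steps."""
    base, extra = divmod(total, horizon)
    return [base + (1 if t < extra else 0) for t in range(horizon)]


def camouflaged_orders(total, horizon, p, max_decoy_size=5, seed=0):
    """TWAP slices plus, with probability p per step, one random-side decoy order.

    Returns (step, side, size, is_decoy) tuples.
    """
    rng = random.Random(seed)
    orders = []
    for t, size in enumerate(twap_schedule(total, horizon)):
        if size > 0:
            orders.append((t, "buy", size, False))  # real slice of the parent order
        if rng.random() < p:
            side = rng.choice(("buy", "sell"))
            orders.append((t, side, rng.randint(1, max_decoy_size), True))  # decoy
    return orders
```

Raising `p` blurs the regular TWAP footprint but adds decoy trades that must be paid for, matching the table's "KL decreases with `p`, IS increases with `p`" pattern.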

Testing

# Run all 222 tests
pytest tests/ -v

# Unit tests only — fast (~3 seconds)
pytest tests/unit/ -v

# Integration tests — slower (~20 seconds)
pytest tests/integration/ -v

# Specific module
pytest tests/unit/test_order_book.py -v

# With coverage report
pytest tests/ --cov=adversarial_market --cov-report=html
open htmlcov/index.html          # macOS; on Linux use xdg-open

Test coverage by module

Module	Tests	Coverage focus
`environment/order_book`	22	Price-time priority, partial fills, cancellation, VWAP, LOB snapshot
`networks/actor_critic`	20	Distribution properties (Beta, Normal, tanh-Normal), `evaluate_actions` Beta support clamping
`networks/mine_estimator`	12	DV bound, EMA baseline updates, MI higher for dependent samples, backprop
`networks/belief_transformer`	14	Masked attention, variable sequence lengths, entropy-reg BCE, `update_belief` no-grad
`agents/`	24	Freeze/unfreeze, episode state tracking, belief updates, snipe profit recording
`networks/execution_policy`	15	TWAP equal-slicing, VWAP U-shaped volume profile, camouflage injection rate
`training/rollout_buffer`	16	GAE-λ advantages/returns, normalisation, minibatch generator, PPO NaN-free updates
`evaluation/metrics`	24	IS, histogram KL divergence, belief accuracy, Brier score, flow entropy, adverse selection
`utils/config + replay`	18	Deep config merge, dot-path overrides, validation, SumTree arithmetic, priority sampling
`integration/env_step`	15	Full reset/step, obs shapes, inventory decreases on fills, flow buffer lifecycle
`integration/training_loop`	10	MAPPO smoke test, rollout buffer fills, PPO finite losses, checkpoint round-trip
`integration/evaluator`	5	Full eval pass, finite metrics, completion rate in [0,1], belief accuracy in [0,1]
`integration/checkpoint`	9	Weight fidelity for all 5 networks, alternating phase cycling, state dict round-trip

Extending the Framework

Adding a new agent type

import numpy as np
import torch
import torch.nn as nn

from adversarial_market.agents.base_agent import BaseAgent


class MyAgent(BaseAgent):
    def build_networks(self) -> None:
        self.actor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
        self.register_network(self.actor)   # enables freeze/unfreeze + param counting

    def act(self, obs: np.ndarray, deterministic: bool = False):
        with torch.no_grad():
            out = self.actor(self.to_tensor(obs).unsqueeze(0))
        # Deterministic toy actor: the two trailing 0.0 values are placeholders
        return out.squeeze(0).cpu().numpy(), 0.0, 0.0

    def update(self, batch):
        return {}   # PPO update handled externally by PPOUpdate

Swapping the MI estimator

import torch

from adversarial_market.training.trainer import MARLTrainer
from adversarial_market.utils.config import load_config

config = load_config("configs/default.yaml")
trainer = MARLTrainer(config)

# Replace MINE with any module that exposes estimate_only(z, flow) -> float.
# MyCustomMIEstimator is your own class (not part of the package); it must be an
# nn.Module so that .parameters() is available to the optimiser below.
trainer.mine = MyCustomMIEstimator(z_dim=1, f_dim=4)
trainer.mine_opt = torch.optim.Adam(trainer.mine.parameters(), lr=1e-4)
trainer.train()

Programmatic experiment loop

import json
from pathlib import Path

from adversarial_market.utils.config import load_config
from adversarial_market.training.trainer import MARLTrainer
from adversarial_market.evaluation.evaluator import Evaluator

results = {}
for lam in [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0]:
    config = load_config(
        "configs/default.yaml",
        overrides={"agents.execution.lambda_leakage": lam,
                   "training.total_timesteps": 3_000_000}
    )
    trainer = MARLTrainer(config)
    trainer.train()

    evaluator = Evaluator(config)
    metrics = evaluator.evaluate(
        exec_actor=trainer.exec_actor,
        mm_actor=trainer.mm_actor,
        arb_actor=trainer.arb_actor,
        belief_transformer=trainer.belief_transformer,
        mine=trainer.mine,
        n_episodes=10,
        noise_only_episodes=5,
    )
    results[lam] = {
        "kl": metrics.get("eval/exec/kl_divergence_mean"),
        "is": metrics.get("eval/exec/implementation_shortfall_mean"),
        "mm_acc": metrics.get("eval/mm/belief_accuracy_mean"),
    }
    print(f"λ={lam:.2f}  KL={results[lam]['kl']:.3f}  acc={results[lam]['mm_acc']:.3f}")

# Create the output directory if needed before writing the sweep summary
Path("results").mkdir(parents=True, exist_ok=True)
with open("results/lambda_sweep.json", "w") as f:
    json.dump(results, f, indent=2)

Theoretical Background

Full mathematical derivations are in docs/theory.md. The framework builds on the following foundational works:

Reference	Relevance
Kyle (1985) — Continuous Auctions and Insider Trading	The static informed-trading model this framework extends to learned, adversarial strategies
Glosten & Milgrom (1985) — Bid, Ask, and Transaction Prices	Theoretical basis for spread as a function of adverse selection probability
Belghazi et al. (2018) — MINE	The mutual information lower bound used for the leakage penalty
Lowe et al. (2017) — MADDPG	The CTDE paradigm: centralised training, decentralised execution
Schulman et al. (2017) — PPO	The policy gradient algorithm used for all three agents
Schulman et al. (2016) — GAE	Generalised advantage estimation used in the rollout buffer
Vaswani et al. (2017) — Attention Is All You Need	The Transformer architecture underlying the market maker's belief model
Schaul et al. (2016) — PER	Prioritised experience replay used in the auxiliary belief training buffer

Expected Emergent Phenomena

After full training (~10M timesteps), the system is expected to produce:

Adaptive order splitting — The execution agent discovers TWAP/VWAP-like strategies without explicit programming. Order sizes shrink and timing becomes more uniform as λ increases. This would indicate that the MI penalty alone is sufficient to produce camouflage behaviour.

Belief-driven spread widening — The market maker widens spreads precisely when its belief b_t is high. The correlation ρ(spread_t, b_t) exceeds 0.6 in low-λ regimes (where flow is detectable) and collapses to ≈ 0 in high-λ regimes (where flow is camouflaged and the market maker cannot distinguish informed from noise flow).

Latency arbitrage cycles — The arbitrageur systematically snipes stale quotes. The cycle period between exploitable gaps is governed by the market maker's belief-update speed, with shorter periods as the Transformer's belief updates become faster.

Flow distribution convergence — At λ > λ*, the empirical distribution of execution agent order features (size, offset, timing) converges towards the noise trader distribution in total variation distance. The KL divergence collapses to near zero.

Development

Running the full CI suite locally

# 1. Formatting
black --target-version py310 --line-length 100 adversarial_market tests scripts
isort --profile black --line-length 100 adversarial_market tests scripts

# 2. Linting
flake8 adversarial_market scripts --max-line-length=100 --ignore=E203,W503
flake8 tests --max-line-length=100 --ignore=E203,W503,E402

# 3. Type checking
mypy adversarial_market --ignore-missing-imports --no-strict-optional

# 4. Tests with coverage
pytest tests/ -v --cov=adversarial_market --cov-report=term-missing

# 5. Package install check
pip install -e .
python -c "import adversarial_market; print('OK')"

CI pipeline (GitHub Actions)

Job	Python versions	Steps
Lint	3.10	black · isort · flake8 (source) · flake8 (tests) · mypy
Test	3.10, 3.11	`pip install -r requirements.txt` · `pip install -e .` · pytest with coverage

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request. Key points:

All new code must include tests, and the full test suite (currently 222 tests) must keep passing
All formatting and type checks must pass locally before pushing
For significant changes (new agents, new MI estimators, environment modifications), open an issue first to discuss the design

Citation

If this framework is useful in your research, please cite:

@software{adversarial_market_marl_2024,
  title   = {{Adversarial Market Microstructure}: A Multi-Agent Reinforcement Learning
             Framework for Covert Execution and Adaptive Signal Detection},
  year    = {2024},
  url     = {https://github.com/yourusername/adversarial-market-marl},
  license = {MIT}
}

License

Released under the MIT License — see LICENSE for full terms.

Built with PyTorch · Gymnasium · MINE · MAPPO

Project details


This version

0.1.0

Mar 29, 2026

Download files


Source Distribution

adversarial_market_marl-0.1.0.tar.gz (82.2 kB view details)

Uploaded Mar 29, 2026 Source

Built Distribution


adversarial_market_marl-0.1.0-py3-none-any.whl (74.0 kB view details)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file adversarial_market_marl-0.1.0.tar.gz.

File metadata

Download URL: adversarial_market_marl-0.1.0.tar.gz
Upload date: Mar 29, 2026
Size: 82.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for adversarial_market_marl-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8d534d70c50755c49587b7f7801beabbd27d8353093a6d9ecfaea095a6c1c6eb`
MD5	`7c8943149d908d10482c4efa958c2c63`
BLAKE2b-256	`26546146cd11cf9ce031e02765f1d5ad619dc3f69027f118df6fc70cfbd7872e`


Provenance

The following attestation bundles were made for adversarial_market_marl-0.1.0.tar.gz:

Publisher: release.yml on Nityahapani/adversarial-market-marl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: adversarial_market_marl-0.1.0.tar.gz
- Subject digest: 8d534d70c50755c49587b7f7801beabbd27d8353093a6d9ecfaea095a6c1c6eb
- Sigstore transparency entry: 1192086322
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: Nityahapani/adversarial-market-marl@814daffce06b7e6336919be27794a41cc1febb28
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Nityahapani
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@814daffce06b7e6336919be27794a41cc1febb28
- Trigger Event: push

File details

Details for the file adversarial_market_marl-0.1.0-py3-none-any.whl.

File metadata

Download URL: adversarial_market_marl-0.1.0-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 74.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for adversarial_market_marl-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cdef7b6e6f699ce6dbef028000dfecf5210089e2494f97ac6c4fe5e63ed09014`
MD5	`94b8fbd2d95f64928d23d536e49da189`
BLAKE2b-256	`d0625ff4263bc9331c7c34d476512bafbb6bcdcaa7f4d87c1985b35b421ecde9`


Provenance

The following attestation bundles were made for adversarial_market_marl-0.1.0-py3-none-any.whl:

Publisher: release.yml on Nityahapani/adversarial-market-marl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: adversarial_market_marl-0.1.0-py3-none-any.whl
- Subject digest: cdef7b6e6f699ce6dbef028000dfecf5210089e2494f97ac6c4fe5e63ed09014
- Sigstore transparency entry: 1192086323
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: Nityahapani/adversarial-market-marl@814daffce06b7e6336919be27794a41cc1febb28
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Nityahapani
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@814daffce06b7e6336919be27794a41cc1febb28
- Trigger Event: push
