Multi-agent RL framework for adversarial market microstructure with endogenous information leakage

Adversarial Market Microstructure — MARL Framework

Modeling financial markets as adversarial information-processing systems

Python 3.10+ · PyTorch 2.1+ · License: MIT · Tests · Code style: black · Checked with mypy · CI


Overview

This repository implements a multi-agent reinforcement learning (MARL) framework that treats financial markets not as price-discovery mechanisms, but as adversarial information ecosystems. Three co-evolving agents contest informational advantage through a realistic limit order book (LOB): an institutional execution trader attempting to conceal its intent, a market maker performing real-time Bayesian inference on order flow, and a latency arbitrageur exploiting belief-update lags.

The central research question driving this framework is:

To what extent can an informed trader optimally execute large orders while minimising the statistical detectability of their informational advantage, when facing adaptive market makers performing real-time inference on order flow?

The framework integrates three traditionally separate domains into a single system:

| Domain | Role in the Framework |
| --- | --- |
| Market Microstructure Theory | Kyle-type informed trading, adverse selection, implementation shortfall |
| Adversarial Reinforcement Learning | MAPPO with centralised training / decentralised execution (CTDE) |
| Information Theory | MINE-based mutual information estimation, KL divergence, detectability phase transitions |

Key Contributions

1. Information leakage as an explicit optimisation variable. The execution agent's reward directly penalises the mutual information between its latent type z and its observable order flow F_t, estimated in real time by a MINE neural network. This forces the agent to trade off execution efficiency against statistical detectability — a tension that does not appear in classical execution models.

2. Detectability phase transition. There exists a critical obfuscation threshold λ* beyond which informed order flow becomes statistically indistinguishable from uninformed noise — D_KL(P_informed ‖ P_noise) → 0. The framework provides tooling to locate this threshold empirically across a sweep of the leakage penalty weight λ.

3. Endogenous price formation. No exogenous price path is imposed. All price discovery emerges from agent interaction through a realistic price-time priority matching engine with Poisson noise trader arrivals and an Ornstein-Uhlenbeck fundamental value process.

4. Alternating optimisation for adversarial stability. Simultaneous gradient updates in adversarial multi-agent systems cause non-stationarity and oscillation. Phase A trains the execution agent with MM + Arb frozen; Phase B trains the market maker and arbitrageur with Exec frozen. This produces stable co-evolution of strategies.
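
The price-time priority rule named in contribution 3 can be illustrated with a small sketch. This is not the package's `order_book.py`; `RestingAsk` and `match_market_buy` are hypothetical names, and only the ask side and market buys are modelled. Resting orders are ranked by price first and arrival order second.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_seq = count()  # arrival counter: earlier orders receive smaller numbers


@dataclass(order=True)
class RestingAsk:
    price: float                      # primary sort key: lowest ask first
    seq: int                          # secondary sort key: earliest arrival first
    size: int = field(compare=False)  # remaining size, ignored when ordering


def match_market_buy(asks: list, size: int) -> list:
    """Fill a market buy against resting asks; returns [(price, filled_size), ...]."""
    fills = []
    while size > 0 and asks:
        best = asks[0]                # best price, then earliest arrival
        traded = min(size, best.size)
        fills.append((best.price, traded))
        size -= traded
        best.size -= traded           # partial fill leaves the remainder resting
        if best.size == 0:
            heapq.heappop(asks)       # fully filled: remove from the book
    return fills


asks: list = []
for price, size in [(100.2, 5), (100.1, 3), (100.1, 4)]:
    heapq.heappush(asks, RestingAsk(price, next(_seq), size))

fills = match_market_buy(asks, 6)
```

Here the two orders at 100.1 are consumed in arrival order before the order at 100.2 is touched, and the second one is only partially filled.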


Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                          Limit Order Book (LOB)                            │
│    Price-time priority matching · Endogenous price formation               │
│    Poisson noise trader arrivals · OU fundamental value process            │
└──────────────────┬─────────────────────┬──────────────────────────────────┘
                   │                     │                          │
            submit orders          quote updates             event stream
                   │                     │                          │
   ┌───────────────▼─────────┐  ┌────────▼────────────┐  ┌─────────▼──────────┐
   │   Execution Agent       │  │   Market Maker       │  │ Latency Arbitrageur│
   │                         │  │                      │  │                    │
   │  Minimise:              │  │  Maximise:           │  │  Maximise:         │
   │  · IS (execution cost)  │  │  · Spread capture    │  │  · Snipe profit    │
   │  · MI(z ; F_t)          │  │  · Belief accuracy   │  │  · Belief-lag rents│
   │  · Flow predictability  │  │                      │  │                    │
   │                         │  │  Infer:              │  │  Acts fastest of   │
   │  Signal encoder         │  │  b_t = P(z=1 | F_t) │  │  all three agents  │
   │  (conceals intent)      │  │  via Transformer     │  │  (1 ms latency     │
   │                         │  │  Signal decoder      │  │   advantage)       │
   └─────────────────────────┘  └──────────────────────┘  └────────────────────┘
                   │                     │                          │
                   └─────────────────────┴──────────────────────────┘
                                         │
                          ┌──────────────▼──────────────┐
                          │   Shared Centralised Critic  │
                          │   V(s, a₁, a₂, a₃)          │
                          │   Trained on full global state│
                          │   MAPPO · CTDE paradigm      │
                          └──────────────────────────────┘
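
The two exogenous ingredients in the LOB box (an Ornstein-Uhlenbeck fundamental value and Poisson noise trader arrivals) can be sketched with the standard library alone. The function names and all parameter values (`kappa`, `sigma`, the arrival rate) are illustrative assumptions, not the values in `configs/default.yaml`.

```python
import math
import random


def simulate_fundamental(n_steps, v0=100.0, mean=100.0, kappa=0.05,
                         sigma=0.1, dt=1.0, seed=0):
    """Euler-Maruyama path of dV = kappa * (mean - V) dt + sigma dW."""
    rng = random.Random(seed)
    path = [v0]
    for _ in range(n_steps):
        v = path[-1]
        v += kappa * (mean - v) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(v)
    return path


def poisson_arrivals(rate, rng):
    """Number of noise-trader arrivals in one step (rate = expected arrivals per step).

    Sampled by accumulating exponential inter-arrival gaps until the step ends.
    """
    n, t = 0, rng.expovariate(rate)
    while t < 1.0:
        n += 1
        t += rng.expovariate(rate)
    return n


path = simulate_fundamental(390, seed=1)
rng = random.Random(0)
arrivals = [poisson_arrivals(2.0, rng) for _ in range(390)]
```

In this sketch the fundamental value only anchors noise trader behaviour; traded prices would still come from the matching engine.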

Agent Reward Functions

| Agent | Reward | Notes |
| --- | --- | --- |
| Execution | r_E = −IS − λ·I(z;F_t) − μ·Pred(F_t) | IS = implementation shortfall vs arrival price; MI estimated via MINE-f |
| Market Maker | r_M = PnL − α·AdvSel + β·H(b_t) + γ·Acc(b_t) | Entropy regularisation prevents overconfident beliefs; accuracy bonus incentivises inference |
| Arbitrageur | r_A = PnL_snipe + δ·BeliefLag | Rewards acting before MM quote adjustment; prevents trivial wide-spread equilibria |
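
As a worked example of the execution reward, the sketch below computes a per-lot implementation shortfall against the arrival price and combines it with the two penalty terms. The helper names, the per-lot normalisation and the input numbers are assumptions for illustration; the package may scale these terms differently.

```python
def implementation_shortfall(fills, arrival_price, side=+1):
    """Average execution cost per lot relative to the arrival price.

    fills: list of (price, size); side: +1 for a buy programme, -1 for a sell.
    """
    total = sum(size for _, size in fills)
    if total == 0:
        return 0.0
    avg_price = sum(price * size for price, size in fills) / total
    return side * (avg_price - arrival_price)


def execution_reward(is_cost, mi_estimate, predictability, lam=0.5, mu=0.1):
    """r_E = -IS - lambda * I(z; F_t) - mu * Pred(F_t)."""
    return -is_cost - lam * mi_estimate - mu * predictability


is_cost = implementation_shortfall([(100.1, 3), (100.2, 1)], arrival_price=100.0)
reward = execution_reward(is_cost, mi_estimate=0.2, predictability=0.3)
```

With these inputs the average fill price is 100.125, so the shortfall is 0.125 per lot and the reward is −0.125 − 0.5·0.2 − 0.1·0.3 = −0.255.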

Training Pipeline

┌─────────────────────────────────┐    ┌─────────────────────────────────────┐
│  Phase A — Execution trains     │    │  Phase B — MM + Arb train           │
│  (MM + Arb frozen)              │    │  (Execution frozen)                  │
│                                 │    │                                      │
│  1. Collect rollout (all act)   │    │  1. Collect rollout (all act)        │
│  2. Update MINE estimator       │    │  2. Update Belief Transformer        │
│  3. PPO → Execution Actor       │    │  3. PPO → MM Actor                   │
│  4. PPO → Shared Critic         │    │  4. PPO → Arb Actor                  │
│                                 │    │  5. PPO → Shared Critic              │
└─────────────────────────────────┘    └─────────────────────────────────────┘
         ◄──── alternates every 1 000 environment steps ────►
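
The alternation shown above reduces to a function of the global step count. A minimal sketch, assuming phases are switched purely on environment steps; `active_phase` and `trainable_agents` are hypothetical helpers, not the API of `alternating_opt.py`.

```python
def active_phase(step, exec_phase_steps=1000, mm_arb_phase_steps=1000):
    """Return "A" (execution trains) or "B" (market maker + arbitrageur train)."""
    cycle = exec_phase_steps + mm_arb_phase_steps
    return "A" if step % cycle < exec_phase_steps else "B"


def trainable_agents(phase):
    """Agents whose actors receive PPO updates in the given phase; the rest stay frozen."""
    return {"A": ("execution",), "B": ("market_maker", "arbitrageur")}[phase]


schedule = [(step, active_phase(step)) for step in (0, 999, 1000, 1999, 2000)]
```

All three agents still act during rollouts in both phases; only the set of updated actors changes.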

Project Structure

adversarial-market-marl/
│
├── adversarial_market/                  # Main package (~5 100 lines of source)
│   │
│   ├── agents/                          # Standalone agent wrapper classes
│   │   ├── base_agent.py                # Abstract interface: act(), freeze/unfreeze,
│   │   │                                #   save/load state, parameter counting
│   │   ├── execution_agent.py           # Institutional trader — signal encoder
│   │   │                                #   Tracks inventory, fills, IS, flow buffer
│   │   ├── market_maker.py              # Adaptive liquidity provider — signal decoder
│   │   │                                #   Rolling flow window, belief update cycle
│   │   └── arbitrageur.py               # Latency arbitrageur — temporal exploiter
│   │                                    #   Quote staleness detection, snipe recording
│   │
│   ├── environment/                     # Market simulation (price-time priority LOB)
│   │   ├── lob_env.py                   # Gymnasium-compatible multi-agent environment
│   │   │                                #   Action/obs spaces, noise traders, step logic
│   │   ├── order_book.py                # Full price-time priority matching engine
│   │   │                                #   Partial fills, VWAP, normalised snapshots
│   │   ├── order.py                     # Order, Fill, Side, OrderType dataclasses
│   │   └── market_state.py              # MarketState and per-agent AgentState containers
│   │
│   ├── networks/                        # Neural network modules
│   │   ├── actor_critic.py              # ExecutionActor (Beta+Bernoulli distributions)
│   │   │                                # MMActor (Normal), ArbActor (tanh-Normal)
│   │   │                                # SharedCritic (LayerNorm MLP)
│   │   ├── belief_transformer.py        # Pre-LN Transformer encoder → belief scalar b_t
│   │   │                                # Sinusoidal PE, masked attention, entropy-reg BCE
│   │   ├── mine_estimator.py            # MINE-f: DV bound + EMA baseline (Belghazi 2018)
│   │   │                                # PredictabilityPenalty: GRU flow predictor
│   │   └── execution_policy.py          # Deterministic baselines for benchmarking:
│   │                                    # TWAP, VWAP, LearnedPolicy, CamouflagePolicy
│   │
│   ├── training/                        # MARL training infrastructure
│   │   ├── trainer.py                   # Full MAPPO loop — rollout → update → log
│   │   │                                # Orchestrates all networks and optimisers
│   │   ├── rollout_buffer.py            # GAE-λ multi-agent rollout buffer
│   │   │                                # Minibatch generator for PPO epochs
│   │   ├── ppo_update.py                # Clipped surrogate objective, VF loss, entropy bonus
│   │   └── alternating_opt.py           # Phase A / B scheduler with state-dict serialisation
│   │
│   ├── evaluation/                      # Analysis and result generation
│   │   ├── evaluator.py                 # Policy evaluation harness
│   │   │                                # Separate informed + noise-only episode runs
│   │   ├── metrics.py                   # Implementation shortfall · KL divergence
│   │   │                                # Belief accuracy · Brier score · Flow entropy
│   │   │                                # Adverse selection · Spread-toxicity correlation
│   │   └── visualizer.py                # Phase transition plot · Episode belief trajectory
│   │                                    # Training curves · Flow distribution heatmap
│   │
│   └── utils/                           # Shared utilities
│       ├── config.py                    # YAML deep merge, validation, dot-path overrides
│       ├── logger.py                    # TensorBoard + W&B + Rich console
│       ├── math_utils.py                # KL, JSD, entropy, GAE, realised volatility, IS bps
│       └── replay_buffer.py             # Prioritised experience replay (SumTree, O(log N))
│
├── tests/                               # 222 tests — all passing
│   ├── conftest.py                      # Shared fixtures: configs, env, all 5 networks,
│   │                                    # random obs/flow data, auto-seed, pytest markers
│   ├── unit/                            # Pure unit tests (~170 tests, ~3 s on CPU)
│   │   ├── test_order_book.py           # Price-time priority, partial fills, cancellation,
│   │   │                                # VWAP, snapshot shape and normalisation
│   │   ├── test_mine_estimator.py       # DV bound convergence, EMA update, backprop
│   │   ├── test_belief_transformer.py   # Forward shapes, masked attention, loss/entropy
│   │   ├── test_actor_critic.py         # Distribution properties, evaluate_actions clamping
│   │   ├── test_agents.py               # Episode tracking, belief updates, snipe recording
│   │   ├── test_execution_policies.py   # TWAP slicing, VWAP U-shape, camouflage rate
│   │   ├── test_rollout_buffer.py       # GAE computation, minibatches, PPO NaN checks
│   │   ├── test_metrics.py              # IS, KL, belief accuracy, flow entropy
│   │   └── test_config_and_replay.py    # Config merge/validate, SumTree, priority sampling
│   └── integration/                     # End-to-end integration tests (~50 tests, ~20 s)
│       ├── test_env_step.py             # Reset/step cycle, obs shapes, inventory, flow buffer
│       ├── test_training_loop.py        # MAPPO smoke test, checkpoint save/load
│       ├── test_evaluator.py            # Full evaluation pass, finite metrics, bounds
│       └── test_checkpoint_and_utils.py # Weight fidelity, alternating phase cycling
│
├── configs/
│   ├── default.yaml                     # Full production config with documented parameters
│   ├── fast_debug.yaml                  # Smoke-test config — CPU only, runs in ~2 minutes
│   └── ablation_lambda.yaml             # λ sweep config for phase transition analysis
│
├── scripts/
│   ├── train.py                         # Training entry point with full CLI
│   ├── evaluate.py                      # Evaluation entry point — metrics + optional plots
│   └── sweep_lambda.py                  # Automated λ* phase transition sweep and plotting
│
├── docs/
│   ├── theory.md                        # Full mathematical derivations
│   │                                    # Kyle model, MINE bound, GAE, detectability conjecture
│   ├── agents.md                        # Obs/action spaces, reward decomposition, design notes
│   └── results.md                       # Expected emergent phenomena and benchmark comparisons
│
├── .github/
│   ├── workflows/
│   │   ├── ci.yml                       # Lint (black·isort·flake8·mypy) + tests (py3.10, 3.11)
│   │   └── release.yml                  # Tag-triggered release build and publish
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── pull_request_template.md
│
├── pyproject.toml                       # Build system (setuptools.build_meta), tool config
├── requirements.txt                     # Runtime + development dependencies
├── setup.py
├── CONTRIBUTING.md
├── CHANGELOG.md
├── LICENSE                              # MIT
├── .gitignore
└── .pre-commit-config.yaml             # black · isort · flake8 pre-commit hooks

Installation

Prerequisites

| Requirement | Minimum version | Notes |
| --- | --- | --- |
| Python | 3.10 | 3.11 also supported and tested |
| PyTorch | 2.1 | CPU-only works; GPU strongly recommended for full runs |
| RAM | 16 GB | 32 GB recommended for n_envs=8 |
| CUDA (optional) | 11.8 | For GPU training |

Step-by-step setup

# 1. Clone
git clone https://github.com/yourusername/adversarial-market-marl.git
cd adversarial-market-marl

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate        # Linux / macOS
# venv\Scripts\activate         # Windows PowerShell

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install package in editable mode
pip install -e .

# 5. Verify
python -c "import adversarial_market; print('OK — v' + adversarial_market.__version__)"

GPU setup (optional)

# CUDA 11.8
pip install torch==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

Pre-commit hooks (contributors)

pip install pre-commit
pre-commit install
# Hooks run automatically on git commit: black · isort · flake8 · merge-conflict check

Quick Start

Command-line training

# Full training run — GPU recommended, ~24 h for 10M steps
python scripts/train.py \
    --config configs/default.yaml \
    --exp-name my_run

# Fast smoke test on CPU — completes in ~2 minutes
python scripts/train.py \
    --config configs/fast_debug.yaml \
    --exp-name debug

# Override specific hyperparameters inline
python scripts/train.py \
    --config configs/default.yaml \
    --exp-name lambda_search \
    --override agents.execution.lambda_leakage=0.75 \
    --override training.total_timesteps=5000000

# Resume from a checkpoint
python scripts/train.py \
    --config configs/default.yaml \
    --resume checkpoints/my_run/checkpoint_step_5000000.pt

Python API

from adversarial_market.training.trainer import MARLTrainer
from adversarial_market.utils.config import load_config

# Load and optionally override config
config = load_config(
    "configs/default.yaml",
    overrides={"agents.execution.lambda_leakage": 0.5}
)

# Build and run trainer
trainer = MARLTrainer(config)
trainer.train()
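
The dot-path keys used in `overrides` (and with `--override` on the command line) address nested config entries. The sketch below shows one way such keys could be applied to a nested dictionary. It makes no claim about the internals of `load_config`; `apply_overrides` is a hypothetical helper.

```python
import copy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of config with each "a.b.c" key set to its override value."""
    merged = copy.deepcopy(config)            # leave the caller's dict untouched
    for dotted, value in overrides.items():
        *parents, leaf = dotted.split(".")
        node = merged
        for key in parents:
            node = node.setdefault(key, {})   # create missing intermediate levels
        node[leaf] = value
    return merged


base = {"agents": {"execution": {"lambda_leakage": 0.5, "mu_predictability": 0.1}}}
cfg = apply_overrides(base, {"agents.execution.lambda_leakage": 0.75})
```

Sibling keys such as `mu_predictability` are preserved, and `base` is not mutated.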

Evaluation

python scripts/evaluate.py \
    --checkpoint checkpoints/my_run/checkpoint_final.pt \
    --n-episodes 20 \
    --noise-episodes 10 \
    --output results/my_run/ \
    --plot

# Output:
#   results/my_run/eval_metrics.json   — all scalar metrics
#   results/my_run/phase_transition.pdf — KL / IS / belief accuracy curves (if --plot)

Monitor with TensorBoard

tensorboard --logdir runs/
# Navigate to http://localhost:6006

Configuration Reference

All parameters are documented inline in configs/default.yaml. The most important ones are:

# ── The key experimental variable ────────────────────────────────────────────
agents:
  execution:
    lambda_leakage: 0.5      # λ — weight on MI penalty I(z; F_t)
                             #     0.0 = pure IS minimisation (no camouflage)
                             #     > λ* = flow indistinguishable from noise
    mu_predictability: 0.1   # μ — weight on flow pattern predictability penalty
    inventory_lots: 100      # Total position to execute per episode
    horizon: 390             # Steps per episode (one trading day in minutes)

  market_maker:
    beta_entropy_reg: 0.05   # Prevents overconfident beliefs under covariate shift
    alpha_adverse_selection: 1.0
    gamma_belief_accuracy: 0.2

# ── Network architecture ──────────────────────────────────────────────────────
networks:
  belief_transformer:
    d_model: 128             # Transformer hidden dimension
    n_heads: 4               # Self-attention heads
    n_layers: 4              # Encoder layers (Pre-LN for stability)
    d_ff: 256                # Feed-forward sublayer width
    max_seq_len: 200         # Maximum order flow history length

  shared_critic:
    hidden_dims: [512, 512, 256]
    use_layer_norm: true

  mine:
    hidden_dims: [256, 256]
    ema_decay: 0.99          # EMA baseline decay for variance reduction

# ── Training ──────────────────────────────────────────────────────────────────
training:
  total_timesteps: 10_000_000
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2            # PPO clip epsilon
  entropy_coef: 0.01         # Policy entropy bonus
  alternating:
    exec_phase_steps: 1000
    mm_arb_phase_steps: 1000
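
The `gamma` and `gae_lambda` entries above parameterise generalised advantage estimation (Schulman et al., 2016). A minimal single-trajectory sketch of that recursion, assuming no episode termination inside the trajectory; `gae` is an illustrative function, not the rollout buffer's implementation.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one trajectory.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = gae(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0)
```

For this input the last advantage is 1.0 and the first is 1 + 0.99·0.95·1 = 1.9405.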

Core Concepts

The MINE Mutual Information Estimator

Direct computation of I(z; F_t) is intractable for continuous distributions. The framework uses the MINE-f lower bound (Belghazi et al., 2018):

I(z; F_t) ≥  E_{p(z,f)}[T_θ(z,f)]  −  log(ē_t)
               ──────────────────        ─────────
               joint expectation         EMA baseline
               (gradient signal)         (variance reduction)

where T_θ : Z × F → ℝ is a learned statistics network updated concurrently with the execution actor during Phase A training, and ē_t is an exponential moving average of the batch estimate of the marginal term E_{p(z)p(f)}[exp(T_θ(z,f))]. The EMA baseline, with decay 0.99 (ema_decay in the config), reduces variance in the gradient estimates compared to the original MINE formulation.
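
To make the two terms of the bound concrete, the sketch below evaluates them from precomputed statistics-network outputs and maintains a moving-average baseline. It contains no neural network: the lists of T values are assumed to come from some T_θ evaluated on joint pairs and on marginal (independently paired) samples, and `dv_lower_bound` and `EmaBaseline` are hypothetical names.

```python
import math


def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan estimate E_joint[T] - log E_marginal[exp(T)].

    Returns (bound, mean_exp) so the caller can feed mean_exp to an EMA baseline.
    """
    mean_joint = sum(t_joint) / len(t_joint)
    mean_exp = sum(math.exp(t) for t in t_marginal) / len(t_marginal)
    return mean_joint - math.log(mean_exp), mean_exp


class EmaBaseline:
    """Exponential moving average of the marginal term across batches."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.value = None

    def update(self, batch_mean_exp):
        if self.value is None:
            self.value = batch_mean_exp      # initialise on the first batch
        else:
            self.value = self.decay * self.value + (1 - self.decay) * batch_mean_exp
        return self.value


bound_indep, _ = dv_lower_bound([0.0] * 4, [0.0] * 4)   # T carries no information
bound_dep, mean_exp = dv_lower_bound([1.0, 1.0], [0.0, 0.0])
baseline = EmaBaseline(decay=0.99)
baseline.update(1.0)
smoothed = baseline.update(2.0)
```

A constant T gives a bound of zero, while a T that scores joint pairs higher than marginal pairs gives a positive bound.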

Sequential Belief Inference

The market maker processes the full order flow history through a Pre-LayerNorm Transformer encoder to produce a calibrated belief:

F_t = {(side_s, price_rel_s, size_rel_s, order_type_s, time_frac_s)}_{s≤t}

b_t = sigmoid( BeliefHead( TransformerEncoder( Embed(F_t) ) ) )
    = P(z = informed | F_t) ∈ [0, 1]

The belief is trained with an entropy-regularised binary cross-entropy loss:

L_φ = BCE(b_t, z) − β · H(b_t)

The entropy term H(b_t) penalises overconfidence, keeping the market maker calibrated when the execution agent shifts strategy mid-training.
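
The loss above can be evaluated by hand for a single belief. A minimal scalar sketch, assuming natural logarithms and a small clamp for numerical safety; `binary_entropy` and `belief_loss` are illustrative names.

```python
import math


def binary_entropy(b, eps=1e-7):
    """H(b) = -b log b - (1 - b) log(1 - b), in nats."""
    b = min(max(b, eps), 1.0 - eps)
    return -(b * math.log(b) + (1.0 - b) * math.log(1.0 - b))


def belief_loss(b, z, beta=0.05, eps=1e-7):
    """L = BCE(b, z) - beta * H(b) for one belief b and one label z in {0, 1}."""
    b_clamped = min(max(b, eps), 1.0 - eps)
    bce = -(z * math.log(b_clamped) + (1 - z) * math.log(1.0 - b_clamped))
    return bce - beta * binary_entropy(b, eps)


uncertain = belief_loss(0.5, z=1)
confident_right = belief_loss(0.99, z=1)
confident_wrong = belief_loss(0.01, z=1)
```

At b = 0.5 both BCE and H equal ln 2, so the loss is (1 − β)·ln 2. Because the entropy term enters with a negative sign, minimising the loss rewards higher-entropy (less extreme) beliefs for a given BCE.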

The Detectability Phase Transition

The central empirical prediction:

       D_KL(P_informed ‖ P_noise)

 High ┤
      │████████████████╲
      │                 ╲
      │                  ╲___________________________
  Low ┤
      └──────────────────┬────────────────────────── λ
      0                  λ*

      ├─── detectable ───┤───── camouflaged ─────────┤

 MM belief accuracy:
 High ┤
      │████████████╲
      │             ╲
   50%┤              ─────────────── (random guessing)
      └──────────────┬────────────────────────────── λ
      0              λ*

The theoretically interesting result is the sharpness of the transition, as opposed to a gradual decay. It suggests a critical camouflage threshold λ*: below it, increasing the leakage penalty yields little additional concealment; above it, the agent achieves near-perfect camouflage at the price of a rising implementation shortfall.
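
The detectability measure D_KL(P_informed ‖ P_noise) can be estimated from samples of a single flow feature by histogramming. A minimal sketch, assuming a scalar feature (here order size), fixed bin edges and additive smoothing to avoid empty bins; the sample values are made up and the function names are hypothetical.

```python
import math


def histogram(samples, lo, hi, n_bins, smoothing=1e-6):
    """Smoothed, normalised histogram of samples over [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        idx = min(max(int((x - lo) / width), 0), n_bins - 1)  # clamp to edge bins
        counts[idx] += 1
    total = len(samples) + smoothing * n_bins
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


informed_sizes = [8, 9, 9, 10]   # made-up order sizes from informed episodes
noise_sizes = [1, 2, 2, 3]       # made-up order sizes from noise-only episodes

p_informed = histogram(informed_sizes, lo=0, hi=10, n_bins=5)
p_noise = histogram(noise_sizes, lo=0, hi=10, n_bins=5)

kl_detectable = kl_divergence(p_informed, p_noise)
kl_camouflaged = kl_divergence(p_noise, p_noise)
```

Disjoint size distributions give a large divergence, while identical distributions give zero, which is the camouflaged end of the curve.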

The Arbitrageur as Equilibrium Stabiliser

Without the latency arbitrageur, the market maker's best response to the execution agent is a corner solution: set permanently wide spreads and avoid adverse selection entirely. The arbitrageur breaks this by creating a sniping cost for stale quotes:

Cost_MM(spread) = AdvSel(spread) + SnipeRisk(spread)
                    └──── decreasing ────┘  └── increasing ──┘

This forces an interior optimum that requires genuine Bayesian inference. Without it, the three-agent system collapses to a degenerate two-agent game with trivial market maker strategy.
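
A toy calculation shows why the second cost term produces an interior optimum. The functional forms below (adverse selection proportional to 1/spread, snipe risk proportional to spread) and the scale constants are assumptions chosen only to match the "decreasing" and "increasing" labels above.

```python
import math


def mm_cost(spread, adv_sel_scale=4.0, snipe_scale=1.0):
    """Cost_MM(spread) = AdvSel(spread) + SnipeRisk(spread) under assumed forms."""
    adverse_selection = adv_sel_scale / spread   # decreasing in spread
    snipe_risk = snipe_scale * spread            # increasing in spread
    return adverse_selection + snipe_risk


grid = [i / 10 for i in range(1, 101)]           # candidate spreads 0.1 .. 10.0

best_with_arb = min(grid, key=mm_cost)
best_without_arb = min(grid, key=lambda s: mm_cost(s, snipe_scale=0.0))
analytic_optimum = math.sqrt(4.0 / 1.0)          # minimiser of a/s + b*s is sqrt(a/b)
```

With the snipe term the minimiser sits inside the grid at 2.0; with it switched off the minimiser is the widest spread available, which is the corner solution described above.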


Reproducing the Phase Transition

# Sweep λ from 0 to 2.0 — trains 20 × 3 = 60 models
python scripts/sweep_lambda.py \
    --config configs/ablation_lambda.yaml \
    --lambda-min 0.0 \
    --lambda-max 2.0 \
    --n-steps 20 \
    --seeds 3 \
    --n-eval-episodes 10 \
    --output results/phase_transition/ \
    --plot

# Outputs:
#   results/phase_transition/sweep_results.json  — all metrics per (λ, seed)
#   results/phase_transition/phase_transition.pdf — Figure 1: KL + IS + belief accuracy vs λ

Expected results at full training (10M steps per model):

| λ | D_KL (↓ = more camouflaged) | MM belief accuracy | IS (↑ = costlier) |
| --- | --- | --- | --- |
| 0.0 | > 1.5 | > 72% | Low |
| 0.25 | > 0.8 | > 65% | Low |
| ~0.5 (λ*) | → 0 | → 50% | Rising |
| 1.0 | ≈ 0 | ≈ 50% | High |
| 2.0 | ≈ 0 | ≈ 50% | Very high |

Benchmarks

The execution_policy module provides deterministic baselines to benchmark against the learned policy:

| Policy | Strategy | KL divergence | Implementation shortfall |
| --- | --- | --- | --- |
| TWAPPolicy | Equal slices, market orders, every step | Highest (maximally predictable) | Lowest in liquid markets |
| VWAPPolicy | Follows U-shaped intraday volume profile | Medium | Low–medium |
| AdaptiveCamouflagePolicy | TWAP + random noise order injection at rate p | Medium (decreases with p) | Increases with p |
| Learned (λ > λ*) | Trained end-to-end with MI penalty | Near zero | Higher than TWAP |

from adversarial_market.networks.execution_policy import make_benchmark_policies

policies = make_benchmark_policies(horizon=390, max_order_size=10)

twap = policies["twap"]
twap.reset(initial_inventory=100, arrival_price=100.0)
# `obs` is the execution agent's current observation, obtained from the environment
action, _, _ = twap.act(obs, remaining=80, time_remaining=0.5, mid_price=100.1)
# action = [size_frac, limit_offset, order_type_logit]
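
The equal-slicing idea behind the TWAP baseline can be sketched independently of the package. `twap_schedule` is a hypothetical helper that spreads an integer number of lots as evenly as possible over the horizon; it does not reproduce `TWAPPolicy`, which also chooses order type and limit offset.

```python
def twap_schedule(total_lots: int, horizon: int) -> list:
    """Lots to send at each step so that cumulative volume tracks t / horizon."""
    return [
        ((t + 1) * total_lots) // horizon - (t * total_lots) // horizon
        for t in range(horizon)
    ]


schedule = twap_schedule(total_lots=100, horizon=390)
even = twap_schedule(total_lots=10, horizon=5)
```

With 100 lots over 390 steps, each step sends either zero or one lot and the full position is completed by the final step.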

Testing

# Run all 222 tests
pytest tests/ -v

# Unit tests only — fast (~3 seconds)
pytest tests/unit/ -v

# Integration tests — slower (~20 seconds)
pytest tests/integration/ -v

# Specific module
pytest tests/unit/test_order_book.py -v

# With coverage report
pytest tests/ --cov=adversarial_market --cov-report=html
open htmlcov/index.html         # macOS; on Linux use: xdg-open htmlcov/index.html

Test coverage by module

| Module | Tests | Coverage focus |
| --- | --- | --- |
| environment/order_book | 22 | Price-time priority, partial fills, cancellation, VWAP, LOB snapshot |
| networks/actor_critic | 20 | Distribution properties (Beta, Normal, tanh-Normal), evaluate_actions Beta support clamping |
| networks/mine_estimator | 12 | DV bound, EMA baseline updates, MI higher for dependent samples, backprop |
| networks/belief_transformer | 14 | Masked attention, variable sequence lengths, entropy-reg BCE, update_belief no-grad |
| agents/ | 24 | Freeze/unfreeze, episode state tracking, belief updates, snipe profit recording |
| networks/execution_policy | 15 | TWAP equal-slicing, VWAP U-shaped volume profile, camouflage injection rate |
| training/rollout_buffer | 16 | GAE-λ advantages/returns, normalisation, minibatch generator, PPO NaN-free updates |
| evaluation/metrics | 24 | IS, histogram KL divergence, belief accuracy, Brier score, flow entropy, adverse selection |
| utils/config + replay | 18 | Deep config merge, dot-path overrides, validation, SumTree arithmetic, priority sampling |
| integration/env_step | 15 | Full reset/step, obs shapes, inventory decreases on fills, flow buffer lifecycle |
| integration/training_loop | 10 | MAPPO smoke test, rollout buffer fills, PPO finite losses, checkpoint round-trip |
| integration/evaluator | 5 | Full eval pass, finite metrics, completion rate in [0,1], belief accuracy in [0,1] |
| integration/checkpoint | 9 | Weight fidelity for all 5 networks, alternating phase cycling, state dict round-trip |

Extending the Framework

Adding a new agent type

import numpy as np
import torch
import torch.nn as nn

from adversarial_market.agents.base_agent import BaseAgent

class MyAgent(BaseAgent):
    def build_networks(self) -> None:
        # Small MLP actor: 64-dimensional observation in, 1-dimensional action out
        self.actor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
        self.register_network(self.actor)   # enables freeze/unfreeze + param counting

    def act(self, obs: np.ndarray, deterministic: bool = False):
        with torch.no_grad():
            out = self.actor(self.to_tensor(obs).unsqueeze(0))
        return out.squeeze(0).numpy(), 0.0, 0.0

    def update(self, batch):
        return {}   # PPO update handled externally by PPOUpdate

Swapping the MI estimator

import torch

from adversarial_market.training.trainer import MARLTrainer
from adversarial_market.utils.config import load_config

config = load_config("configs/default.yaml")
trainer = MARLTrainer(config)

# Replace MINE with any torch.nn.Module that exposes estimate_only(z, flow) -> float.
# MyCustomMIEstimator is a placeholder for your own class; it must provide
# parameters() so that the optimiser below can be built.
trainer.mine = MyCustomMIEstimator(z_dim=1, f_dim=4)
trainer.mine_opt = torch.optim.Adam(trainer.mine.parameters(), lr=1e-4)
trainer.train()

Programmatic experiment loop

import json
from adversarial_market.utils.config import load_config
from adversarial_market.training.trainer import MARLTrainer
from adversarial_market.evaluation.evaluator import Evaluator

results = {}
for lam in [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0]:
    config = load_config(
        "configs/default.yaml",
        overrides={"agents.execution.lambda_leakage": lam,
                   "training.total_timesteps": 3_000_000}
    )
    trainer = MARLTrainer(config)
    trainer.train()

    evaluator = Evaluator(config)
    metrics = evaluator.evaluate(
        exec_actor=trainer.exec_actor,
        mm_actor=trainer.mm_actor,
        arb_actor=trainer.arb_actor,
        belief_transformer=trainer.belief_transformer,
        mine=trainer.mine,
        n_episodes=10,
        noise_only_episodes=5,
    )
    results[lam] = {
        "kl": metrics.get("eval/exec/kl_divergence_mean"),
        "is": metrics.get("eval/exec/implementation_shortfall_mean"),
        "mm_acc": metrics.get("eval/mm/belief_accuracy_mean"),
    }
    print(f"λ={lam:.2f}  KL={results[lam]['kl']:.3f}  acc={results[lam]['mm_acc']:.3f}")

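# Create the output directory first: open() does not create missing parent directories.
import os
os.makedirs("results", exist_ok=True)
with open("results/lambda_sweep.json", "w") as f:
    json.dump(results, f, indent=2)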

Theoretical Background

Full mathematical derivations are in docs/theory.md. The framework builds on the following foundational works:

| Reference | Relevance |
| --- | --- |
| Kyle (1985), Continuous Auctions and Insider Trading | The static informed-trading model this framework extends to learned, adversarial strategies |
| Glosten & Milgrom (1985), Bid, Ask, and Transaction Prices | Theoretical basis for spread as a function of adverse selection probability |
| Belghazi et al. (2018), MINE | The mutual information lower bound used for the leakage penalty |
| Lowe et al. (2017), MADDPG | The CTDE paradigm: centralised training, decentralised execution |
| Schulman et al. (2017), PPO | The policy gradient algorithm used for all three agents |
| Schulman et al. (2016), GAE | Generalised advantage estimation used in the rollout buffer |
| Vaswani et al. (2017), Attention Is All You Need | The Transformer architecture underlying the market maker's belief model |
| Schaul et al. (2016), PER | Prioritised experience replay used in the auxiliary belief training buffer |

Expected Emergent Phenomena

After full training (~10M timesteps), the system is expected to produce:

Adaptive order splitting: The execution agent discovers TWAP/VWAP-like strategies without explicit programming. Order sizes shrink and timing becomes more uniform as λ increases, which indicates that the MI penalty alone is sufficient to produce camouflage behaviour.

Belief-driven spread widening: The market maker widens spreads when its belief b_t is high. The correlation ρ(spread_t, b_t) exceeds 0.6 in low-λ regimes (where flow is detectable) and collapses to ≈ 0 in high-λ regimes (where flow is camouflaged and the market maker cannot distinguish informed from noise flow).

Latency arbitrage cycles: The arbitrageur systematically snipes stale quotes. The period between exploitable gaps is determined by the market maker's belief-update speed, with shorter periods as the Transformer's belief updates become faster.

Flow distribution convergence: At λ > λ*, the empirical distribution of execution agent order features (size, offset, timing) converges towards the noise trader distribution in total variation distance, and the KL divergence collapses to near zero.


Development

Running the full CI suite locally

# 1. Formatting
black --target-version py310 --line-length 100 adversarial_market tests scripts
isort --profile black --line-length 100 adversarial_market tests scripts

# 2. Linting
flake8 adversarial_market scripts --max-line-length=100 --ignore=E203,W503
flake8 tests --max-line-length=100 --ignore=E203,W503,E402

# 3. Type checking
mypy adversarial_market --ignore-missing-imports --no-strict-optional

# 4. Tests with coverage
pytest tests/ -v --cov=adversarial_market --cov-report=term-missing

# 5. Package install check
pip install -e .
python -c "import adversarial_market; print('OK')"

CI pipeline (GitHub Actions)

| Job | Python versions | Steps |
| --- | --- | --- |
| Lint | 3.10 | black · isort · flake8 (source) · flake8 (tests) · mypy |
| Test | 3.10, 3.11 | pip install -r requirements.txt · pip install -e . · pytest with coverage |

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request. Key points:

  • All new code must include tests, and the full 222-test suite must keep passing
  • All formatting and type checks must pass locally before pushing
  • For significant changes (new agents, new MI estimators, environment modifications), open an issue first to discuss the design

Citation

If this framework is useful in your research, please cite:

@software{adversarial_market_marl_2024,
  title   = {{Adversarial Market Microstructure}: A Multi-Agent Reinforcement Learning
             Framework for Covert Execution and Adaptive Signal Detection},
  year    = {2024},
  url     = {https://github.com/yourusername/adversarial-market-marl},
  license = {MIT}
}

License

Released under the MIT License — see LICENSE for full terms.


Built with PyTorch · Gymnasium · MINE · MAPPO
