Polymarket crash-recovery labeled dataset (308 trades, 80.2% WR). Public ground truth for prediction-market mean-reversion research.

These details have not been verified by PyPI

Project links

Project description

cross-signal-data

The labeled Polymarket crash-recovery dataset behind a 79.8% win-rate live trading bot.

308 closed trades. Real Polymarket markets. Real entry triggers. Real outcomes. Public for anyone who wants to build their own mean-reversion bot, replicate our results, or prove us wrong.

What's in here

A single CSV (data/crashes_v1.csv) with one row per closed trade on Polymarket where the crash-recovery bot entered. Each row has:

The market (public Polymarket market_id and question text)
The signal (pre_crash_high, entry_price, drop_pct)
The outcome (exit_price, exit_reason, pnl_usd, is_profitable)
Time features (entry_hour_utc, entry_dow, hold_hours)

Stat	Value
Total trades	308
Profitable	247 (80.2%)
Date range	March 2026 – April 2026
Median hold	~3 hours
Avg drop_pct at entry	~22%
Avg recovered_to_pct_of_high	~85%

Exit reason	Count
RECOVERY (price came back)	235
TIMEOUT_48H (held 48h, exited)	62
TIMEOUT (early TIMEOUT exit)	11

Why this exists

Most prediction-market datasets are either:

Synthetic (generated for academic papers, no real money behind them), or
Aggregate (volume, liquidity at hourly resolution — useless for tactical signals)

This is neither. It's the actual labeled examples of a single specific signal — Polymarket markets that crashed N% from a recent high — paired with the actual outcome of trading the recovery. If you want to study whether mean-reversion works on prediction markets, this is the data.

Install

pip install cross-signal-data

Quick use (Python)

from cross_signal_data import load

df = load()
print(df.shape)              # (308, 19)
print(df.columns.tolist())   # full list of fields

# Filter to RECOVERY-only trades
recovered = df[df["exit_reason"] == "RECOVERY"]

# What entry-price bucket has the best win rate?
buckets = df.groupby(df["entry_price"].round(2)).agg(
    n=("trade_id", "count"),
    win_rate=("is_profitable", "mean"),
)
print(buckets)

If you don't have pandas:

from cross_signal_data import load
rows = load(as_pandas=False)  # list of dicts
print(len(rows), rows[0])

Quick use (any language)

The file is plain CSV. Just download it:

curl -o crashes_v1.csv https://raw.githubusercontent.com/LuciferForge/cross-signal-data/main/data/crashes_v1.csv

Schema

See docs/schema.md for full column-by-column documentation.

Key columns:

entry_price — the price-per-share when the bot entered (0–1)
pre_crash_high — the recent local-window high
drop_pct — (pre_crash_high − entry_price) / pre_crash_high × 100
exit_reason — RECOVERY, TIMEOUT_48H, TIMEOUT, or STOP
is_profitable — 1 if pnl_usd > 0 else 0
recovered_to_pct_of_high — exit_price / pre_crash_high × 100

Methodology

See docs/methodology.md for:

How the crash signal is defined
Entry/exit rules
Known biases (survivorship: only triggers that fired are recorded; a different threshold might surface different examples)
What's NOT in the data (slippage cost — see pnl-truthteller for the slippage layer)

Reproducibility

The script that generated this dataset is in scripts/extract.py. Anyone with the source positions.json from the bot can rerun it:

python scripts/extract.py \
    --positions /path/to/positions.json \
    --output data/crashes_v1.csv

Baseline notebook

notebooks/baseline_model.py trains a logistic regression and random forest on the dataset to predict is_profitable.

Result: ~79.9% cross-validated accuracy with simple features — essentially matching the bot's 80.2% WR. Translation: most of the alpha is in the entry trigger itself (which already filters to high-WR setups), not in further feature engineering. If you want to beat this dataset, you almost certainly need features the bot doesn't currently log (orderbook depth, market category, time-to-resolution).

Top feature importances from the random forest:

Feature	Importance
`drop_pct`	0.254
`shares`	0.200
`entry_price`	0.174
`pre_crash_high`	0.171
`entry_hour_utc`	0.110
`entry_dow`	0.059

A clean, exploitable insight from the diurnal column: win rate at hours 16, 21, 22 UTC reaches ~100% (small samples though); hour 8 UTC dips to ~55%. Off-peak hours are punishing. Adjust your live-firing schedule accordingly.

pip install cross-signal-data[ml]
python notebooks/baseline_model.py

Versioning

Version	Date	Trades	Notes
v1	2026-04-28	308	Initial public release

Future versions will add more trades, more features (orderbook depth at entry, market category, time-to-resolution) and possibly per-market metadata. Pin to a specific version if reproducibility matters: load(version="v1").

License

Code: MIT. Use the loader, the extraction script, and the baseline notebook however you want.

Data: MIT. Public on-chain prediction market data, transformed into a labeled dataset. Cite if you use it in research.

Citation

@dataset{cross_signal_data_2026,
    title  = {cross-signal-data: Polymarket crash-recovery labeled dataset},
    author = {LuciferForge},
    year   = {2026},
    url    = {https://github.com/LuciferForge/cross-signal-data}
}

About the author

Built by LuciferForge, running a public-audited Polymarket crash bot (308 closed trades, 80.2% WR, all data here). Also runs:

polymarket-mcp — MCP server for live Polymarket data
pnl-truthteller — slippage audit tool
polymarket-v2-migration — V1→V2 cookbook
protodex.io — public MCP-server index

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Apr 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cross_signal_data-0.1.0.tar.gz (29.8 kB view details)

Uploaded Apr 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cross_signal_data-0.1.0-py3-none-any.whl (26.0 kB view details)

Uploaded Apr 28, 2026 Python 3

File details

Details for the file cross_signal_data-0.1.0.tar.gz.

File metadata

Download URL: cross_signal_data-0.1.0.tar.gz
Upload date: Apr 28, 2026
Size: 29.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.2

File hashes

Hashes for cross_signal_data-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`aa147699e4064bad5afe2204365dbde64e3166971930c7d52fdb0d225b0e890d`
MD5	`e58f214529de93cc810254997d6664c6`
BLAKE2b-256	`9e1fdda550a939de0c0d72f464cd8c61d683ac5a1ae9bdfdfc0269dfada8344b`

See more details on using hashes here.

File details

Details for the file cross_signal_data-0.1.0-py3-none-any.whl.

File metadata

Download URL: cross_signal_data-0.1.0-py3-none-any.whl
Upload date: Apr 28, 2026
Size: 26.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.2

File hashes

Hashes for cross_signal_data-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`29e70e83733157add3568b8bec04ce38ceb33f30c423ce8ecda51be2f0e2f8dd`
MD5	`26237814252b2f70cc9c0d790cf46039`
BLAKE2b-256	`2e66b5b4a81088338cf33ec26911e404401ac6bd48714c28c553769896c197df`

See more details on using hashes here.

cross-signal-data 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

cross-signal-data

What's in here

Why this exists

Install

Quick use (Python)

Quick use (any language)

Schema

Methodology

Reproducibility

Baseline notebook

Versioning

License

Citation

About the author

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes