Auditable benchmark for LLM trading agents with realistic execution, risk gates, and replayable trajectories.
Open-source benchmark and audit framework for evaluating LLM trading agents under realistic execution, risk, and replayability constraints.
TradeArena
TradeArena turns every trading-agent decision into a traceable trajectory:
observation -> signal -> intended allocation -> risk gate -> order
-> fill/rejection -> portfolio state -> diagnostic report
It is not another "LLM trading bot." It is a framework for asking whether an LLM trading agent can be audited, reproduced, stress-tested, and constrained before anyone trusts its headline return.
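As a rough illustration of the lifecycle above, the following sketch records each stage as an ordered, serializable step. The class names, stage identifiers, and payload fields are assumptions made for this example and are not TradeArena's actual schema (see docs/schemas.md for that).

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical stage names mirroring the lifecycle listed above;
# the real TradeArena schema may name or group these differently.
STAGES = (
    "observation", "signal", "intended_allocation", "risk_gate",
    "order", "fill_or_rejection", "portfolio_state", "diagnostic_report",
)


@dataclass
class TrajectoryStep:
    stage: str
    payload: dict


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

    def record(self, stage, payload):
        # Enforce lifecycle order so a replay cannot skip a stage.
        expected = STAGES[len(self.steps) % len(STAGES)]
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.steps.append(TrajectoryStep(stage, payload))

    def to_json(self):
        # Sorted keys keep the serialized form stable across runs.
        return json.dumps([asdict(s) for s in self.steps], sort_keys=True)


traj = Trajectory()
traj.record("observation", {"symbol": "DEMO", "close": 100.0})
traj.record("signal", {"score": 0.4})
print(traj.to_json())
```

Rejecting out-of-order stages is one simple way to make a log auditable: a reviewer can assume that every order in the log was preceded by a recorded risk-gate decision.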
Quick Start
python -m pip install -e ".[dev]"
python scripts/run_showcase.py
Open:
outputs/examples/index.html
The first-run path uses deterministic agents, tracked snapshots, and local demo artifacts. It does not call DeepSeek, Poe, OpenAI, Hugging Face, AkShare, Yahoo Finance, or broker APIs unless you opt into advanced experiments.
Install And Run
From a clone:
python -m pip install -e ".[dev]"
tradearena --benchmark tradearena-core
python -m tradearena.cli --benchmark tradearena-core
From GitHub without cloning first:
python -m pip install "git+https://github.com/weich97/TradeArena.git"
tradearena --benchmark tradearena-core
PyPI package status: the tradearena distribution name is already occupied by
an unrelated project, so the release distribution is tradearena-benchmark.
Install from PyPI with:
python -m pip install tradearena-benchmark
tradearena --benchmark tradearena-core
The import namespace and CLI remain tradearena. The historical implementation
package trading_agent_os remains available as a compatibility namespace.
Benchmark Result
The v0.1 benchmark card makes one compact claim:
LLM trading-agent evaluation changes materially once intended allocations pass through auditable risk gates and realistic execution constraints.
Open:
- Static page: weich97.github.io/TradeArena/benchmark-v0.1.html
- Markdown artifact: docs/results/benchmark_v0_1.md
- Community registry: docs/results/community_registry.md
Rebuild:
python scripts/build_benchmark_page.py
python scripts/build_benchmark_registry.py examples/benchmark_submissions
Submit Or Validate A Benchmark Row
TradeArena supports redacted benchmark submissions. They share scenario, execution, risk, metrics, and reproducibility metadata without exposing raw provider prompts, responses, credentials, or private portfolios.
tradearena validate-submission examples/benchmark_submissions/example_redacted_submission.json
tradearena build-registry examples/benchmark_submissions --output docs/results/community_registry.md
tradearena hash-run outputs/examples/audit_walkthrough_trajectory.json
See docs/benchmark_submissions.md.
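To make the idea of a redacted, hashable run concrete, here is a minimal sketch that strips sensitive fields from a JSON-like record and hashes a canonical serialization. It is not the algorithm behind `tradearena hash-run`, which this page does not specify; the key names in `REDACTED_KEYS` and both helper functions are hypothetical.

```python
import hashlib
import json

# Hypothetical field names; a real submission schema may use others.
REDACTED_KEYS = {"prompt", "response", "api_key"}


def redact(obj):
    """Return a copy with sensitive keys removed at any nesting depth."""
    if isinstance(obj, dict):
        return {k: redact(v) for k, v in obj.items() if k not in REDACTED_KEYS}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj


def canonical_sha256(obj):
    """Hash a JSON-serializable object independently of dict key order."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


run = {"scenario": "demo", "steps": [{"prompt": "secret", "weight": 0.2}]}
print(canonical_sha256(redact(run)))
```

Hashing the redacted form means two parties can compare digests without either one sharing raw prompts or credentials.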
Visual Preview
| Audit lifecycle | Execution realism | Diagnostic loop |
|---|---|---|
The browser-playable launch video is here:
weich97.github.io/TradeArena/demo_video.html.
What TradeArena Provides
| Need | TradeArena surface |
|---|---|
| Replayable decisions | Trajectory logs with prompts, memory digests, risk reports, fills, and metrics |
| Execution realism | Fees, spread, slippage, latency, liquidity caps, partial fills, and rejections |
| Risk-aware evaluation | Pre-trade gates, in-trade monitors, post-trade attribution, violations |
| Extensibility | Data, analyst, strategy, risk, simulator, memory, planner, evaluator plugins |
| Community benchmarks | Redacted submission schema, registry builder, reproducibility hashes |
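The execution-realism row in the table above can be illustrated with a toy fill model. This is a sketch under stated assumptions, not TradeArena's simulator: the function name, the basis-point defaults, and the participation cap are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    status: str      # "filled", "partial", or "rejected"
    quantity: float
    price: float
    fee: float


def simulate_buy(quantity, mid_price, bar_volume, *,
                 spread_bps=10.0, slippage_bps=5.0,
                 fee_bps=2.0, max_participation=0.1):
    """Toy buy-side fill; every parameter value here is an illustrative assumption."""
    if quantity <= 0 or bar_volume <= 0:
        return Fill("rejected", 0.0, 0.0, 0.0)
    # Liquidity cap: fill at most a fraction of the bar's traded volume.
    filled = min(quantity, bar_volume * max_participation)
    # A buyer pays half the spread plus slippage above the mid price.
    price = mid_price * (1 + (spread_bps / 2 + slippage_bps) / 10_000)
    fee = filled * price * fee_bps / 10_000
    status = "filled" if filled == quantity else "partial"
    return Fill(status, filled, price, fee)


print(simulate_buy(100, 50.0, 500))
```

With these defaults, an order for 100 units against a bar of 500 is capped at 50 units, so the intended allocation and the realized position already differ before any risk gate is applied.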
Extension Path
Start with one small plugin:
python examples/custom_plugin_demo.py
python examples/extension_walkthrough_demo.py
The walkthrough swaps in a custom analyst, risk manager, and evaluator while reusing the existing runner, data provider, strategy, execution simulator, memory store, trajectory logger, and metric stack.
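For readers who want a feel for what swapping in a risk manager might look like, the sketch below defines a small pre-trade gate behind a structural interface. The `RiskManager` protocol, the `review` method, and `MaxWeightGate` are hypothetical; TradeArena's real plugin base classes live in the repository and may differ.

```python
from typing import Protocol


class RiskManager(Protocol):
    # Hypothetical interface, assumed for illustration only.
    def review(self, allocation: dict[str, float]) -> tuple[dict[str, float], list[str]]: ...


class MaxWeightGate:
    """Clip each position weight to a cap and report what was clipped."""

    def __init__(self, max_weight: float = 0.25):
        self.max_weight = max_weight

    def review(self, allocation):
        approved, violations = {}, []
        for symbol, weight in allocation.items():
            capped = max(-self.max_weight, min(self.max_weight, weight))
            if capped != weight:
                violations.append(f"{symbol}: {weight:+.2f} clipped to {capped:+.2f}")
            approved[symbol] = capped
        return approved, violations


def run_gate(gate: RiskManager, allocation):
    # Any object with a compatible review() method can be passed in.
    return gate.review(allocation)


approved, violations = run_gate(MaxWeightGate(0.25), {"AAA": 0.6, "BBB": 0.1})
print(approved, violations)
```

Returning the violations alongside the approved allocation keeps the gate's decision in the audit trail instead of silently altering the order.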
Useful entry points:
Documentation Map
- Quickstart: docs/getting_started.md
- Schemas: docs/schemas.md
- Benchmark submissions: docs/benchmark_submissions.md
- Related work: docs/related_work.md
- Retail planning sandbox: docs/retail_planning.md
- Research protocol: docs/research_protocol.md
- Security policy: SECURITY.md
- Governance: GOVERNANCE.md
Local Checks
Each checkout can use its own .venv, so editable installs from public and
private checkouts do not overwrite each other:
powershell -ExecutionPolicy Bypass -File scripts\check_local.ps1
The script installs the current checkout in editable mode, runs compile checks, Ruff critical checks, tests, release-readiness checks, submission validation, artifact-contract validation, and JSON validation.
Safety Boundary
TradeArena does not promise profitable trading, does not provide financial
advice, and does not execute live trades by default. Public examples are
offline, paper-only, or human-review oriented. Broker and provider integrations
must follow SECURITY.md and GOVERNANCE.md.
Cite
See CITATION.cff. If you use TradeArena in research or
software, cite the repository release you used.
File details
Details for the file tradearena_benchmark-0.1.2.tar.gz.
File metadata
- Download URL: tradearena_benchmark-0.1.2.tar.gz
- Upload date:
- Size: 91.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fc40a8d01f19d6ba2d315d17a24e8612a4e462a1ea5bf10bc758e0aad5d157a |
| MD5 | 34f7cb7b78915fb9bd8a335cffa7bff1 |
| BLAKE2b-256 | 10b0cd4012c3e6ed7f50b28ad416372b78db5d467613507643acd8383cbfd572 |
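A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; the helper names `sha256_of_file` and `verify` are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `verify("tradearena_benchmark-0.1.2.tar.gz", "9fc40a8d01f19d6ba2d315d17a24e8612a4e462a1ea5bf10bc758e0aad5d157a")` from the directory that holds the download.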
Provenance
The following attestation bundles were made for tradearena_benchmark-0.1.2.tar.gz:
Publisher: release.yml on weich97/TradeArena
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: tradearena_benchmark-0.1.2.tar.gz
  - Subject digest: 9fc40a8d01f19d6ba2d315d17a24e8612a4e462a1ea5bf10bc758e0aad5d157a
- Sigstore transparency entry: 1561892565
- Permalink: weich97/TradeArena@2b52bc1eda701a64211bf4f1d958e652ce865647
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/weich97
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2b52bc1eda701a64211bf4f1d958e652ce865647
- Trigger Event: push
File details
Details for the file tradearena_benchmark-0.1.2-py3-none-any.whl.
File metadata
- Download URL: tradearena_benchmark-0.1.2-py3-none-any.whl
- Upload date:
- Size: 96.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf627805e15e9021c8597399b7f6456b5e0f081028aadb439e2941ab0830d954 |
| MD5 | 17d8fd828317f8a1fbd3808eb42e95a0 |
| BLAKE2b-256 | da8dfbe3ef72f441c92bcac04cb2aff10520431a6a19baa23971c06b091a0125 |
Provenance
The following attestation bundles were made for tradearena_benchmark-0.1.2-py3-none-any.whl:
Publisher: release.yml on weich97/TradeArena
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: tradearena_benchmark-0.1.2-py3-none-any.whl
  - Subject digest: cf627805e15e9021c8597399b7f6456b5e0f081028aadb439e2941ab0830d954
- Sigstore transparency entry: 1561893627
- Permalink: weich97/TradeArena@2b52bc1eda701a64211bf4f1d958e652ce865647
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/weich97
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2b52bc1eda701a64211bf4f1d958e652ce865647
- Trigger Event: push