Auditable benchmark for LLM trading agents with realistic execution, risk gates, and replayable trajectories.

Open-source benchmark and audit framework for evaluating LLM trading agents under realistic execution, risk, and replayability constraints.

TradeArena

TradeArena turns every trading-agent decision into a traceable trajectory:

observation -> signal -> intended allocation -> risk gate -> order
  -> fill/rejection -> portfolio state -> diagnostic report

It is not another "LLM trading bot." It is a framework for asking whether an LLM trading agent can be audited, reproduced, stress-tested, and constrained before anyone trusts its headline return.
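The stages listed above can be pictured as one ordered record per decision. The following is a minimal sketch, not TradeArena's actual schema: the class name TrajectoryStep, its fields, and the exact stage identifiers are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

# Stage names mirror the pipeline described above; identifiers are hypothetical.
STAGES = (
    "observation",
    "signal",
    "intended_allocation",
    "risk_gate",
    "order",
    "fill_or_rejection",
    "portfolio_state",
    "diagnostic_report",
)


@dataclass
class TrajectoryStep:
    """One decision, recorded stage by stage so it can be replayed later."""

    step_id: int
    records: dict = field(default_factory=dict)

    def record(self, stage: str, payload: dict) -> None:
        # Enforce pipeline order: stage N can only follow stage N-1.
        done = len(self.records)
        expected = STAGES[done] if done < len(STAGES) else None
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.records[stage] = payload

    def is_complete(self) -> bool:
        return len(self.records) == len(STAGES)

    def to_json(self) -> str:
        # Insertion order is preserved, so the JSON reads in pipeline order.
        return json.dumps(asdict(self))
```

Rejecting out-of-order stages is one simple way to make gaps in a trajectory visible at write time rather than at audit time.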

Quick Start

python -m pip install -e ".[dev]"
python scripts/run_showcase.py

Open:

outputs/examples/index.html

The first-run path uses deterministic agents, tracked snapshots, and local demo artifacts. It does not call DeepSeek, Poe, OpenAI, Hugging Face, AkShare, Yahoo Finance, or broker APIs unless you opt into advanced experiments.

No local install yet? Open the repository in GitHub Codespaces or in Colab.

Install And Run

From a clone:

python -m pip install -e ".[dev]"
tradearena --benchmark tradearena-core
# Or invoke the CLI through its module path:
python -m tradearena.cli --benchmark tradearena-core

From GitHub without cloning first:

python -m pip install "git+https://github.com/weich97/TradeArena.git"
tradearena --benchmark tradearena-core

PyPI package status: the tradearena distribution name is already occupied by an unrelated project, so the release distribution is tradearena-benchmark. Install from PyPI with:

python -m pip install tradearena-benchmark
tradearena --benchmark tradearena-core

The import namespace and CLI remain tradearena. The historical implementation package trading_agent_os remains available as a compatibility namespace.
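Code that must run against both the current and the historical namespace could probe for whichever is installed. This is a small sketch using only the standard library; the helper name first_available is hypothetical and is not part of TradeArena.

```python
import importlib.util


def first_available(candidates):
    """Return the first importable top-level module name, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer the current namespace; fall back to the compatibility one.
namespace = first_available(["tradearena", "trading_agent_os"])
```

If namespace is None, neither package is installed in the active environment.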

Benchmark Result

The v0.1 benchmark card makes one compact claim:

LLM trading-agent evaluation changes materially once intended allocations pass through auditable risk gates and realistic execution constraints.
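To make the claim concrete, here is a toy pre-trade gate that clips an intended allocation and keeps the reasons for the audit trail. It is an illustrative sketch only: the function name, the limits (max_weight, max_gross), and the clipping policy are assumptions, not TradeArena's risk rules.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    approved: dict    # weights after the gate
    violations: list  # human-readable reasons, kept for the audit trail


def pre_trade_gate(intended, max_weight=0.25, max_gross=1.0):
    """Clip per-asset weights, then scale down if gross exposure is too high."""
    violations = []
    approved = {}
    for symbol, weight in intended.items():
        if abs(weight) > max_weight:
            violations.append(
                f"{symbol}: |{weight:.2f}| exceeds cap {max_weight:.2f}"
            )
            weight = max_weight if weight > 0 else -max_weight
        approved[symbol] = weight
    gross = sum(abs(w) for w in approved.values())
    if gross > max_gross:
        violations.append(f"gross {gross:.2f} exceeds limit {max_gross:.2f}")
        scale = max_gross / gross
        approved = {s: w * scale for s, w in approved.items()}
    return GateResult(approved, violations)
```

An agent that intends a 60% position in one asset ends up with 25% under these example limits, so its evaluated return differs from the return implied by its raw signal.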

Rebuild:

python scripts/build_benchmark_page.py
python scripts/build_benchmark_registry.py examples/benchmark_submissions

Submit Or Validate A Benchmark Row

TradeArena supports redacted benchmark submissions. They share scenario, execution, risk, metrics, and reproducibility metadata without exposing raw provider prompts, responses, credentials, or private portfolios.

tradearena validate-submission examples/benchmark_submissions/example_redacted_submission.json
tradearena build-registry examples/benchmark_submissions --output docs/results/community_registry.md
tradearena hash-run outputs/examples/audit_walkthrough_trajectory.json

See docs/benchmark_submissions.md.
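The general idea behind a redacted, hashable submission can be sketched in a few lines. This is not the algorithm behind tradearena hash-run or the submission schema, which are documented in the file referenced above; the key names in SENSITIVE_KEYS and both helper functions are hypothetical.

```python
import hashlib
import json

# Hypothetical field names; the real schema may differ.
SENSITIVE_KEYS = {"prompt", "response", "api_key"}


def redact(record):
    """Recursively drop sensitive keys from nested dicts and lists."""
    if isinstance(record, dict):
        return {k: redact(v) for k, v in record.items() if k not in SENSITIVE_KEYS}
    if isinstance(record, list):
        return [redact(v) for v in record]
    return record


def canonical_hash(record) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Sorting keys and fixing separators means two logically identical records produce the same digest regardless of key order or whitespace.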

Visual Preview

Audit lifecycle: animated observe-plan-risk-execute-reflect audit trace
Execution realism: animated execution comparison of ideal, realistic, high-spread, low-liquidity, and high-latency fills
Diagnostic loop: animated representation, risk-feedback, and concentration diagnostics

The browser-playable launch video is here: weich97.github.io/TradeArena/demo_video.html.

What TradeArena Provides

Need                    TradeArena surface
Replayable decisions    Trajectory logs with prompts, memory digests, risk reports, fills, and metrics
Execution realism       Fees, spread, slippage, latency, liquidity caps, partial fills, and rejections
Risk-aware evaluation   Pre-trade gates, in-trade monitors, post-trade attribution, violations
Extensibility           Data, analyst, strategy, risk, simulator, memory, planner, evaluator plugins
Community benchmarks    Redacted submission schema, registry builder, reproducibility hashes
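As an illustration of the execution-realism row, the sketch below fills a buy order under a spread, volume-dependent slippage, a proportional fee, and a liquidity cap. The model and every parameter (spread_bps, slippage_bps, fee_rate, participation_cap) are assumptions chosen for clarity, not TradeArena's simulator, and latency is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    filled_qty: float
    price: float  # average execution price, 0.0 if rejected
    fee: float
    status: str   # "filled", "partial", or "rejected"


def simulate_buy(qty, mid_price, bar_volume, spread_bps=10.0,
                 slippage_bps=5.0, fee_rate=0.001, participation_cap=0.1):
    """Buy-side fill under a spread, linear slippage, fees, and a liquidity cap."""
    # Liquidity cap: at most a fixed share of the bar's traded volume.
    max_qty = bar_volume * participation_cap
    filled = min(qty, max_qty)
    if filled <= 0:
        return Fill(0.0, 0.0, 0.0, "rejected")
    # Pay half the spread, plus slippage that grows with the share of the cap used.
    half_spread = mid_price * spread_bps / 2 / 10_000
    slippage = mid_price * slippage_bps / 10_000 * (filled / max_qty)
    price = mid_price + half_spread + slippage
    fee = filled * price * fee_rate
    status = "filled" if filled == qty else "partial"
    return Fill(filled, price, fee, status)
```

Under these example settings, an order larger than 10% of bar volume is only partially filled and pays the full slippage charge, which is the kind of gap between intended and realized allocation the benchmark is designed to expose.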

Extension Path

Start with one small plugin:

python examples/custom_plugin_demo.py
python examples/extension_walkthrough_demo.py

The walkthrough swaps in a custom analyst, risk manager, and evaluator while reusing the existing runner, data provider, strategy, execution simulator, memory store, trajectory logger, and metric stack.
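The swap-one-component pattern can be sketched without the framework itself. The Analyst protocol, the two analyst classes, and run_step below are hypothetical stand-ins, not TradeArena's plugin interfaces; see the example scripts above for the real ones.

```python
from typing import Protocol


class Analyst(Protocol):
    """Hypothetical plugin interface: turn an observation into a signal."""

    def analyze(self, observation: dict) -> float: ...


class MomentumAnalyst:
    """Signal in [-1, 1] from the sign of the last price change."""

    def analyze(self, observation: dict) -> float:
        prices = observation["prices"]
        if len(prices) < 2 or prices[-1] == prices[-2]:
            return 0.0
        return 1.0 if prices[-1] > prices[-2] else -1.0


class MeanReversionAnalyst:
    """Custom replacement: bets against the last move."""

    def analyze(self, observation: dict) -> float:
        return -MomentumAnalyst().analyze(observation)


def run_step(analyst: Analyst, observation: dict, max_weight: float = 0.2) -> dict:
    """Stand-in runner: everything except the analyst stays the same."""
    signal = analyst.analyze(observation)
    return {"signal": signal, "intended_weight": signal * max_weight}
```

Calling run_step with either analyst exercises the same downstream path, so any difference in the result is attributable to the swapped component.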

Local Checks

Each checkout can use its own .venv, so public and private repos do not fight over editable installs:

powershell -ExecutionPolicy Bypass -File scripts\check_local.ps1

The script installs the current checkout in editable mode, then runs compile checks, Ruff critical checks, tests, release-readiness checks, submission validation, artifact-contract validation, and JSON validation.

Safety Boundary

TradeArena does not promise profitable trading, does not provide financial advice, and does not execute live trades by default. Public examples are offline, paper-only, or human-review oriented. Broker and provider integrations must follow SECURITY.md and GOVERNANCE.md.

Cite

See CITATION.cff. If you use TradeArena in research or software, cite the repository release you used.
