Prophet Arena Client — benchmark LLM agents on prediction market trading

Prophet Arena Client

Requires Python 3.11+. License: MIT.

Benchmark LLM agents on prediction market trading. Prophet Arena pits language models against each other in a controlled paper-trading environment using real prediction market data.

Installation

pip install ai-prophet

Quick Start

# Set your LLM API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# Run a benchmark: 2 models, 2 replicates each, 96 ticks
ai-prophet run \
  -m anthropic:claude-sonnet-4 \
  -m openai:gpt-5.2 \
  --replicates 2 \
  --slug my_experiment \
  --max-ticks 96

This creates 4 participants (2 models × 2 reps) and runs 96 fifteen-minute ticks against the Prophet Arena API. Restarting with the same --slug resumes from where it left off.
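The arithmetic behind that run can be sketched as follows. This is only an illustration: `run_size` is a hypothetical helper, not part of the package, and the pipeline-pass count assumes one pass per participant per tick.

```python
TICK_MINUTES = 15  # tick length stated in the paragraph above


def run_size(n_models: int, replicates: int, max_ticks: int) -> dict:
    """Return the size of a run; a hypothetical helper for illustration only."""
    participants = n_models * replicates
    return {
        "participants": participants,
        # Assumes one pipeline pass per participant per tick.
        "pipeline_passes": participants * max_ticks,
        "tick_hours": max_ticks * TICK_MINUTES / 60,
    }


size = run_size(n_models=2, replicates=2, max_ticks=96)
print(size)
```

For the Quick Start command this gives 4 participants and 24 hours of ticks, which helps when estimating LLM spend before starting a run.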

How It Works

By default the client holds no authoritative benchmark state: the Core API owns experiment state, tick leasing, execution, and scoring. On each tick, the client runs a 4-stage LLM pipeline for each participant:

  1. REVIEW — Select markets for analysis from the candidate universe
  2. SEARCH — Execute web searches and summarize findings (optional, requires Brave API key)
  3. FORECAST — Generate calibrated probability estimates
  4. ACTION — Convert forecasts into trade intents with position sizing

The Prophet Arena API handles execution, portfolio tracking, and scoring. All LLM calls run locally on your machine: the API sees only trade intents and results, never your prompts.
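The four stages can be pictured as a plain function chain that ends in a list of trade intents. The sketch below is not the package's implementation: every name in it (`TradeIntent`, `review`, `search`, `action`, `run_tick`), the market dict shape, and the toy sizing rule are assumptions, and the LLM calls are replaced by stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TradeIntent:
    market_id: str
    side: str        # "YES" or "NO" (hypothetical encoding)
    size_usd: float


def review(candidates: list[dict], max_markets: int) -> list[dict]:
    # REVIEW: a real stage would ask the LLM; this stub takes the first few.
    return candidates[:max_markets]


def search(market: dict) -> str:
    # SEARCH: optional stage; this stub returns no findings.
    return ""


def action(market: dict, probability: float, cash: float,
           min_size_usd: float) -> Optional[TradeIntent]:
    # ACTION: toy sizing rule (an assumption): stake 10% of cash times the edge.
    edge = probability - market["price"]
    size = round(abs(edge) * cash * 0.1, 2)
    if size < min_size_usd:
        return None  # too small to trade
    return TradeIntent(market["id"], "YES" if edge > 0 else "NO", size)


def run_tick(candidates: list[dict], forecaster: Callable[[dict, str], float],
             cash: float, max_markets: int = 5,
             min_size_usd: float = 1.0) -> list[TradeIntent]:
    intents = []
    for market in review(candidates, max_markets):
        notes = search(market)
        probability = forecaster(market, notes)  # FORECAST
        intent = action(market, probability, cash, min_size_usd)
        if intent is not None:
            intents.append(intent)
    return intents


markets = [{"id": "m1", "price": 0.40}, {"id": "m2", "price": 0.50}]
intents = run_tick(markets, lambda market, notes: 0.55, cash=10_000)
for intent in intents:
    print(intent)
```

Only the final list of intents would leave the machine in this picture; the prompts and intermediate notes stay local.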

Optional local components (ClientDatabase, EventStore, trace sink, local reasoning store) are included for debugging and observability, but are not required for normal CLI runs.

CLI Reference

ai-prophet run [OPTIONS]
  -m, --models TEXT       Model spec: provider:model (required, repeatable)
  -s, --slug TEXT         Experiment slug (stable across restarts)
  -r, --replicates INT    Replicates per model (default: 1)
  -t, --max-ticks INT     Target completed ticks (default: 96)
  --starting-cash FLOAT   Per-participant cash (default: 10000)
  --trace-dir PATH        Local trace directory
  --publish-reasoning     Persist per-stage reasoning in plan_json
  --dashboard             Open local dashboard alongside the run
  --api-url URL           Core API URL (default: hosted Prophet Arena API)
  -v, --verbose           Verbose output

ai-prophet health          # Check API connectivity
ai-prophet progress <id>   # Show experiment progress
ai-prophet dashboard       # Open local results dashboard
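When launching runs from a script, it can help to assemble the command as an argument list. The sketch below uses only flags listed above; `build_run_command` is a hypothetical helper, not something the package provides.

```python
import shlex


def build_run_command(models, slug=None, replicates=1, max_ticks=96, verbose=False):
    """Build an `ai-prophet run` argv list (hypothetical helper)."""
    if not models:
        raise ValueError("at least one provider:model spec is required")
    cmd = ["ai-prophet", "run"]
    for spec in models:
        cmd += ["-m", spec]  # -m is repeatable, one per model
    if slug:
        cmd += ["--slug", slug]
    cmd += ["--replicates", str(replicates), "--max-ticks", str(max_ticks)]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_run_command(
    ["anthropic:claude-sonnet-4", "openai:gpt-5.2"],
    slug="my_experiment",
    replicates=2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the API keys are set in the environment.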

Supported LLM Providers

Provider               Example
Anthropic              anthropic:claude-sonnet-4
OpenAI                 openai:gpt-5.2
Google                 gemini:gemini-2.5-flash
xAI                    xai:grok-3
Any OpenAI-compatible  together:meta-llama/llama-3-70b

Providers not listed above are routed through an OpenAI-compatible Chat Completions client. For such a provider, set {PROVIDER}_BASE_URL to your endpoint and {PROVIDER}_API_KEY to its key (e.g. TOGETHER_BASE_URL=https://api.together.xyz/v1 and TOGETHER_API_KEY=...).
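The naming convention can be illustrated with a small resolver. This is a sketch under assumptions, not the package's code: `resolve_provider` and `KNOWN_PROVIDERS` are hypothetical, and upper-casing the provider prefix is inferred from the TOGETHER_* examples.

```python
import os

# Provider prefixes taken from the examples in the table above.
KNOWN_PROVIDERS = {"anthropic", "openai", "gemini", "xai"}


def resolve_provider(spec, env=None):
    """Split a provider:model spec and look up its env vars (hypothetical)."""
    env = os.environ if env is None else env
    # Split on the first colon only; model names may contain "/" or ":".
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider:model, got {spec!r}")
    prefix = provider.upper()  # assumed mapping, e.g. together -> TOGETHER
    return {
        "provider": provider,
        "model": model,
        "openai_compatible": provider not in KNOWN_PROVIDERS,
        "base_url": env.get(f"{prefix}_BASE_URL"),
        "api_key": env.get(f"{prefix}_API_KEY"),
    }


example_env = {
    "TOGETHER_BASE_URL": "https://api.together.xyz/v1",
    "TOGETHER_API_KEY": "placeholder-key",
}
resolved = resolve_provider("together:meta-llama/llama-3-70b", example_env)
print(resolved)
```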

Configuration

Default config is bundled with the package. Override with config.local.yaml in your working directory:

pipeline:
  max_markets: 5
  min_size_usd: 1.0

search:
  max_queries_per_market: 1
  max_results_per_query: 3

llm:
  temperature: 0.7
  max_tokens: 4096
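One plausible way for a local override file to combine with bundled defaults is a key-by-key deep merge, sketched below. Whether the package merges this way or replaces whole sections is not stated here, so treat the behavior as an assumption; the default values shown are placeholders, not the real bundled defaults.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied recursively; inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults for illustration only.
defaults = {
    "pipeline": {"max_markets": 10, "min_size_usd": 1.0},
    "llm": {"temperature": 1.0, "max_tokens": 4096},
}
# What a parsed config.local.yaml might contain.
local = {"pipeline": {"max_markets": 5}, "llm": {"temperature": 0.7}}

config = deep_merge(defaults, local)
print(config)
```

Under this scheme a local file only needs to list the keys it changes; everything else keeps its default.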

Environment Variables

Secrets and deployment overrides come from environment variables (or a .env file). All behavioral config (temperatures, retry counts, pipeline parameters) lives in config.yaml.

Variable             Description
ANTHROPIC_API_KEY    Anthropic API key
OPENAI_API_KEY       OpenAI API key
GEMINI_API_KEY       Google Gemini API key (alias: GOOGLE_API_KEY)
XAI_API_KEY          xAI (Grok) API key
{PROVIDER}_API_KEY   API key for OpenAI-compatible providers (e.g. TOGETHER_API_KEY)
BRAVE_API_KEY        Brave Search API key (optional, for web search)
PA_SERVER_URL        Override API URL
PA_VERBOSE           Enable verbose LLM logging
PA_MEMORY_DIR        Local reasoning memory directory (default ~/.pa_memory)
PA_MEMORY_MAX_ROWS   Max JSONL memory rows per participant (default 1000)
{PROVIDER}_BASE_URL  Base URL for OpenAI-compatible providers (e.g. TOGETHER_BASE_URL)
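The defaults and the alias in this table can be read as in the sketch below. It is illustrative only: `load_settings` is a hypothetical helper, and giving GEMINI_API_KEY precedence over GOOGLE_API_KEY is an assumption about how the alias is resolved.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Read a few of the variables above with their documented defaults."""
    env = os.environ if env is None else env
    brave_key = env.get("BRAVE_API_KEY")
    return {
        # Assumed precedence: primary name first, then the alias.
        "gemini_api_key": env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"),
        # The SEARCH stage is optional and needs a Brave key.
        "search_enabled": bool(brave_key),
        "server_url": env.get("PA_SERVER_URL"),  # None means use the default URL
        "memory_dir": Path(os.path.expanduser(env.get("PA_MEMORY_DIR", "~/.pa_memory"))),
        "memory_max_rows": int(env.get("PA_MEMORY_MAX_ROWS", "1000")),
    }


settings = load_settings({"GOOGLE_API_KEY": "placeholder-key"})
print(settings)
```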

Python API

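from ai_prophet.runner import ExperimentRunner

runner = ExperimentRunner(
    # Core API endpoint (counterpart of --api-url)
    api_url="https://api.prophetarena.co",
    # Stable experiment slug (counterpart of --slug)
    experiment_slug="my_experiment",
    # One entry per participant: provider:model spec plus replicate index
    models=[
        {"model": "anthropic:claude-sonnet-4", "rep": 0},
        {"model": "openai:gpt-5.2", "rep": 0},
    ],
    # Number of ticks to run (counterpart of --max-ticks)
    n_ticks=96,
)
runner.run()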
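To run several replicates per model through the Python API, the `models` list can be generated instead of written by hand. The helper below is a hypothetical convenience that produces dicts in the same shape as the example above; that replicate indices count up from 0 is inferred from the `"rep": 0` entries and is an assumption.

```python
def build_models(specs, replicates=1):
    """Expand provider:model specs into one dict per participant (hypothetical)."""
    if replicates < 1:
        raise ValueError("replicates must be at least 1")
    return [
        {"model": spec, "rep": rep}
        for spec in specs
        for rep in range(replicates)
    ]


models = build_models(["anthropic:claude-sonnet-4", "openai:gpt-5.2"], replicates=2)
for entry in models:
    print(entry)
```

The resulting list could then be passed as the `models=` argument shown above.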

License

MIT
