
Prophet Arena Client

Python 3.11+ · License: MIT

Benchmark LLM agents on prediction market trading. Prophet Arena pits language models against each other in a controlled paper-trading environment using real prediction market data.

Installation

# From repository root
python -m pip install -e packages/core
python -m pip install -e "packages/cli[dev]"

PyPI publishing is not enabled yet for this package.

Quick Start

# Set your LLM API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# Run a benchmark: 2 models, 2 replicates each, 96 ticks
ai-prophet eval run \
  -m anthropic:claude-sonnet-4 \
  -m openai:gpt-5.2 \
  --replicates 2 \
  --slug my_experiment \
  --max-ticks 96

This creates 4 participants (2 models × 2 reps) and runs 96 fifteen-minute ticks against the Prophet Arena API. Restarting with the same --slug resumes from where it left off.
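The models × replicates expansion can be sketched as follows. This is illustrative only: `expand_participants` and the dict shape are hypothetical, not the client's internal representation, but the counting is the same.

```python
from itertools import product

def expand_participants(models, replicates):
    """Expand model specs into (model, rep) participant pairs.

    Hypothetical helper: the real client's internal participant
    representation may differ, but models x replicates is the count.
    """
    return [{"model": m, "rep": r} for m, r in product(models, range(replicates))]

participants = expand_participants(
    ["anthropic:claude-sonnet-4", "openai:gpt-5.2"], replicates=2
)
# 2 models x 2 replicates -> 4 participants
```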

How It Works

Benchmark authority lives server-side: the Core API owns experiment state, tick leasing, execution, and scoring, so the client can stay stateless by default. On each tick, the client runs a 4-stage LLM pipeline for each participant:

  1. REVIEW — Select markets for analysis from the candidate universe
  2. SEARCH — Execute web searches and summarize findings (optional, requires Brave API key)
  3. FORECAST — Generate calibrated probability estimates
  4. ACTION — Convert forecasts into trade intents with position sizing

The Prophet Arena API handles execution, portfolio tracking, and scoring. All LLM calls run locally on your machine — the API only sees trade intents and results, never your prompts.
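The four stages can be sketched as a single per-tick function. Everything below (data shapes, the 0.5 placeholder forecast, the intent format) is made up for illustration; only the stage ordering follows the description above.

```python
def run_tick(markets, search_fn=None):
    """Illustrative per-tick pipeline; stage names follow the README,
    but all signatures and data shapes are hypothetical."""
    stages = []

    # 1. REVIEW: select markets for analysis from the candidate universe
    selected = [m for m in markets if m.get("tradable", True)]
    stages.append("REVIEW")

    # 2. SEARCH (optional): gather and summarize external evidence
    evidence = search_fn(selected) if search_fn else {}
    stages.append("SEARCH" if search_fn else "SEARCH (skipped)")

    # 3. FORECAST: calibrated probabilities (placeholder 0.5 without evidence)
    forecasts = {m["id"]: evidence.get(m["id"], 0.5) for m in selected}
    stages.append("FORECAST")

    # 4. ACTION: convert forecasts into trade intents
    intents = [{"market": mid, "p": p} for mid, p in forecasts.items()]
    stages.append("ACTION")
    return stages, intents

stages, intents = run_tick([{"id": "mkt-1"}, {"id": "mkt-2", "tradable": False}])
```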

Optional local components (ClientDatabase, EventStore, trace sink, local reasoning store) are included for debugging and observability, but are not required for normal CLI runs.

CLI Reference

ai-prophet eval run [OPTIONS]
  -m, --models TEXT       Model spec: provider:model (required, repeatable)
  -s, --slug TEXT         Experiment slug (stable across restarts)
  -r, --replicates INT    Replicates per model (default: 1)
  -t, --max-ticks INT     Target completed ticks (default: 96)
  --starting-cash FLOAT   Per-participant cash (default: 10000)
  --trace-dir PATH        Local trace directory
  --publish-reasoning     Persist per-stage reasoning in plan_json
  --dashboard             Open local dashboard alongside the run
  --api-url URL           Core API URL (default: hosted Prophet Arena API)
  -v, --verbose           Verbose output

ai-prophet health          # Check API connectivity
ai-prophet progress <id>   # Show experiment progress
ai-prophet dashboard       # Open local results dashboard

Legacy alias: ai-prophet run maps to ai-prophet eval run.

Supported LLM Providers

Provider                Example
Anthropic               anthropic:claude-sonnet-4
OpenAI                  openai:gpt-5.2
Google                  gemini:gemini-2.5-flash
xAI                     xai:grok-3
Any OpenAI-compatible   together:meta-llama/llama-3-70b

Unknown providers are auto-routed through the OpenAI Chat Completions API. Set {PROVIDER}_BASE_URL to point at your endpoint (e.g. TOGETHER_BASE_URL=https://api.together.xyz/v1). For unknown providers, set {PROVIDER}_API_KEY as well (e.g. TOGETHER_API_KEY=...).
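That env-var convention can be sketched in a few lines. `resolve_provider` is a hypothetical helper, not part of the client; it only shows how a provider:model spec maps to the {PROVIDER}_BASE_URL / {PROVIDER}_API_KEY lookup described above.

```python
import os

def resolve_provider(spec):
    """Split a provider:model spec and look up the matching env vars.

    Hypothetical helper mirroring the {PROVIDER}_BASE_URL /
    {PROVIDER}_API_KEY convention; not the client's real API.
    """
    provider, model = spec.split(":", 1)  # split only on the first colon
    prefix = provider.upper()
    return {
        "model": model,
        "base_url": os.environ.get(f"{prefix}_BASE_URL"),
        "api_key": os.environ.get(f"{prefix}_API_KEY"),
    }

os.environ["TOGETHER_BASE_URL"] = "https://api.together.xyz/v1"
cfg = resolve_provider("together:meta-llama/llama-3-70b")
```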

Configuration

Default config is bundled with the package. Override with config.local.yaml in your working directory:

pipeline:
  max_markets: 5
  min_size_usd: 1.0

search:
  max_queries_per_market: 1
  max_results_per_query: 3

llm:
  temperature: 0.7
  max_tokens: 4096
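One plausible way such an override could combine with the bundled defaults is a recursive dict merge, where nested keys you set replace the defaults and everything else is kept. `deep_merge` and the sample values below are illustrative, not the client's actual config loader.

```python
def deep_merge(base, override):
    """Recursively merge override into base, returning a new dict.

    Illustrative sketch of config.local.yaml layered over bundled
    defaults; the client's actual merge semantics may differ.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into sections
        else:
            merged[key] = value  # scalar or new key: override wins
    return merged

# Hypothetical defaults and a local override touching only some keys
defaults = {"pipeline": {"max_markets": 10, "min_size_usd": 1.0},
            "llm": {"temperature": 0.2}}
local = {"pipeline": {"max_markets": 5}, "llm": {"temperature": 0.7}}
config = deep_merge(defaults, local)
```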

Environment Variables

Secrets and deployment overrides come from env vars (or a .env file). All behavioral config (temperatures, retry counts, pipeline params) lives in config.yaml.

Variable              Description
ANTHROPIC_API_KEY     Anthropic API key
OPENAI_API_KEY        OpenAI API key
GEMINI_API_KEY        Google Gemini API key (alias: GOOGLE_API_KEY)
XAI_API_KEY           xAI (Grok) API key
{PROVIDER}_API_KEY    API key for OpenAI-compatible providers (e.g. TOGETHER_API_KEY)
{PROVIDER}_BASE_URL   Base URL for OpenAI-compatible providers (e.g. TOGETHER_BASE_URL)
BRAVE_API_KEY         Brave Search API key (optional, for web search)
PA_SERVER_URL         Override API URL
PA_VERBOSE            Enable verbose LLM logging
PA_MEMORY_DIR         Local reasoning memory directory (default: ~/.pa_memory)
PA_MEMORY_MAX_ROWS    Max JSONL memory rows per participant (default: 1000)

Python API

from ai_prophet.runner import ExperimentRunner

runner = ExperimentRunner(
    api_url="https://api.prophetarena.co",
    experiment_slug="my_experiment",
    models=[
        {"model": "anthropic:claude-sonnet-4", "rep": 0},
        {"model": "openai:gpt-5.2", "rep": 0},
    ],
    n_ticks=96,
)
runner.run()

License

MIT
