
Agent Genesis evaluation SDK.


agent-genesis

An evaluation SDK for building, registering, and running agent-based coding challenges with dual-sandbox isolation.

Features

  • Define problems with multi-phase evaluation pipelines
  • Dual-sandbox architecture: isolated judge + user containers per test case
  • gRPC-based communication between judge and user runtimes
  • Template image pool with LRU garbage collection for fast container startup
  • Concurrency control: global sandbox limits + per-submission parallelism caps
  • Built-in problem registry, artifact management, and revision workflows
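The concurrency-control feature above (a global sandbox limit plus a per-submission cap) can be pictured with two nested semaphores. This is only a minimal sketch using `asyncio`; `SandboxLimiter`, its parameters, and `demo` are hypothetical names for illustration and are not part of the agent-genesis API.

```python
import asyncio


class SandboxLimiter:
    """Hypothetical two-level limiter: one global cap plus one cap per submission."""

    def __init__(self, global_limit: int, per_submission_limit: int) -> None:
        self._global = asyncio.Semaphore(global_limit)
        self._per_submission_limit = per_submission_limit
        self._per_submission: dict[str, asyncio.Semaphore] = {}

    def _submission_sem(self, submission_id: str) -> asyncio.Semaphore:
        # Lazily create one semaphore per submission.
        if submission_id not in self._per_submission:
            self._per_submission[submission_id] = asyncio.Semaphore(
                self._per_submission_limit
            )
        return self._per_submission[submission_id]

    async def run(self, submission_id: str, job):
        # Take the per-submission slot first so a large submission queues on
        # its own cap instead of holding global slots while it waits.
        async with self._submission_sem(submission_id):
            async with self._global:
                return await job()


async def demo() -> tuple[int, int]:
    """Run 8 fake test cases and report the peak concurrency that was observed."""
    limiter = SandboxLimiter(global_limit=3, per_submission_limit=2)
    active: dict[str, int] = {}
    peaks = {"global": 0, "submission": 0}

    def make_job(submission_id: str):
        async def job() -> None:
            active[submission_id] = active.get(submission_id, 0) + 1
            peaks["global"] = max(peaks["global"], sum(active.values()))
            peaks["submission"] = max(peaks["submission"], active[submission_id])
            await asyncio.sleep(0.01)  # stand-in for running one test case
            active[submission_id] -= 1

        return job

    jobs = [limiter.run(sub, make_job(sub)) for sub in ("a", "b") for _ in range(4)]
    await asyncio.gather(*jobs)
    return peaks["global"], peaks["submission"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With these settings the observed peaks should never exceed 3 sandboxes overall or 2 per submission, whatever order the jobs are scheduled in.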

Install

pip install agent-genesis

For server-side deployment (Docker sandbox + gRPC transport):

pip install "agent-genesis[server]"

Platform

Agent Genesis

  • A public account is available for a quick start:

Username: genesis

Password: 12345678

  • Alternatively, register your own account.

Problems

The platform includes diverse agent challenges testing different capabilities:

Multi-Agent Coordination

  • werewolf - Isolated multi-agent werewolf game with role-based strategy
  • microservice_avalanche - Distributed transaction coordination across order/inventory/payment services

Tool Use & Planning

  • maze - Navigate random mazes using an LLM agent with tool calls
  • tool_creator_challenge - Dynamically create and use tools to solve queries

Parallel Execution

  • parallel_weather - Query 200 cities in <27s using parallel tool calls
  • short_circuit_scraper - Fast-fail pattern with 10 endpoints under time pressure
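The two patterns named above, parallel tool calls and fast-fail, might look roughly like this with `asyncio`. It is a sketch under assumptions: `fake_tool_call` is a hypothetical stand-in for a real tool, and "fast-fail" is read here as "stop at the first error or at the deadline", which may not match the challenge's exact rules.

```python
import asyncio


async def fake_tool_call(city: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for a real tool call such as a weather lookup."""
    await asyncio.sleep(delay)
    return f"{city}: sunny"


async def query_all(cities: list[str], max_in_flight: int = 50) -> list[str]:
    """Fan out calls concurrently, capping how many are in flight at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def one(city: str) -> str:
        async with gate:
            return await fake_tool_call(city)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(one(c) for c in cities))


async def fail_fast(coros, timeout: float) -> list:
    """Run coroutines together; stop at the first exception or at the deadline."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_EXCEPTION
    )
    for task in pending:
        task.cancel()  # stop work that is no longer needed
    if pending:
        # Let the cancellations settle before reporting.
        await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        if task.exception() is not None:
            raise task.exception()
    if pending:
        raise asyncio.TimeoutError("deadline reached before all calls finished")
    return [task.result() for task in tasks]


async def demo() -> tuple[int, str]:
    results = await query_all([f"city-{i}" for i in range(200)])

    async def boom() -> str:
        await asyncio.sleep(0.01)
        raise RuntimeError("endpoint failed")

    outcome = "no failure"
    try:
        await fail_fast([fake_tool_call("slow", delay=5.0), boom()], timeout=1.0)
    except RuntimeError as exc:
        outcome = str(exc)  # the slow call was cancelled rather than awaited
    return len(results), outcome


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The semaphore in `query_all` is a design choice: it keeps the fan-out wide without opening all 200 calls at once, which is usually friendlier to rate-limited tools.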

Resilience & Retry Logic

  • resilient_scraper - Exponential backoff retry strategy with probabilistic failures
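For readers unfamiliar with the strategy named above, here is a minimal sketch of exponential backoff with jitter. The function name, defaults, and the injectable `sleep` and `rng` parameters are assumptions made for illustration; the challenge's own harness and failure model are not shown in this README.

```python
import random
import time


def retry_with_backoff(
    call,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple = (Exception,),
    sleep=time.sleep,
    rng=random.random,
):
    """Call `call()` until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt (0.1, 0.2, 0.4, ...), capped
            # at max_delay; the jitter factor in [0, 1) spreads retries out.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())


def demo():
    """Deterministic demo: the call fails twice, then succeeds."""
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated transient failure")
        return "ok"

    delays: list[float] = []
    # Record delays instead of sleeping, and pin jitter to 1.0 for repeatability.
    result = retry_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
    return result, calls["n"], delays


if __name__ == "__main__":
    print(demo())
```

The demo fails deterministically to stay repeatable; a probabilistic failure, as in the challenge description, would plug into the same `call` parameter.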

Semantic Analysis

  • log_hunter - Find 3 hacker IPs in 800K tokens of access logs (high token consumption)
  • interrupt_judge - Determine when to interrupt user utterances

Structured Output

  • structured_output - Process 1000 questions with strict schema compliance in 25s
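As an illustration of what strict schema compliance can mean in practice, the sketch below rejects any answer with missing keys, extra keys, or wrong types. The schema and its field names are hypothetical; the actual schema used by structured_output is not given here.

```python
import json

# Hypothetical schema: field name -> required Python type.
ANSWER_SCHEMA = {"question_id": int, "answer": str, "confidence": float}


def parse_strict(raw: str, schema: dict = ANSWER_SCHEMA) -> dict:
    """Parse a JSON object and reject anything that deviates from the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = schema.keys() - data.keys()
    extra = data.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in schema.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
            )
    return data


if __name__ == "__main__":
    print(parse_strict('{"question_id": 1, "answer": "42", "confidence": 0.9}'))
```

Validating every response before submitting it keeps a single malformed answer from going unnoticed in a large batch.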

Shopping Agent

  • sports_shopping - Multi-constraint shopping with 12 items, guardrails, and time limits

Each problem is in problems/<name>/ with config, sandbox environment, and registration scripts.

Show

WereWolf Game: Agent Genesis


Testing

Run commands from the evaluation/ directory.

1) Default OSS test run (recommended)

python -m pytest -q

Default pytest options exclude cross_module tests, so contributors can run the suite without private backend credentials.

Expected outcome:

  • passed: unit and integration tests executed locally
  • deselected: cross_module tests intentionally excluded by marker filter

2) Coverage gate run

python -m pytest agent_genesis/tests -q \
  --cov=agent_genesis \
  --cov-config=../.coveragerc \
  --cov-report=term-missing:skip-covered

The coverage threshold is enforced by .coveragerc (fail_under = 90).

3) Cross-module backend run (optional)

python -m pytest agent_genesis/tests -q \
  -m cross_module \
  -o addopts="-ra --strict-markers"

These tests require a live backend and environment variables such as:

  • BACKEND_URL
  • INTERNAL_API_KEY
  • AGENT_GENESIS_API_KEY
  • CROSS_TEST_SLUG
  • CROSS_TEST_SUBMIT_ID
  • CROSS_TEST_SUBMIT_ID_CLAIMED
  • CROSS_TEST_USER_ID
  • CROSS_TEST_KEY_ID

Expected outcome for this mode:

  • skipped: environment-dependent fixtures are missing and tests self-skip with explicit reasons
  • passed: backend and credentials are configured correctly
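The self-skip behaviour described above could be built on a small helper like the one below. This is a sketch, not the project's actual fixture code: `skip_reason` and `REQUIRED_ENV` are hypothetical names, and only three of the listed variables are checked for brevity.

```python
import os

# A subset of the variables listed above, used here for illustration.
REQUIRED_ENV = ("BACKEND_URL", "INTERNAL_API_KEY", "AGENT_GENESIS_API_KEY")


def skip_reason(required=REQUIRED_ENV, environ=None):
    """Return an explicit skip message, or None when everything is configured."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        return "missing environment variables: " + ", ".join(missing)
    return None


# Inside a pytest fixture, the helper could be used like this:
#
#     reason = skip_reason()
#     if reason:
#         pytest.skip(reason)

if __name__ == "__main__":
    print(skip_reason(environ={"BACKEND_URL": "http://localhost:8000"}))
```

Naming the missing variables in the message is what makes the skip reason explicit in the `-ra` summary.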

License

Apache License 2.0

