Agent Genesis evaluation SDK.

agent-genesis

An evaluation SDK for building, registering, and running agent-based coding challenges with dual-sandbox isolation.

Features

  • Define problems with multi-phase evaluation pipelines
  • Dual-sandbox architecture: isolated judge + user containers per test case
  • gRPC-based communication between judge and user runtimes
  • Template image pool with LRU garbage collection for fast container startup
  • Concurrency control: global sandbox limits + per-submission parallelism caps
  • Built-in problem registry, artifact management, and revision workflows
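
The two-level concurrency control listed above (a global sandbox limit plus a per-submission cap) can be illustrated with nested semaphores. This is a minimal sketch, not the SDK's implementation: `GLOBAL_SANDBOX_LIMIT`, `PER_SUBMISSION_LIMIT`, `run_submission`, and `run_case` are hypothetical names, and a short sleep stands in for starting the judge and user containers.

```python
import asyncio

# Hypothetical limits; the SDK's real configuration names are not shown in this README.
GLOBAL_SANDBOX_LIMIT = 4
PER_SUBMISSION_LIMIT = 2


async def run_submission(name, n_cases, global_slots, stats):
    """Run n_cases test cases under both a per-submission and a global cap."""
    local_slots = asyncio.Semaphore(PER_SUBMISSION_LIMIT)

    async def run_case(i):
        # Take the per-submission slot first, then a global slot, so one
        # submission never queues more than its share on the global limit.
        async with local_slots, global_slots:
            stats["active"] += 1
            stats["max"] = max(stats["max"], stats["active"])
            await asyncio.sleep(0.01)  # stand-in for running one sandboxed test case
            stats["active"] -= 1
            return f"{name}:case{i}"

    return await asyncio.gather(*(run_case(i) for i in range(n_cases)))


async def main():
    global_slots = asyncio.Semaphore(GLOBAL_SANDBOX_LIMIT)
    stats = {"active": 0, "max": 0}
    results = await asyncio.gather(
        run_submission("a", 5, global_slots, stats),
        run_submission("b", 5, global_slots, stats),
        run_submission("c", 5, global_slots, stats),
    )
    return results, stats["max"]


results, peak_active = asyncio.run(main())
```

Here `peak_active` records the largest number of test cases that ran at the same time; it never exceeds the global limit, and no single submission contributes more than its own cap.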

Install

pip install agent-genesis

For server-side deployment (Docker sandbox + gRPC transport):

pip install "agent-genesis[server]"

Platform

Agent Genesis

  • Use the public account for a quick start:

Username: genesis

Password: 12345678

  • Or register your own account.

Problems

The platform includes diverse agent challenges testing different capabilities:

Multi-Agent Coordination

  • werewolf - Isolated multi-agent werewolf game with role-based strategy
  • microservice_avalanche - Distributed transaction coordination across order/inventory/payment services

Tool Use & Planning

  • maze - Navigate random mazes using an LLM agent with tool calls
  • tool_creator_challenge - Dynamically create and use tools to solve queries

Parallel Execution

  • parallel_weather - Query 200 cities in <27s using parallel tool calls
  • short_circuit_scraper - Fast-fail pattern with 10 endpoints under time pressure
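
As an illustration of the parallel tool-call pattern that parallel_weather targets, the sketch below fans out 200 lookups with `asyncio.gather` behind a bounded semaphore. It is an assumption-laden example, not the challenge's actual harness: `query_weather` is a hypothetical stand-in that sleeps instead of calling a real tool, and `max_in_flight` is an arbitrary value.

```python
import asyncio
import time


async def query_weather(city):
    # Hypothetical tool call; sleeps instead of touching the network.
    await asyncio.sleep(0.05)
    return city, "sunny"


async def query_all(cities, max_in_flight=50):
    # Bound the number of in-flight calls so the fan-out stays polite.
    limiter = asyncio.Semaphore(max_in_flight)

    async def bounded(city):
        async with limiter:
            return await query_weather(city)

    # gather runs all coroutines concurrently and preserves input order.
    pairs = await asyncio.gather(*(bounded(city) for city in cities))
    return dict(pairs)


cities = [f"city_{i}" for i in range(200)]
start = time.perf_counter()
weather = asyncio.run(query_all(cities))
elapsed = time.perf_counter() - start
```

Run sequentially, 200 calls of 0.05 s each would take about 10 s; with 50 in flight the same work finishes in a fraction of that, which is the kind of margin a hard time limit demands.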

Resilience & Retry Logic

  • resilient_scraper - Exponential backoff retry strategy with probabilistic failures
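
A minimal sketch of exponential backoff with jitter, the strategy resilient_scraper is built around. All names here (`retry_with_backoff`, `make_flaky`) are hypothetical, and the failures are deterministic rather than probabilistic so the example is reproducible.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call fn, retrying on ConnectionError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky(failures):
    """Return a function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("simulated failure")
        return "payload"

    return fetch, state


delays = []
fetch, state = make_flaky(3)
# Record the delays instead of sleeping so the example runs instantly.
result = retry_with_backoff(fetch, sleep=delays.append)
```

Passing `sleep=delays.append` swaps the real sleep for a recorder, which makes the growth of the delays easy to inspect.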

Semantic Analysis

  • log_hunter - Find 3 hacker IPs in 800K tokens of access logs (high token consumption)
  • interrupt_judge - Determine when to interrupt user utterances

Structured Output

  • structured_output - Process 1000 questions with strict schema compliance in 25s

Shopping Agent

  • sports_shopping - Multi-constraint shopping with 12 items, guardrails, and time limits

Each problem lives in problems/<name>/ together with its config, sandbox environment, and registration scripts.

Show

WereWolf Game: Agent Genesis

Testing

Run commands from the evaluation/ directory.

1) Default OSS test run (recommended)

python -m pytest -q

Default pytest options exclude cross_module tests, so contributors can run the suite without private backend credentials.

Expected outcome:

  • passed: unit and integration tests executed locally
  • deselected: cross_module tests intentionally excluded by marker filter

2) Coverage gate run

python -m pytest agent_genesis/tests -q \
  --cov=agent_genesis \
  --cov-config=../.coveragerc \
  --cov-report=term-missing:skip-covered

The coverage threshold is enforced by .coveragerc (fail_under = 90).

3) Cross-module backend run (optional)

python -m pytest agent_genesis/tests -q \
  -m cross_module \
  -o addopts="-ra --strict-markers"

These tests require a live backend and environment variables such as:

  • BACKEND_URL
  • INTERNAL_API_KEY
  • AGENT_GENESIS_API_KEY
  • CROSS_TEST_SLUG
  • CROSS_TEST_SUBMIT_ID
  • CROSS_TEST_SUBMIT_ID_CLAIMED
  • CROSS_TEST_USER_ID
  • CROSS_TEST_KEY_ID

Expected outcome for this mode:

  • skipped: environment-dependent fixtures are missing and tests self-skip with explicit reasons
  • passed: backend and credentials are configured correctly
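
The self-skip behaviour described above can be approximated with a small helper that reports which variables are missing. This is a sketch under assumptions, not the project's actual fixture code: `skip_reason` is a hypothetical name, and the variable list is copied from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_ENV = (
    "BACKEND_URL",
    "INTERNAL_API_KEY",
    "AGENT_GENESIS_API_KEY",
    "CROSS_TEST_SLUG",
    "CROSS_TEST_SUBMIT_ID",
    "CROSS_TEST_SUBMIT_ID_CLAIMED",
    "CROSS_TEST_USER_ID",
    "CROSS_TEST_KEY_ID",
)


def skip_reason(environ=os.environ, required=REQUIRED_ENV):
    """Return an explicit skip reason, or None when every variable is set."""
    missing = [name for name in required if not environ.get(name)]
    if missing:
        return "missing environment variables: " + ", ".join(missing)
    return None
```

A fixture could call `skip_reason()` and pass a non-None result to `pytest.skip`, so the run reports exactly which variables were absent.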

License

Apache License 2.0

Download files

Source Distribution

agent_genesis-0.0.54.tar.gz (131.1 kB)

Uploaded Source

Built Distribution

agent_genesis-0.0.54-py3-none-any.whl (106.4 kB)

Uploaded Python 3

File details

Details for the file agent_genesis-0.0.54.tar.gz.

File metadata

  • Download URL: agent_genesis-0.0.54.tar.gz
  • Upload date:
  • Size: 131.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agent_genesis-0.0.54.tar.gz
Algorithm Hash digest
SHA256 9460898994cb1fe9ef267782d6d666960443e8c62a1da1ff35839dba4017d70e
MD5 435463985c8d2b4e6b6b2e6733566ab2
BLAKE2b-256 04d016a0c9268d50816dce1e71b46e0b883b6f3bb86fb62faecb0e5624a4e175
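
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` and `matches` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for agent_genesis-0.0.54.tar.gz.
EXPECTED_SHA256 = "9460898994cb1fe9ef267782d6d666960443e8c62a1da1ff35839dba4017d70e"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches("agent_genesis-0.0.54.tar.gz")` returns True only if the local copy is byte-for-byte identical to the published file.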

File details

Details for the file agent_genesis-0.0.54-py3-none-any.whl.

File metadata

  • Download URL: agent_genesis-0.0.54-py3-none-any.whl
  • Upload date:
  • Size: 106.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agent_genesis-0.0.54-py3-none-any.whl
Algorithm Hash digest
SHA256 7228c2562387ec366d590d4622c260377a2bb3373f3f8d620eb629232573133e
MD5 4824ed9b87f56243a29046c2af3a0bdc
BLAKE2b-256 20dc110e2179776407f30efbead1e65879c885948b3cedc1d2a487c1b71f0d4f
