Agent Genesis evaluation SDK.

agent-genesis

An evaluation SDK for building, registering, and running agent-based coding challenges with dual-sandbox isolation.

Features

  • Define problems with multi-phase evaluation pipelines
  • Dual-sandbox architecture: isolated judge + user containers per test case
  • gRPC-based communication between judge and user runtimes
  • Template image pool with LRU garbage collection for fast container startup
  • Concurrency control: global sandbox limits + per-submission parallelism caps
  • Built-in problem registry, artifact management, and revision workflows
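The template image pool with LRU garbage collection can be pictured with a small in-memory model. This is only a sketch of the general LRU technique, not the SDK's implementation: the class name `TemplateImagePool`, its methods, and the `template:<key>` image IDs are hypothetical, and a real pool would build and remove Docker images rather than strings.

```python
from collections import OrderedDict


class TemplateImagePool:
    """Hypothetical LRU pool of template images (illustrative only)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._images = OrderedDict()  # key -> image id, oldest entry first
        self.evicted = []             # keys removed by garbage collection

    def acquire(self, key):
        """Return the image for `key`, building it on a cache miss."""
        if key in self._images:
            # Cache hit: mark the image as most recently used.
            self._images.move_to_end(key)
            return self._images[key]
        image_id = self._build(key)
        self._images[key] = image_id
        self._collect_garbage()
        return image_id

    def _build(self, key):
        # Assumption: stands in for an expensive image build step.
        return f"template:{key}"

    def _collect_garbage(self):
        # Evict least recently used images until the pool fits its capacity.
        while len(self._images) > self.capacity:
            old_key, _ = self._images.popitem(last=False)
            self.evicted.append(old_key)

    def keys(self):
        """Return the pooled keys, least recently used first."""
        return list(self._images)


if __name__ == "__main__":
    pool = TemplateImagePool(capacity=2)
    for key in ["python3.11", "node20", "python3.11", "go1.22"]:
        pool.acquire(key)
    print(pool.keys())     # ['python3.11', 'go1.22']
    print(pool.evicted)    # ['node20']
```

Reusing "python3.11" before "go1.22" arrives is what saves it from eviction: the least recently used entry ("node20") is removed first.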

Install

pip install agent-genesis

For server-side deployment (Docker sandbox + gRPC transport):

pip install "agent-genesis[server]"

Platform

Agent Genesis

  • Public account for a quick start:

Username: genesis

Password: 12345678

  • Alternatively, register your own account.

Problems

The platform includes diverse agent challenges testing different capabilities:

Multi-Agent Coordination

  • werewolf - Isolated multi-agent werewolf game with role-based strategy
  • microservice_avalanche - Distributed transaction coordination across order/inventory/payment services

Tool Use & Planning

  • maze - Navigate random mazes using an LLM agent with tool calls
  • tool_creator_challenge - Dynamically create and use tools to solve queries

Parallel Execution

  • parallel_weather - Query 200 cities in <27s using parallel tool calls
  • short_circuit_scraper - Fast-fail pattern with 10 endpoints under time pressure
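A challenge like parallel_weather rewards issuing tool calls concurrently instead of one at a time. The sketch below shows one common way to do that with asyncio. It is not taken from the problem's code: `fetch_weather` is a hypothetical stand-in that only sleeps, and the `max_in_flight` limit of 50 is an assumed value.

```python
import asyncio


async def fetch_weather(city):
    """Hypothetical stand-in for a real tool call; sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return {"city": city, "temp_c": 20}


async def query_all(cities, max_in_flight=50):
    """Query every city concurrently, with at most `max_in_flight` calls at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(city):
        async with limit:
            return await fetch_weather(city)

    # gather() returns results in the same order as the input cities.
    return await asyncio.gather(*(bounded(c) for c in cities))


if __name__ == "__main__":
    cities = [f"city-{i}" for i in range(200)]
    results = asyncio.run(query_all(cities))
    print(len(results))
```

The semaphore keeps the number of in-flight calls bounded, which matters when the real tool enforces rate limits.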

Resilience & Retry Logic

  • resilient_scraper - Exponential backoff retry strategy with probabilistic failures
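The exponential backoff strategy named above can be sketched in a few lines. This is a generic illustration, not the problem's reference solution: the function name `retry_with_backoff`, the choice of `ConnectionError` as the retryable failure, the delay values, and the use of full jitter are all assumptions.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds, waiting longer after each failure.

    The upper bound on the wait doubles per attempt (base_delay * 2**attempt),
    is capped at max_delay, and the actual wait is drawn uniformly below that
    bound (full jitter). `sleep` is injectable so the logic can be tested
    without real delays.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
```

With probabilistic failures, the jitter spreads retries out so that many failing requests do not all retry at the same instant.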

Semantic Analysis

  • log_hunter - Find 3 hacker IPs in 800K tokens of access logs (high token consumption)
  • interrupt_judge - Determine when to interrupt user utterances

Structured Output

  • structured_output - Process 1000 questions with strict schema compliance in 25s

Shopping Agent

  • sports_shopping - Multi-constraint shopping with 12 items, guardrails, and time limits

Each problem lives in problems/<name>/ along with its config, sandbox environment, and registration scripts.

Show

WereWolf Game: Agent Genesis

Testing

Run commands from the evaluation/ directory.

1) Default OSS test run (recommended)

python -m pytest -q

Default pytest options exclude cross_module tests, so contributors can run the suite without private backend credentials.

Expected outcome:

  • passed: unit and integration tests executed locally
  • deselected: cross_module tests intentionally excluded by marker filter

2) Coverage gate run

python -m pytest agent_genesis/tests -q \
  --cov=agent_genesis \
  --cov-config=../.coveragerc \
  --cov-report=term-missing:skip-covered

The coverage threshold is enforced by .coveragerc (fail_under = 90).

3) Cross-module backend run (optional)

python -m pytest agent_genesis/tests -q \
  -m cross_module \
  -o addopts="-ra --strict-markers"

These tests require a live backend and environment variables such as:

  • BACKEND_URL
  • INTERNAL_API_KEY
  • AGENT_GENESIS_API_KEY
  • CROSS_TEST_SLUG
  • CROSS_TEST_SUBMIT_ID
  • CROSS_TEST_SUBMIT_ID_CLAIMED
  • CROSS_TEST_USER_ID
  • CROSS_TEST_KEY_ID
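Before starting a cross-module run, it can help to check which of these variables are set. The helper below is a small sketch and not part of the SDK: the function name `missing_cross_module_vars` is hypothetical, and because the list above is introduced with "such as", the test suite may read further variables that this check does not cover.

```python
import os

# Variable names copied from the list above; the list may not be exhaustive.
REQUIRED_VARS = [
    "BACKEND_URL",
    "INTERNAL_API_KEY",
    "AGENT_GENESIS_API_KEY",
    "CROSS_TEST_SLUG",
    "CROSS_TEST_SUBMIT_ID",
    "CROSS_TEST_SUBMIT_ID_CLAIMED",
    "CROSS_TEST_USER_ID",
    "CROSS_TEST_KEY_ID",
]


def missing_cross_module_vars(environ=None):
    """Return the names from REQUIRED_VARS that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_cross_module_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All listed variables are set.")
```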

Expected outcome for this mode:

  • skipped: environment-dependent fixtures are missing and tests self-skip with explicit reasons
  • passed: backend and credentials are configured correctly

License

Apache License 2.0
