
Agent Genesis evaluation SDK.


agent-genesis

An evaluation SDK for building, registering, and running agent-based coding challenges with dual-sandbox isolation.

Features

  • Define problems with multi-phase evaluation pipelines
  • Dual-sandbox architecture: isolated judge + user containers per test case
  • gRPC-based communication between judge and user runtimes
  • Template image pool with LRU garbage collection for fast container startup
  • Concurrency control: global sandbox limits + per-submission parallelism caps
  • Built-in problem registry, artifact management, and revision workflows
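The concurrency-control bullet combines two limits: a global cap on live sandboxes and a per-submission cap on parallel test cases. The SDK's own API for this is not shown on this page. The following is therefore only a minimal asyncio sketch under that assumption. `SandboxLimiter`, `run`, `main`, and the demo submissions are hypothetical names, and each "test case" is modeled as an async callable rather than a real judge + user container pair.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, TypeVar

T = TypeVar("T")


class SandboxLimiter:
    """Hypothetical limiter: a global cap on live sandboxes plus a per-submission cap."""

    def __init__(self, global_limit: int, per_submission_limit: int) -> None:
        self._global = asyncio.Semaphore(global_limit)
        self._per_submission_limit = per_submission_limit
        self._per_submission: Dict[str, asyncio.Semaphore] = {}

    def _slot_for(self, submission_id: str) -> asyncio.Semaphore:
        # Lazily create one semaphore per submission.
        if submission_id not in self._per_submission:
            self._per_submission[submission_id] = asyncio.Semaphore(self._per_submission_limit)
        return self._per_submission[submission_id]

    async def run(self, submission_id: str, test_case: Callable[[], Awaitable[T]]) -> T:
        # Take the per-submission slot first, then a global slot, so a single
        # submission never holds more global slots than its own cap allows.
        async with self._slot_for(submission_id):
            async with self._global:
                return await test_case()


async def main() -> Dict[str, int]:
    limiter = SandboxLimiter(global_limit=3, per_submission_limit=2)
    active: Dict[str, int] = defaultdict(int)
    peaks = {"global": 0, "per_submission": 0}

    def make_case(sub: str) -> Callable[[], Awaitable[str]]:
        async def case() -> str:
            active[sub] += 1
            peaks["global"] = max(peaks["global"], sum(active.values()))
            peaks["per_submission"] = max(peaks["per_submission"], active[sub])
            await asyncio.sleep(0.01)  # stand-in for judge + user sandbox work
            active[sub] -= 1
            return sub

        return case

    # Three hypothetical submissions with four test cases each.
    jobs = [limiter.run(sub, make_case(sub)) for sub in ("a", "b", "c") for _ in range(4)]
    await asyncio.gather(*jobs)
    return peaks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Acquiring the per-submission slot before the global one means a submission that has already hit its own cap waits without occupying a global slot. That leaves the capacity free for other submissions.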

Install

pip install agent-genesis

For server-side deployment (Docker sandbox + gRPC transport):

pip install "agent-genesis[server]"

Platform

Agent Genesis

  • Public account for a quick start:

Username: genesis

Password: 12345678

  • Alternatively, register your own account.

Problems

The platform includes a range of agent challenges, each testing a different capability:

Multi-Agent Coordination

  • werewolf - Isolated multi-agent werewolf game with role-based strategy
  • microservice_avalanche - Distributed transaction coordination across order/inventory/payment services

Tool Use & Planning

  • maze - Navigate randomly generated mazes using an LLM agent with tool calls
  • tool_creator_challenge - Dynamically create and use tools to solve queries

Parallel Execution

  • parallel_weather - Query 200 cities in under 27 seconds using parallel tool calls
  • short_circuit_scraper - Apply a fast-fail pattern across 10 endpoints under time pressure
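Both challenges above reward running tool calls concurrently under a deadline. As an illustration only (this is not either challenge's reference solution, and `fake_weather`, `gather_fast_fail`, and `demo` are hypothetical names), the asyncio sketch below starts every call at once. It then fails fast on the first error and cancels anything still running when the deadline expires.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def gather_fast_fail(calls: Sequence[Callable[[], Awaitable[T]]], deadline_s: float) -> List[T]:
    """Run every call concurrently; stop at the first failure or when the deadline passes."""
    tasks = [asyncio.ensure_future(call()) for call in calls]
    done, pending = await asyncio.wait(
        tasks, timeout=deadline_s, return_when=asyncio.FIRST_EXCEPTION
    )
    # Cancel whatever is still running and wait for the cancellations to settle.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    # Fast-fail: surface a failure immediately instead of waiting for slow calls.
    errors = [t.exception() for t in done if t.exception() is not None]
    if errors:
        raise errors[0]
    if pending:
        raise TimeoutError(f"{len(pending)} of {len(tasks)} calls unfinished after {deadline_s}s")
    # Results come back in the same order as `calls`.
    return [t.result() for t in tasks]


async def fake_weather(city: str) -> str:
    # Hypothetical stand-in for a real weather tool call.
    await asyncio.sleep(0.05)
    return f"{city}: sunny"


async def demo() -> List[str]:
    cities = [f"city-{i}" for i in range(200)]
    # Default argument binds each city at lambda creation time.
    calls = [lambda c=c: fake_weather(c) for c in cities]
    return await gather_fast_fail(calls, deadline_s=27.0)


if __name__ == "__main__":
    print(asyncio.run(demo())[:3])
```

Because all 200 simulated calls sleep concurrently, the whole batch takes roughly one call's latency rather than 200 times that.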

Resilience & Retry Logic

  • resilient_scraper - Exponential backoff retry strategy with probabilistic failures
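One common way to implement exponential backoff is capped exponential delays with "full jitter", shown below. This is an assumed variant, not necessarily what the resilient_scraper judge expects. `retry_with_backoff`, `make_flaky`, and all parameter names are hypothetical. The flaky endpoint fails a fixed number of times so the behavior is reproducible; a probabilistic one could raise when `rng.random() < p`.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op, retrying transient failures with capped exponential backoff and full jitter."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            # Delay ceiling doubles each attempt (0.1, 0.2, 0.4, ...) up to max_delay_s;
            # sleeping a uniform random fraction of it spreads out concurrent retries.
            ceiling = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(rng.uniform(0.0, ceiling))
    raise ValueError("max_attempts must be at least 1")


def make_flaky(failures: int):
    """Hypothetical endpoint that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def op() -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("transient failure")
        return "ok"

    return op, calls
```

For example, `op, calls = make_flaky(3)` followed by `retry_with_backoff(op)` returns `"ok"` on the fourth call, after three jittered sleeps. Passing `sleep` and `rng` as parameters keeps the function testable without real delays.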

Semantic Analysis

  • log_hunter - Find 3 hacker IPs in 800K tokens of access logs (high token consumption)
  • interrupt_judge - Determine when to interrupt user utterances

Structured Output

  • structured_output - Process 1000 questions within 25 seconds with strict schema compliance
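"Strict schema compliance" can be read as: every required key present, no extra keys, and exact value types. The sketch below checks one JSON answer that way using only the standard library. The `{"id": int, "answer": str}` schema and `validate_strict` are hypothetical; the challenge's real schema lives in its own problem config.

```python
import json
from typing import Any, Dict

# Hypothetical schema for illustration; not the challenge's actual schema.
SCHEMA: Dict[str, type] = {"id": int, "answer": str}


def validate_strict(raw: str, schema: Dict[str, type] = SCHEMA) -> Dict[str, Any]:
    """Parse one JSON answer and reject anything that deviates from the schema."""
    obj = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be a JSON object")
    missing = schema.keys() - obj.keys()
    extra = obj.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in schema.items():
        value = obj[key]
        # Exact type check, so True is not accepted where an int is required.
        if type(value) is not expected:
            raise ValueError(f"{key!r}: expected {expected.__name__}, got {type(value).__name__}")
    return obj
```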

Shopping Agent

  • sports_shopping - Multi-constraint shopping with 12 items, guardrails, and time limits

Each problem lives in problems/<name>/ and contains its config, sandbox environment, and registration scripts.

Showcase

[Figure: WereWolf Game: Agent Genesis]

Testing

Run commands from the evaluation/ directory.

1) Default OSS test run (recommended)

python -m pytest -q

Default pytest options exclude cross_module tests, so contributors can run the suite without private backend credentials.

Expected outcome:

  • passed: unit and integration tests executed locally
  • deselected: cross_module tests intentionally excluded by marker filter

2) Coverage gate run

python -m pytest agent_genesis/tests -q \
  --cov=agent_genesis \
  --cov-config=../.coveragerc \
  --cov-report=term-missing:skip-covered

The coverage threshold is enforced by .coveragerc (fail_under = 90).

3) Cross-module backend run (optional)

python -m pytest agent_genesis/tests -q \
  -m cross_module \
  -o addopts="-ra --strict-markers"

These tests require a live backend and environment variables such as:

  • BACKEND_URL
  • INTERNAL_API_KEY
  • AGENT_GENESIS_API_KEY
  • CROSS_TEST_SLUG
  • CROSS_TEST_SUBMIT_ID
  • CROSS_TEST_SUBMIT_ID_CLAIMED
  • CROSS_TEST_USER_ID
  • CROSS_TEST_KEY_ID
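A test suite can decide whether to self-skip by checking these variables up front. The helper below is a hypothetical sketch, not the project's actual fixture code, and the list above says "such as", so the real set of required variables may differ. In a pytest fixture, a non-empty result could be passed to `pytest.skip(...)` as the skip reason.

```python
import os
from typing import List, Mapping, Optional

# Names copied from the list above; the real suite may require others.
CROSS_MODULE_ENV_VARS = (
    "BACKEND_URL",
    "INTERNAL_API_KEY",
    "AGENT_GENESIS_API_KEY",
    "CROSS_TEST_SLUG",
    "CROSS_TEST_SUBMIT_ID",
    "CROSS_TEST_SUBMIT_ID_CLAIMED",
    "CROSS_TEST_USER_ID",
    "CROSS_TEST_KEY_ID",
)


def missing_cross_module_env(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty, in declaration order."""
    env = os.environ if environ is None else environ
    return [name for name in CROSS_MODULE_ENV_VARS if not env.get(name)]
```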

Expected outcome for this mode:

  • skipped: environment-dependent fixtures are unavailable, so the tests skip themselves with explicit reasons
  • passed: backend and credentials are configured correctly

License

Apache License 2.0



Download files


Source Distribution

agent_genesis-0.0.52.tar.gz (131.0 kB)

Uploaded Source

Built Distribution


agent_genesis-0.0.52-py3-none-any.whl (106.4 kB)

Uploaded Python 3

File details

Details for the file agent_genesis-0.0.52.tar.gz.

File metadata

  • Download URL: agent_genesis-0.0.52.tar.gz
  • Upload date:
  • Size: 131.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agent_genesis-0.0.52.tar.gz
Algorithm Hash digest
SHA256 9191a30f56546301c8b546dceffd3da01d9f8bde382eafd3dde73c563a08fa3f
MD5 fe8233ebcf3151d68d3deaac453b89ce
BLAKE2b-256 2c6e77cc3392c8fbe7b29effc6d17699863d00be1c2c1106886da0203ef1d42e


File details

Details for the file agent_genesis-0.0.52-py3-none-any.whl.

File metadata

  • Download URL: agent_genesis-0.0.52-py3-none-any.whl
  • Upload date:
  • Size: 106.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agent_genesis-0.0.52-py3-none-any.whl
Algorithm Hash digest
SHA256 df3c8b8551782cdeac051e4106028ec7073a3940303e2dc95d75c6c7a3c0a421
MD5 d81e4990af4c151864a7f2d431f7b26e
BLAKE2b-256 ebcfe74f123afda940ec54bec6266603bae40e4030f5ac6ffdabe5918c581242
