Agent Genesis evaluation SDK.

agent-genesis

An evaluation SDK for building, registering, and running agent-based coding challenges with dual-sandbox isolation.

Features

  • Define problems with multi-phase evaluation pipelines
  • Dual-sandbox architecture: isolated judge + user containers per test case
  • gRPC-based communication between judge and user runtimes
  • Template image pool with LRU garbage collection for fast container startup
  • Concurrency control: global sandbox limits + per-submission parallelism caps
  • Built-in problem registry, artifact management, and revision workflows
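
The two-level concurrency control listed above can be pictured with a pair of semaphores. The sketch below is only an illustration: `SandboxLimiter`, `run`, and `fake_test_case` are hypothetical names, not part of the agent-genesis API, and the real SDK may enforce its limits differently.

```python
import asyncio


class SandboxLimiter:
    """Hypothetical two-level limiter; not the agent-genesis API."""

    def __init__(self, global_limit: int, per_submission_limit: int) -> None:
        self._global = asyncio.Semaphore(global_limit)
        self._per_submission_limit = per_submission_limit
        self._per_submission: dict[str, asyncio.Semaphore] = {}
        self.active = 0  # sandboxes running right now
        self.peak = 0    # highest value `active` ever reached

    async def run(self, submission_id: str, test_case):
        sub_sem = self._per_submission.setdefault(
            submission_id, asyncio.Semaphore(self._per_submission_limit)
        )
        # Take the per-submission slot first so a waiting submission
        # never holds a global slot it cannot use yet.
        async with sub_sem, self._global:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await test_case()
            finally:
                self.active -= 1


async def demo():
    # Built inside the coroutine so the semaphores bind to the running loop.
    limiter = SandboxLimiter(global_limit=3, per_submission_limit=2)

    async def fake_test_case():
        await asyncio.sleep(0.01)  # stands in for a sandboxed test run
        return "ok"

    jobs = [
        limiter.run(submission, fake_test_case)
        for submission in ("sub-a", "sub-b")
        for _ in range(4)
    ]
    results = await asyncio.gather(*jobs)
    return limiter.peak, results
```

Calling `asyncio.run(demo())` runs eight fake test cases across two submissions. No more than three run at once overall, and no more than two per submission.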

Install

pip install agent-genesis

For server-side deployment (Docker sandbox + gRPC transport):

pip install "agent-genesis[server]"
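
To confirm that the install succeeded, the standard library can report the installed distribution version. This is a minimal sketch; `installed_version` is a helper name chosen here for illustration, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "agent-genesis") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "agent-genesis is not installed")
```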

Platform

Agent Genesis

  • Use the public account for a quick start:

Username: genesis

Password: 12345678

  • Or register your own account.

Problems

The platform includes diverse agent challenges testing different capabilities:

Multi-Agent Coordination

  • werewolf - Isolated multi-agent werewolf game with role-based strategy
  • microservice_avalanche - Distributed transaction coordination across order/inventory/payment services

Tool Use & Planning

  • maze - Navigate random mazes using an LLM agent with tool calls
  • tool_creator_challenge - Dynamically create and use tools to solve queries

Parallel Execution

  • parallel_weather - Query 200 cities in <27s using parallel tool calls
  • short_circuit_scraper - Fast-fail pattern with 10 endpoints under time pressure
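
As a rough picture of the parallel tool-call pattern these problems exercise, the sketch below fans out 200 calls with `asyncio.gather` under a single deadline. `fetch_weather` is a hypothetical stand-in for a real tool call, and `max_in_flight=50` is an arbitrary assumption. Only the 200-city count and the 27-second budget come from the problem description.

```python
import asyncio


async def fetch_weather(city: str) -> str:
    """Hypothetical stand-in for a weather tool call."""
    await asyncio.sleep(0.01)
    return f"{city}: sunny"


async def query_all(cities, max_in_flight: int = 50):
    # Bound the fan-out so a large batch does not open every call at once.
    gate = asyncio.Semaphore(max_in_flight)

    async def one(city: str) -> str:
        async with gate:
            return await fetch_weather(city)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(one(city) for city in cities))


async def demo():
    cities = [f"city-{i}" for i in range(200)]
    # Fail loudly if the whole batch exceeds the time budget.
    return await asyncio.wait_for(query_all(cities), timeout=27)
```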

Resilience & Retry Logic

  • resilient_scraper - Exponential backoff retry strategy with probabilistic failures
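
For readers unfamiliar with the strategy, exponential backoff doubles the wait after each failed attempt, usually with random jitter. The sketch below is a generic illustration, not the reference solution for this problem. `retry_with_backoff` and its parameters are names assumed here.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep, rng=random.random):
    """Call `call()` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # real code should catch only retryable errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the computed delay.
            sleep(delay * rng())
```

Passing `sleep` and `rng` as parameters keeps the function easy to test without real waiting.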

Semantic Analysis

  • log_hunter - Find 3 hacker IPs in 800K tokens of access logs (high token consumption)
  • interrupt_judge - Determine when to interrupt user utterances

Structured Output

  • structured_output - Process 1000 questions with strict schema compliance in 25s

Shopping Agent

  • sports_shopping - Multi-constraint shopping with 12 items, guardrails, and time limits

Each problem is in problems/<name>/ with config, sandbox environment, and registration scripts.

Show

WereWolf Game: Agent Genesis

Testing

Run commands from the evaluation/ directory.

1) Default OSS test run (recommended)

python -m pytest -q

Default pytest options exclude cross_module tests, so contributors can run the suite without private backend credentials.

Expected outcome:

  • passed: unit and integration tests executed locally
  • deselected: cross_module tests intentionally excluded by marker filter

2) Coverage gate run

python -m pytest agent_genesis/tests -q \
  --cov=agent_genesis \
  --cov-config=../.coveragerc \
  --cov-report=term-missing:skip-covered

The coverage threshold is enforced by .coveragerc (fail_under = 90).

3) Cross-module backend run (optional)

python -m pytest agent_genesis/tests -q \
  -m cross_module \
  -o addopts="-ra --strict-markers"

These tests require a live backend and environment variables such as:

  • BACKEND_URL
  • INTERNAL_API_KEY
  • AGENT_GENESIS_API_KEY
  • CROSS_TEST_SLUG
  • CROSS_TEST_SUBMIT_ID
  • CROSS_TEST_SUBMIT_ID_CLAIMED
  • CROSS_TEST_USER_ID
  • CROSS_TEST_KEY_ID

Expected outcome for this mode:

  • skipped: environment-dependent fixtures are missing and tests self-skip with explicit reasons
  • passed: backend and credentials are configured correctly
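
The self-skip behaviour can be approximated with a small environment check. This is a sketch under assumptions: `REQUIRED_ENV`, `missing_env`, and `skip_reason` are hypothetical names, and the variable list is copied from the examples above, which the text introduces with "such as", so the real suite may require others.

```python
import os

# Names taken from the list above; possibly incomplete (assumption).
REQUIRED_ENV = (
    "BACKEND_URL",
    "INTERNAL_API_KEY",
    "AGENT_GENESIS_API_KEY",
    "CROSS_TEST_SLUG",
    "CROSS_TEST_SUBMIT_ID",
    "CROSS_TEST_SUBMIT_ID_CLAIMED",
    "CROSS_TEST_USER_ID",
    "CROSS_TEST_KEY_ID",
)


def missing_env(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV if not environ.get(name)]


def skip_reason(environ=None):
    """Return an explicit skip message, or None when everything is set."""
    missing = missing_env(environ)
    if not missing:
        return None
    return "cross_module env not configured: missing " + ", ".join(missing)
```

A fixture in `conftest.py` could call `skip_reason()` and pass any non-None result to `pytest.skip(...)`, so that unconfigured runs report as skipped rather than failed.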

License

Apache License 2.0

Download files

Source Distribution

agent_genesis-0.0.55.tar.gz (131.3 kB)

Uploaded Source

Built Distribution

agent_genesis-0.0.55-py3-none-any.whl (106.6 kB)

Uploaded Python 3

File details

Details for the file agent_genesis-0.0.55.tar.gz.

File metadata

  • Download URL: agent_genesis-0.0.55.tar.gz
  • Upload date:
  • Size: 131.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agent_genesis-0.0.55.tar.gz
Algorithm Hash digest
SHA256 1e08a07165ee4620814678059ea5235511f620b67eec36a115f0b5d2d6f4697c
MD5 eeae074c394cf0d6ed6bef71e024726d
BLAKE2b-256 9ff56afa77eb756dffccee0c0b529cdbfdc18f2712a35cb3ff72f08d8234ed01

File details

Details for the file agent_genesis-0.0.55-py3-none-any.whl.

File metadata

  • Download URL: agent_genesis-0.0.55-py3-none-any.whl
  • Upload date:
  • Size: 106.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agent_genesis-0.0.55-py3-none-any.whl
Algorithm Hash digest
SHA256 d45f9c6548b7b7eb43f9ec2ebd8dead1748428e31f11d3c552f0af56b25f5759
MD5 203d85b8f13e85123cac4fe74476cbe6
BLAKE2b-256 0f0487ddc867ab87b34e6359484c1494286f503dfd97250dafc2d09749c8c552
