Agent Genesis evaluation SDK.
agent-genesis
An evaluation SDK for building, registering, and running agent-based coding challenges with dual-sandbox isolation.
Features
- Define problems with multi-phase evaluation pipelines
- Dual-sandbox architecture: isolated judge + user containers per test case
- gRPC-based communication between judge and user runtimes
- Template image pool with LRU garbage collection for fast container startup
- Concurrency control: global sandbox limits + per-submission parallelism caps
- Built-in problem registry, artifact management, and revision workflows
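The concurrency control feature combines two limits: a global cap on sandboxes running at once and a per-submission cap on parallel test cases. Below is a minimal sketch of that two-level scheme using asyncio semaphores. `SandboxScheduler`, its parameters, and the sleep that stands in for launching a judge and a user container are hypothetical illustrations, not the agent-genesis API.

```python
import asyncio
from collections import defaultdict


class SandboxScheduler:
    """Hypothetical two-level limiter: global sandbox slots + per-submission caps."""

    def __init__(self, global_limit: int, per_submission_cap: int) -> None:
        self._global = asyncio.Semaphore(global_limit)
        self._cap = per_submission_cap
        # Bookkeeping so the limits can be observed.
        self.running_total = 0
        self.peak_total = 0
        self.running_by_sub = defaultdict(int)
        self.peak_by_sub = defaultdict(int)

    async def _run_case(self, sub_id: str, case_id: int) -> str:
        # Stand-in for starting one judge container and one user container.
        self.running_total += 1
        self.running_by_sub[sub_id] += 1
        self.peak_total = max(self.peak_total, self.running_total)
        self.peak_by_sub[sub_id] = max(self.peak_by_sub[sub_id], self.running_by_sub[sub_id])
        await asyncio.sleep(0.01)  # placeholder for sandbox work
        self.running_total -= 1
        self.running_by_sub[sub_id] -= 1
        return f"{sub_id}:{case_id}:ok"

    async def evaluate(self, sub_id: str, n_cases: int) -> list:
        # A fresh semaphore per submission, so one large submission
        # cannot occupy every global slot.
        per_sub = asyncio.Semaphore(self._cap)

        async def guarded(case_id: int) -> str:
            async with per_sub:  # per-submission cap first
                async with self._global:  # then a global sandbox slot
                    return await self._run_case(sub_id, case_id)

        return list(await asyncio.gather(*(guarded(i) for i in range(n_cases))))


async def main() -> SandboxScheduler:
    # Assumed limits for the demo: 4 sandboxes globally, 2 per submission.
    scheduler = SandboxScheduler(global_limit=4, per_submission_cap=2)
    await asyncio.gather(*(scheduler.evaluate(f"sub{i}", 5) for i in range(3)))
    return scheduler


if __name__ == "__main__":
    s = asyncio.run(main())
    print("global peak:", s.peak_total, "per-submission peaks:", dict(s.peak_by_sub))
```

Acquiring the per-submission slot before the global one means a submission that is already at its cap waits without holding a global slot that another submission could use.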
Install
pip install agent-genesis
For server-side deployment (Docker sandbox + gRPC transport):
pip install "agent-genesis[server]"
Platform
- Public account for quick start:
  Username: genesis
  Password: 12345678
- Or register your own account.
Problems
The platform includes diverse agent challenges testing different capabilities:
Multi-Agent Coordination
- werewolf: Isolated multi-agent werewolf game with role-based strategy
- microservice_avalanche: Distributed transaction coordination across order/inventory/payment services
Tool Use & Planning
- maze: Navigate random mazes using an LLM agent with tool calls
- tool_creator_challenge: Dynamically create and use tools to solve queries
Parallel Execution
- parallel_weather: Query 200 cities in under 27 s using parallel tool calls
- short_circuit_scraper: Fast-fail pattern with 10 endpoints under time pressure
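The parallel_weather challenge (200 cities within a 27 s budget) rewards issuing tool calls concurrently instead of one after another. The sketch below shows a bounded fan-out under an overall deadline with asyncio. `fetch_weather` is a hypothetical stub because the challenge's real tool interface is not described here, and the in-flight limit of 50 is an arbitrary assumption. It illustrates the technique; it is not a reference solution.

```python
import asyncio
import time


async def fetch_weather(city: str) -> dict:
    """Hypothetical tool call; a real agent would call the challenge's tool."""
    await asyncio.sleep(0.05)  # simulated network latency
    return {"city": city, "temp_c": 20}


async def fetch_all(cities: list, max_in_flight: int = 50, deadline_s: float = 27.0) -> list:
    limit = asyncio.Semaphore(max_in_flight)

    async def one(city: str) -> dict:
        async with limit:  # cap concurrent requests
            return await fetch_weather(city)

    # gather keeps results in input order; wait_for cancels the whole
    # batch and raises a timeout if it overruns the deadline.
    return await asyncio.wait_for(
        asyncio.gather(*(one(c) for c in cities)), timeout=deadline_s
    )


if __name__ == "__main__":
    cities = [f"city_{i}" for i in range(200)]
    start = time.perf_counter()
    results = asyncio.run(fetch_all(cities))
    # Print the result count and the wall-clock time for the whole batch.
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

Run sequentially, 200 calls at 0.05 s each would take about 10 s; with 50 in flight the same batch needs only a handful of latency rounds.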
Resilience & Retry Logic
- resilient_scraper: Exponential backoff retry strategy with probabilistic failures
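resilient_scraper names exponential backoff as the retry strategy. Here is a minimal synchronous sketch, assuming a callable that raises on a transient failure. `retry_with_backoff` and its parameters are hypothetical, and the "full jitter" choice (a random delay between 0 and the exponential cap) is one common variant, not necessarily what the challenge expects.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying on exception with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles each attempt (base * 2^attempt), bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))  # full jitter spreads out retries
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Fails twice, then succeeds, to imitate a probabilistic endpoint.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "payload"

    delays = []
    print(retry_with_backoff(flaky, sleep=delays.append), calls["n"], delays)
```

Injecting `sleep` and `rng` keeps the function testable without real waiting.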
Semantic Analysis
- log_hunter: Find 3 hacker IPs in 800K tokens of access logs (high token consumption)
- interrupt_judge: Determine when to interrupt user utterances
Structured Output
- structured_output: Process 1000 questions with strict schema compliance in 25 s
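structured_output stresses strict schema compliance at volume. One way to interpret "strict" is to reject any answer that has missing keys, extra keys, or wrong types. The sketch below does that with the standard library only. The schema (`id: int`, `answer: str`) and `validate_record` are hypothetical, because this README does not give the challenge's actual schema.

```python
import json
from typing import Any, Dict

# Hypothetical schema: field name -> required Python type.
SCHEMA: Dict[str, type] = {"id": int, "answer": str}


def validate_record(raw: str, schema: Dict[str, type] = SCHEMA) -> Dict[str, Any]:
    """Parse one JSON object and enforce exact keys and types; raise ValueError otherwise."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be an object")
    missing = schema.keys() - obj.keys()
    extra = obj.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in schema.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{key!r} should be {expected.__name__}, got {type(value).__name__}")
    return obj


if __name__ == "__main__":
    print(validate_record('{"id": 1, "answer": "42"}'))
    try:
        validate_record('{"id": "1", "answer": "42"}')
    except ValueError as err:
        print("rejected:", err)
```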
Shopping Agent
- sports_shopping: Multi-constraint shopping with 12 items, guardrails, and time limits
Each problem lives in problems/<name>/, which contains its config, sandbox environment, and registration scripts.
Show
[Figure: WereWolf Game: Agent Genesis]
Testing
Run commands from the evaluation/ directory.
1) Default OSS test run (recommended)
python -m pytest -q
The default pytest options deselect tests marked cross_module, so contributors can run the suite without private backend credentials.
Expected outcome:
- passed: unit and integration tests executed locally
- deselected: cross_module tests intentionally excluded by the marker filter
2) Coverage gate run
python -m pytest agent_genesis/tests -q \
--cov=agent_genesis \
--cov-config=../.coveragerc \
--cov-report=term-missing:skip-covered
The coverage threshold is enforced by .coveragerc (fail_under = 90).
3) Cross-module backend run (optional)
python -m pytest agent_genesis/tests -q \
-m cross_module \
-o addopts="-ra --strict-markers"
These tests require a live backend and environment variables such as:
- BACKEND_URL
- INTERNAL_API_KEY
- AGENT_GENESIS_API_KEY
- CROSS_TEST_SLUG
- CROSS_TEST_SUBMIT_ID
- CROSS_TEST_SUBMIT_ID_CLAIMED
- CROSS_TEST_USER_ID
- CROSS_TEST_KEY_ID
Expected outcome for this mode:
- skipped: environment-dependent fixtures are missing, so tests self-skip with explicit reasons
- passed: the backend and credentials are configured correctly
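Cross-module tests skip themselves when their environment is incomplete. The following is a minimal sketch of that gating logic, using the variable names listed for this mode. `missing_env` and `skip_reason` are hypothetical helpers, not the suite's actual fixtures, and the assumption that every cross-module test needs all eight variables is for illustration only. In a pytest fixture, a non-None reason would typically be passed to `pytest.skip()`, producing an explicit skip reason in the report.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Names taken from the README; which tests need which variables is an assumption.
CROSS_MODULE_ENV = (
    "BACKEND_URL",
    "INTERNAL_API_KEY",
    "AGENT_GENESIS_API_KEY",
    "CROSS_TEST_SLUG",
    "CROSS_TEST_SUBMIT_ID",
    "CROSS_TEST_SUBMIT_ID_CLAIMED",
    "CROSS_TEST_USER_ID",
    "CROSS_TEST_KEY_ID",
)


def missing_env(names: Sequence[str], environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty, preserving the given order."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def skip_reason(names: Sequence[str], environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Build an explicit skip message, or None when everything is configured."""
    missing = missing_env(names, environ)
    if not missing:
        return None
    return "cross_module test needs: " + ", ".join(missing)


if __name__ == "__main__":
    reason = skip_reason(CROSS_MODULE_ENV)
    print(reason or "all cross-module variables are set")
```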
License
Apache License 2.0
Download files
File details
Details for the file agent_genesis-0.0.52.tar.gz.
File metadata
- Download URL: agent_genesis-0.0.52.tar.gz
- Upload date:
- Size: 131.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9191a30f56546301c8b546dceffd3da01d9f8bde382eafd3dde73c563a08fa3f |
| MD5 | fe8233ebcf3151d68d3deaac453b89ce |
| BLAKE2b-256 | 2c6e77cc3392c8fbe7b29effc6d17699863d00be1c2c1106886da0203ef1d42e |
File details
Details for the file agent_genesis-0.0.52-py3-none-any.whl.
File metadata
- Download URL: agent_genesis-0.0.52-py3-none-any.whl
- Upload date:
- Size: 106.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df3c8b8551782cdeac051e4106028ec7073a3940303e2dc95d75c6c7a3c0a421 |
| MD5 | d81e4990af4c151864a7f2d431f7b26e |
| BLAKE2b-256 | ebcfe74f123afda940ec54bec6266603bae40e4030f5ac6ffdabe5918c581242 |