AgentDeck 🎮
A research platform for studying AI behavior through game scenarios
Status: v0.1.0 (Pre-release) - Core functionality complete, polish in progress
Test Coverage: 300+ tests, CI gate at 75% coverage
Note: This is a work-in-progress repository. The first public release in the fresh repository will be tagged v0.1.0.
🎯 Purpose & Vision
AgentDeck is a research platform for studying AI behavior through game scenarios. It enables researchers to run controlled experiments where AI agents interact in well-defined environments, providing comprehensive data collection for analysis of prompting strategies, decision-making patterns, and model capabilities.
Why Games?
Most LLM benchmarks measure knowledge (answering static questions). But real-world utility requires agency: maintaining state, forming strategies, and adapting over time.
Games are the perfect "behavioral wind tunnel" for testing these capabilities:
- Constrained environments - Isolate specific variables (e.g., "Does the model understand resource scarcity?")
- Iterative decision making - Agents live with consequences, testing long-term planning
- Social dynamics - Multiplayer games reveal cooperation, betrayal, and negotiation patterns
- Measurable outcomes - Win/lose provides a clear signal for cost/quality trade-offs
The Console Metaphor
AgentDeck is architected like a video game console to keep experiments modular and clean:
- 🎮 Console (AgentDeck) - The engine that orchestrates sessions, manages seeding, and enforces rules
- 💾 Game (Cartridge) - Pure logic defining rules and state transitions; swap games without changing agents
- 🤖 Player - The AI agent (GPT-4, Claude, Gemini) that "holds the controller"
- 🕹️ Controller - Translates the AI's text response into valid game actions
- 📺 Renderer - "Draws" the game state into text the AI can understand
- 👁️ Spectator - The audience watching the live stream (stats, narration, cost tracking)
- 📹 Recorder - The "DVR" capturing every event for perfect replay and analysis
By separating these concerns, AgentDeck ensures your research is reproducible, observable, and easy to modify.
Core Capabilities:
- Run experiments with GPT-5, Claude, Gemini in ~10 lines of code
- Parallel execution - 10× speedup with worker-based concurrency
- Complete observability - every decision, timing, and reasoning captured
- Real-time monitoring - live progress tracking with ETA and cost estimates
- Perfect replay - reconstruct exact match conditions from recordings
- Reproducible research - deterministic experiments via seeded randomness
Architecture
AgentDeck follows a gaming console metaphor with clean separation of concerns:
┌─────────────────────────────────────┐
│         AgentDeck (Facade)          │ ← You interact here
├─────────────────────────────────────┤
│       Console (Orchestrator)        │ ← Manages lifecycle
├─────────────┬───────────────────────┤
│    Game     │       EventBus        │ ← Game logic + Events
├─────────────┼───────────────────────┤
│   Players   │      Spectators       │ ← AI agents + Observers
└─────────────┴───────────────────────┘
Single Turn Flow
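One way to picture a single turn, using the roles from the console metaphor: the renderer turns state into text, the player replies, the controller parses the reply, and the game applies the action. The sketch below is illustrative only. Every function name in it is hypothetical, and it does not use AgentDeck's actual API.

```python
from typing import Callable, Dict, List

State = Dict[str, int]

def render(state: State) -> str:
    """Renderer role: draw the state as text the player can read."""
    return f"Your HP: {state['hp']} | Opponent HP: {state['enemy_hp']}"

def parse_action(response: str, allowed: List[str]) -> str:
    """Controller role: pull the first allowed action out of free text."""
    for word in response.upper().split():
        token = word.strip(".,!:;")
        if token in allowed:
            return token
    raise ValueError(f"no valid action in {response!r}")

def apply_action(state: State, action: str) -> State:
    """Game role: return a new state; the input is left untouched."""
    new_state = dict(state)
    if action == "ATTACK":
        new_state["enemy_hp"] -= 20
    return new_state

def play_turn(state: State, player: Callable[[str], str]) -> State:
    prompt = render(state)                       # state -> text
    response = player(prompt)                    # text -> model reply
    action = parse_action(response, ["ATTACK"])  # reply -> action
    return apply_action(state, action)           # action -> next state

# A lambda stands in for the AI player (hypothetical, no provider call).
next_state = play_turn({"hp": 100, "enemy_hp": 100}, lambda prompt: "I will attack.")
print(next_state)  # {'hp': 100, 'enemy_hp': 80}
```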
Core Components
Games define rules and state
- Implement 4 methods: setup(), get_view(), update(), status()
- State is JSON-serializable dicts (no complex objects)
- Example: FixedDamageGame
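As a rough illustration of the four-method contract, here is a toy duel whose state is a plain dict. The method names come from the list above, but the signatures, the return values, and the TinyDuel class itself are assumptions made for this sketch, not the actual game interface.

```python
import json
from typing import Any, Dict, List, Optional

State = Dict[str, Any]

class TinyDuel:
    """Toy two-player duel; signatures are assumed for illustration."""

    def __init__(self, max_health: int = 100, attack_damage: int = 20) -> None:
        self.max_health = max_health
        self.attack_damage = attack_damage

    def setup(self, players: List[str]) -> State:
        # State is a plain dict so it can be serialized directly.
        return {"health": {name: self.max_health for name in players}, "turn": 0}

    def get_view(self, state: State, player: str) -> State:
        # Every player sees the full state in this toy example.
        return {"you": player, "health": dict(state["health"])}

    def update(self, state: State, player: str, action: str) -> State:
        # Return a new state rather than mutating the old one.
        health = dict(state["health"])
        if action == "ATTACK":
            for name in health:
                if name != player:
                    health[name] = max(0, health[name] - self.attack_damage)
        return {"health": health, "turn": state["turn"] + 1}

    def status(self, state: State) -> Optional[str]:
        # None while the match is running, otherwise the winner's name.
        alive = [name for name, hp in state["health"].items() if hp > 0]
        return alive[0] if len(alive) == 1 else None

game = TinyDuel()
state = game.setup(["Alice", "Bob"])
while game.status(state) is None:
    state = game.update(state, "Alice", "ATTACK")

json.dumps(state)  # works because the state holds only dicts, strings, and ints
print(game.status(state), state["turn"])  # Alice 5
```

Keeping state as plain dicts is what makes recording and replay straightforward: the same structure can be written to JSON and read back unchanged.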
Players are AI agents making decisions
- Three-phase lifecycle: Handshake → Turn → Conclusion
- Built-in: GPTPlayer, ClaudePlayer, GeminiPlayer, MockPlayer
- Composable prompt templates via PromptBuilder
Controllers parse AI responses into actions
- ActionOnlyController - extracts single action token
- ReasoningController - extracts reasoning + action
- AcceptOKHandshakeController - validates handshake acceptance
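The three parsing jobs described above can be sketched with plain functions. This is not the code behind the built-in controllers: the function names, the "REASONING: ... ACTION: ..." reply format, and the rule that a handshake must be exactly OK are all assumptions chosen for illustration.

```python
import re
from typing import List, Optional, Tuple

def extract_action(response: str, allowed: List[str]) -> Optional[str]:
    """Return the first allowed action that appears as a whole word, else None."""
    for token in re.findall(r"[A-Za-z_]+", response):
        if token.upper() in allowed:
            return token.upper()
    return None

def extract_reasoning_and_action(
    response: str, allowed: List[str]
) -> Tuple[str, Optional[str]]:
    """Split a 'REASONING: ... ACTION: ...' reply (assumed format) into its parts."""
    match = re.search(
        r"REASONING:\s*(.*?)\s*ACTION:\s*(\w+)", response, re.DOTALL | re.IGNORECASE
    )
    if match is None:
        # Fall back to scanning the whole reply for an action token.
        return "", extract_action(response, allowed)
    reasoning, action = match.group(1), match.group(2).upper()
    return reasoning, (action if action in allowed else None)

def handshake_accepted(response: str) -> bool:
    """Accept only a reply that is exactly OK, ignoring case and whitespace."""
    return response.strip().upper() == "OK"

ACTIONS = ["ATTACK", "POTION"]
print(extract_action("I think I should attack now.", ACTIONS))  # ATTACK
print(extract_reasoning_and_action("REASONING: Low health.\nACTION: potion", ACTIONS))
print(handshake_accepted(" ok "))  # True
```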
Renderers format game state for AI consumption
- TextRenderer - human-readable text format
- Custom renderers can provide JSON, images, etc.
Spectators observe and analyze matches
- MatchNarrator - turn-by-turn commentary
- ProgressDisplay - real-time progress with ETA
- TokenUsageTracker - cost tracking per player/model
- StatsTracker - win rates and performance metrics
Recording & Replay
- Recorder - captures complete match data to JSON
- ReplayEngine - reconstructs matches with event parity guarantee
Quick Start
Requires Python 3.9+ (CI covers 3.9–3.11).
Installation
Source install (recommended for v0.1.0):
# Clone repository
git clone https://github.com/DiegoZoracKy/agentdeck-preview.git
cd agentdeck-preview
# Install with dependencies
pip install -e .
# Or install with dev tools
pip install -e ".[dev]"
# Optional provider extras
pip install -e ".[openai]" # OpenAI SDK
pip install -e ".[anthropic]" # Anthropic SDK
pip install -e ".[google]" # Google Vertex SDK
pip install -e ".[providers]" # All provider SDKs
# Research stack (optional statistics/plotting utilities)
pip install -e ".[research]"
# Minimal replay-only install (no providers)
pip install -e .
📦 PyPI available:
pip install agentdeck-ai
Your First Experiment
from agentdeck import AgentDeck, GPTPlayer, FixedDamageGame, ActionOnlyController
# 1. Create a game
game = FixedDamageGame(
    max_health=100,
    attack_damage=20,
    potion_heal=30,
    starting_potions=1,
)
# 2. Create AI players
players = [
    GPTPlayer(
        name="Player-1",
        model="gpt-4o-mini",
        temperature=0.7,
        controller=ActionOnlyController(),
    ),
    GPTPlayer(
        name="Player-2",
        model="gpt-4o-mini",
        temperature=0.7,
        controller=ActionOnlyController(),
    ),
]
# Models must be provided explicitly for every provider-backed player.
# 3. Run experiment
with AgentDeck(game=game) as deck:
    results = deck.play(
        players=players,
        matches=10,
        seed=42,  # Reproducible!
    )
# 4. Analyze results
print(f"Win rates: {results.win_rates}")
Models are explicit
Provider-backed players never fall back to defaults; pass model= for every GPT/Claude/Gemini player.
ℹ️ Provider credentials
Set the provider-specific environment variables before running examples (OPENAI_API_KEY, ANTHROPIC_API_KEY, and VERTEX_PROJECT_ID/VERTEX_LOCATION for Gemini).
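A small pre-flight check can report which of these variables are missing before any match starts. The variable names are the ones listed above. The provider labels and the missing_credentials helper are hypothetical and not part of AgentDeck.

```python
import os
from typing import Dict, List, Mapping

# Environment variables per provider, as listed in this README.
# The dictionary keys are labels invented for this sketch.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["VERTEX_PROJECT_ID", "VERTEX_LOCATION"],
}

def missing_credentials(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty for a provider."""
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]

# Passing an explicit mapping keeps the example independent of the real environment.
print(missing_credentials("gemini", {"VERTEX_PROJECT_ID": "demo"}))  # ['VERTEX_LOCATION']
```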
Try AgentDeck Without API Keys
- Run python examples/mock_demo.py
- Uses MockPlayer (deterministic) so no LLM providers are needed
- Shows live commentary + progress + stats, and saves recordings under agentdeck_runs/mock_demo/<session>/records/
Walkthroughs & Docs
- Build your first game + replay tour: docs/first_game_walkthrough.md (runs examples/first_game_walkthrough.py)
- Documentation index: docs/README.md
What You'll See (Artifacts & Output)
- Live progress (ProgressDisplay):
  [Batch test] Match 2/3 | ETA: 5.1s | Rate: 0.6 matches/sec
  [Batch test] Match 3/3 | ETA: 0.0s | Rate: 0.7 matches/sec
- Narration and results (MatchNarrator/Stats):
  Turn 1: Alice → ATTACK (Bob: 45 HP) | Bob → POTION (65 HP)
  Turn 2: Alice → ATTACK (Bob: 50 HP) | Bob → ATTACK (Alice: 45 HP)
  Winner: Alice in 4 turns
- Recording snippet (agentdeck_runs/.../records/match_001.json):
  {
    "match_id": "match_001",
    "seed": 7,
    "events": [
      {"type": "player_handshake_start", "player": "Alice"},
      {"type": "gameplay", "turn_number": 1, "prompt_text": "..."},
      {"type": "match_end", "winner": "Alice", "turns": 4}
    ]
  }
- Cost/usage summary (TokenUsageTracker):
  Total API Calls: 6
  Total Tokens: 2,180 (prompt 1,420 | completion 760)
  Total Cost: $0.0421
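Because recordings are plain JSON, they can be inspected with the standard library alone. The sketch below parses the recording snippet shown above, embedded as a string, and summarizes its events. Real recordings likely carry more fields than this excerpt, so treat the field access as an assumption based only on the snippet.

```python
import json
from collections import Counter

# The recording excerpt from this README, embedded so the example is self-contained.
recording_text = """
{
  "match_id": "match_001",
  "seed": 7,
  "events": [
    {"type": "player_handshake_start", "player": "Alice"},
    {"type": "gameplay", "turn_number": 1, "prompt_text": "..."},
    {"type": "match_end", "winner": "Alice", "turns": 4}
  ]
}
"""

recording = json.loads(recording_text)
event_counts = Counter(event["type"] for event in recording["events"])
final_event = recording["events"][-1]

print(recording["match_id"], recording["seed"])     # match_001 7
print(dict(event_counts))
print(final_event["winner"], final_event["turns"])  # Alice 4
```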
Output:
Configuration:
  Default Game: FixedDamageGame
  Seed: 42
Player Details:
  Player-1:
    Model: gpt-4o-mini
    Controller: ActionOnlyController
  Player-2:
    Model: gpt-4o-mini
    Controller: ActionOnlyController
✓ Player-1 handshake: OK
✓ Player-2 handshake: OK
Match 1/10:
  Turn 1: Player-1 → ATTACK
  Turn 2: Player-2 → ATTACK
  ...
  Winner: Player-1 (11 turns)
Win rates: {'Player-1': 0.6, 'Player-2': 0.4}
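The win-rate dictionary in that output is each player's share of matches won. A minimal way to compute it from a list of winners is sketched below. The win_rates function here is a stand-in written for this example, and how AgentDeck itself counts draws or aborted matches is not shown.

```python
from collections import Counter
from typing import Dict, List

def win_rates(winners: List[str]) -> Dict[str, float]:
    """Fraction of matches won by each player that won at least once."""
    counts = Counter(winners)
    total = len(winners)
    return {name: wins / total for name, wins in counts.items()}

# Ten matches: six wins for Player-1, four for Player-2.
winners = ["Player-1"] * 6 + ["Player-2"] * 4
print(win_rates(winners))  # {'Player-1': 0.6, 'Player-2': 0.4}
```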
Parallel Execution (10× Speedup)
from agentdeck import AgentDeck, AgentDeckConfig
from agentdeck.core.types import LogLevel
# game and players are defined as in "Your First Experiment"
# Configure parallel execution with real-time monitoring
config = AgentDeckConfig(
    seed=42,
    concurrency=10,  # Run 10 matches in parallel
    log_level=LogLevel.INFO,
)
# Run 100 matches with automatic progress tracking
with AgentDeck(game=game, session=config) as deck:
    results = deck.play(players=players, matches=100)
    # ProgressMonitor auto-attached - shows real-time ETA and cost tracking
Output:
[ProgressMonitor] Batch Progress: 10/100 (10.0%) | ETA: 2m 15s | Rate: 4.4 matches/sec
[ProgressMonitor] Batch Progress: 50/100 (50.0%) | ETA: 1m 08s | Rate: 4.6 matches/sec
[ProgressMonitor] Batch Progress: 100/100 (100.0%) | Completed in 2m 52s
Validated Performance: 10.26× speedup with concurrency=10, deterministic replay parity guaranteed.
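One common way to keep parallel runs deterministic is to derive each match's seed from the session seed and the match index, so the result of a match never depends on which worker ran it or when it finished. The sketch below shows that idea with a thread pool. The seed-derivation formula, the thread-based workers, and the run_match stand-in are all assumptions, not a description of AgentDeck's worker implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List

def match_seed(base_seed: int, match_index: int) -> int:
    """Derive a per-match seed from the session seed (one possible scheme)."""
    return base_seed * 1_000_003 + match_index

def run_match(seed: int) -> str:
    """Stand-in for a real match: the winner depends only on the seed."""
    rng = random.Random(seed)
    return rng.choice(["Player-1", "Player-2"])

def run_batch(base_seed: int, matches: int, concurrency: int) -> List[str]:
    seeds = [match_seed(base_seed, i) for i in range(matches)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in submission order, not completion order.
        return list(pool.map(run_match, seeds))

serial = run_batch(base_seed=42, matches=20, concurrency=1)
parallel = run_batch(base_seed=42, matches=20, concurrency=10)
print(serial == parallel)  # True
```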
💡 Key Features
1. Event-Driven Observation
Everything is observable through events - no modifications needed to games:
from agentdeck import AgentDeck
from agentdeck.spectators import MatchNarrator, TokenUsageTracker
# game and players are defined as in "Your First Experiment"
# Add spectators for observation
with AgentDeck(game=game, spectators=[
    MatchNarrator(),       # Turn-by-turn commentary
    TokenUsageTracker(),   # Cost tracking
]) as deck:
    results = deck.play(players, matches=10)
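The idea behind event-driven observation can be shown with a tiny publish/subscribe bus: the game loop only emits events, and observers attach without the game knowing about them. TinyEventBus and WinCounter below are hypothetical names for this sketch and do not reflect the actual EventBus or spectator interfaces.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]

class TinyEventBus:
    """Minimal pub/sub: handlers are keyed by the event's 'type' field."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event: Event) -> None:
        for handler in self._handlers[event["type"]]:
            handler(event)

class WinCounter:
    """Observer that tallies winners from match_end events."""

    def __init__(self) -> None:
        self.wins: Dict[str, int] = {}

    def on_match_end(self, event: Event) -> None:
        winner = event["winner"]
        self.wins[winner] = self.wins.get(winner, 0) + 1

bus = TinyEventBus()
counter = WinCounter()
bus.subscribe("match_end", counter.on_match_end)

for winner in ["Alice", "Bob", "Alice"]:
    bus.emit({"type": "match_end", "winner": winner, "turns": 4})

print(counter.wins)  # {'Alice': 2, 'Bob': 1}
```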
2. Complete Recording & Replay
Every match is automatically recorded with full metadata:
import json
from pathlib import Path
from agentdeck import AgentDeck, Recorder
from agentdeck.core.replay import ReplayEngine
from agentdeck.spectators import MatchNarrator
# game and players are defined as in "Your First Experiment"
# Record matches to JSON
recorder = Recorder(output_dir="agentdeck_records")
with AgentDeck(game=game, spectators=[recorder]) as deck:
    deck.play(players, matches=3, seed=7)
# Load the most recent recording (last path in sorted order)
recording_path = sorted(Path("agentdeck_records").glob("session_*/match_*.json"))[-1]
with recording_path.open("r", encoding="utf-8") as handle:
    match_data = json.load(handle)
# Replay with new spectators (exact parity)
engine = ReplayEngine(match_data)
engine.replay(spectators=[MatchNarrator()], speed=0.0)
Replay Parity Guarantee: Replay emits the same event stream as live execution, including the complete three-phase lifecycle (handshake → gameplay → conclusion).
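The parity property can be stated as a simple check: a spectator attached during replay should see the same events, in the same order, as one attached during the live run. The sketch below demonstrates that check with the standard library. The run_live and replay functions are hypothetical stand-ins, and the event shapes are borrowed from the recording snippet earlier in this README.

```python
import json
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Spectator = Callable[[Event], None]

def run_live(spectators: List[Spectator]) -> None:
    """Stand-in for a live match: emit a fixed event stream to every spectator."""
    events: List[Event] = [
        {"type": "player_handshake_start", "player": "Alice"},
        {"type": "gameplay", "turn_number": 1},
        {"type": "match_end", "winner": "Alice", "turns": 4},
    ]
    for event in events:
        for spectator in spectators:
            spectator(event)

def replay(recording_json: str, spectators: List[Spectator]) -> None:
    """Re-emit recorded events, in order, to a fresh set of spectators."""
    for event in json.loads(recording_json)["events"]:
        for spectator in spectators:
            spectator(event)

# Live run: one list plays the observer, the other plays the recorder.
live_seen: List[Event] = []
recorded: List[Event] = []
run_live([live_seen.append, recorded.append])
recording_json = json.dumps({"match_id": "match_001", "events": recorded})

# Replay from the serialized recording with a new observer.
replay_seen: List[Event] = []
replay(recording_json, [replay_seen.append])

print(replay_seen == live_seen)  # True
```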
3. Reproducible Experiments
Deterministic seeding ensures exact reproducibility:
# Same seed → same results
with AgentDeck(game=game, seed=42) as deck:
    results1 = deck.play(players, matches=100)
    results2 = deck.play(players, matches=100)
assert results1.win_rates == results2.win_rates
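The mechanism underneath seeded reproducibility is a private random generator constructed from the seed, so every random decision is a pure function of that seed. The sketch below illustrates this with Python's random.Random. The simulate_winners function is a hypothetical stand-in for a batch of matches and covers only locally generated randomness.

```python
import random
from typing import List

def simulate_winners(seed: int, matches: int) -> List[str]:
    """Pick a winner per match from a generator that depends only on the seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [rng.choice(["Player-1", "Player-2"]) for _ in range(matches)]

first = simulate_winners(seed=42, matches=100)
second = simulate_winners(seed=42, matches=100)
print(first == second)  # True
```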
4. Three-Phase Player Lifecycle
Players go through structured interaction phases:
- Handshake (Mandatory): Player acknowledges rules and format
- Turn (Gameplay): Player makes decisions each turn
- Conclusion (Optional): Player reflects on match outcome
This provides rich data for analyzing AI behavior patterns.
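The three phases can be sketched as a small driver loop around a scripted player. ScriptedPlayer, its method names, and run_lifecycle are hypothetical and chosen only to mirror the phase names above: the handshake must succeed before play begins, and the conclusion is skipped when the player has nothing to say.

```python
from typing import Dict, List, Optional

class ScriptedPlayer:
    """Deterministic stand-in for an LLM player; method names are hypothetical."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handshake(self, rules: str) -> str:
        return "OK"

    def take_turn(self, view: str) -> str:
        return "ATTACK"

    def conclude(self, outcome: str) -> Optional[str]:
        return f"{self.name} reflects: {outcome}"

def run_lifecycle(player: ScriptedPlayer, turns: int) -> List[Dict[str, str]]:
    log: List[Dict[str, str]] = []
    # Phase 1: handshake is mandatory; refuse to continue without acceptance.
    reply = player.handshake("Reply OK to accept the rules.")
    if reply.strip().upper() != "OK":
        raise RuntimeError(f"{player.name} rejected the handshake")
    log.append({"phase": "handshake", "response": reply})
    # Phase 2: one decision per turn.
    for turn in range(1, turns + 1):
        log.append({"phase": "turn", "response": player.take_turn(f"Turn {turn}")})
    # Phase 3: conclusion is optional; skip it if the player returns nothing.
    reflection = player.conclude("win")
    if reflection is not None:
        log.append({"phase": "conclusion", "response": reflection})
    return log

log = run_lifecycle(ScriptedPlayer("Alice"), turns=2)
print([entry["phase"] for entry in log])  # ['handshake', 'turn', 'turn', 'conclusion']
```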
What's Actually Implemented
AgentDeck v0.1.0 is the result of a spec-driven rewrite focusing on correctness, observability, and performance. Here's what's ready:
✅ Complete & Tested
- Core Execution: Console, EventBus, three-phase lifecycle
- Parallel Execution: Worker-based concurrency with deterministic replay parity (10× speedup validated)
- Monitor System: Real-time progress tracking with ProgressMonitor (auto-attached for parallel runs)
- LLM Integration: GPTPlayer, ClaudePlayer, GeminiPlayer (full lifecycle support with clone())
- Controllers: ActionOnlyController, ReasoningController (parser bug fixed), AcceptOKHandshakeController
- Renderers: TextRenderer (generic, works with any game)
- Games: FixedDamageGame example with information levels
- Spectators: MatchNarrator, ProgressDisplay, TokenUsageTracker, StatsTracker
- Recording: Recorder with complete metadata capture (parallel-compatible)
- Replay: ReplayEngine with full lifecycle parity (R1 guarantee)
- Prompt Composition: PromptBuilder with template system
- Reproducibility: Deterministic seeding and exact replay (validated in production)
- Test Suite: Hundreds of tests with CI coverage gate at 75%
🚧 Coming Soon (See ROADMAP.md)
- Research Module: Statistical comparison tools (Phase 2)
- Advanced Examples: Auction game, Prisoner's Dilemma
- Extension Templates: AI-assisted game/player/spectator creation (Phase 3)
- Documentation: Game authoring guide, spectator guide (Phase 3)
🔬 Current Milestone
v0.1.0 (Pre-release): Core Functionality Complete
- ✅ Worker-based parallel execution with deterministic replay parity (SPEC-PARALLEL v1.0.0)
- ✅ Monitor system for real-time progress tracking (SPEC-MONITOR v1.0.0)
- ✅ Production validation: 4 experiments, 40× faster than estimated
- ✅ CI suite passing with coverage gate at 75%
- ✅ Validated with OpenAI GPT-4o-mini and GPT-4o
Next: Pre-release polish (packaging, docs, validation) → Public v0.1.0 in fresh repository
🛠️ Development
Running Tests
# Install dependencies
pip install -e ".[dev]"
# Run test suite
pytest
# Run with coverage
pytest --cov=src/agentdeck --cov-report=html
Running Examples
# Set your API key
export OPENAI_API_KEY="sk-..."
# Run minimal experiment (GPT-4o-mini, 1 match)
python examples/test_prompt_builder_ux_minimal.py
# Run replay example
python examples/replay_minimal.py
# See all examples
ls examples/*.py
Project Structure
agentdeck/
├── src/agentdeck/
│   ├── core/           # Console, EventBus, Recorder, Replay
│   ├── players/        # GPT, Claude, Gemini, Mock
│   ├── controllers/    # ActionOnly, Reasoning, Handshake
│   ├── renderers/      # Text renderer
│   ├── spectators/     # Narrator, Progress, TokenUsage, Stats
│   └── games/examples/ # FixedDamageGame
├── tests/              # Unit + integration suites (CI-gated coverage)
├── examples/           # Working examples
└── specs/              # Component specifications
Documentation
- Architecture Spec - High-level system design and component navigation
- Component Specs - Detailed specifications for each component
- ROADMAP.md - Implementation progress and future plans
- Examples - Working code examples
Component Specifications
All components follow rigorous specifications with numbered invariants:
- SPEC-GAME.md - Game author contract
- SPEC-PLAYER.md - Three-phase player lifecycle
- SPEC-CONTROLLER.md - Response parsing
- SPEC-RENDERER.md - State formatting
- SPEC-SPECTATOR.md - Observation interface
- SPEC-RECORDER.md - Match persistence
- SPEC-REPLAY.md - Exact replay with parity guarantee
- SPEC-CONSOLE.md - Execution engine
- SPEC-OBSERVABILITY.md - Event system
🎯 Design Principles
- Spec-Driven: Every component has a rigorous specification
- Observable: Every decision is captured and analyzable
- Reproducible: Deterministic with seeded randomness
- Composable: Mix and match components freely
- Research-First: Built by researchers, for researchers
License
MIT License - Free for research and commercial use.
Built with ❤️ for AI researchers
AgentDeck v0.1.0 - Spec-Driven Architecture for AI Behavioral Research
File details
Details for the file agentdeck_ai-0.1.0rc2.tar.gz.
File metadata
- Download URL: agentdeck_ai-0.1.0rc2.tar.gz
- Upload date:
- Size: 168.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b4f7bfa7be0e1c2d7294e58129d6c0596ecf41f3b95d33c84fd2101b1f6f68c |
| MD5 | 9ce28f126ca32ccf4afebc516ce37fa4 |
| BLAKE2b-256 | df0960c3485f3dc7dfc68e0e429063b5e0ef7ee2db88b2205451befddbf7991e |
File details
Details for the file agentdeck_ai-0.1.0rc2-py3-none-any.whl.
File metadata
- Download URL: agentdeck_ai-0.1.0rc2-py3-none-any.whl
- Upload date:
- Size: 191.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c039453623bcdb1ac7e85f79784eead2646bef5cdbfaa0f15e998ca0a3c4350 |
| MD5 | 7aeb68520e1c79487e36336a1a7f698e |
| BLAKE2b-256 | 7f33534ab5e2e19198cf209b9397ca61c3cda47cde6fa42c7bf2f1b51f7696a5 |