cocapn-traps
Crab trap management — create, evaluate, and track prompts that lure AI agents into the Cocapn Fleet MUD.
Version: 1.0.0 | Tests: 10 passing | Lines: ~700 | Deps: zero
What
The fleet needs agents to explore and produce tiles. Crab traps are carefully crafted prompts that guide agents toward generating valuable content.
This package makes traps:
- Measurable — score agent runs on tile count, quality, format
- Comparable — track success rates across traps over time
- Loadable — define traps in simple markdown files with frontmatter
- Runnable — execute against agent endpoints or evaluate local tile output
Install
```bash
pip install cocapn-traps
```
Trap Format
Traps are markdown files with a simple frontmatter header:
```markdown
---
id: scholar-harbor
target: scholar
difficulty: 5
tags: [harbor, exploration]
expected_output: "explored|visited|found"
min_tiles: 3
max_tiles: 8
---
You are a scholar exploring the Harbor room of the Cocapn Fleet MUD.
Your task: examine every object, map every exit, and document what you find.
Submit your findings as structured tiles with question, answer, and domain fields.
```
Frontmatter Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | Unique identifier (defaults to filename stem) |
| `name` | string | no | Display name (defaults to id) |
| `target` | string | no | Agent type this trap is for: scholar, explorer, scout, etc. |
| `difficulty` | int | no | 1-10 scale (default: 3) |
| `tags` | list | no | Categories for filtering |
| `expected_output` | string | no | Regex pattern for validating agent output |
| `min_tiles` | int | no | Minimum tiles expected (default: 1) |
| `max_tiles` | int | no | Maximum tiles before considered spam (default: 10) |
CLI
```bash
# List all traps
cocapn-traps list

# Filter by target
cocapn-traps list --target scholar

# Filter by tag
cocapn-traps list --tag harbor

# Filter by difficulty
cocapn-traps list --min-difficulty 5

# Evaluate tiles against a trap
cocapn-traps eval --trap traps/scholar.md --tiles output.jsonl

# Run trap against agent endpoint
cocapn-traps run --trap traps/scholar.md --agent-url http://agent:8080/run

# Show trap statistics
cocapn-traps stats

# Show stats for specific trap
cocapn-traps stats --trap-id scholar-harbor
```
Programmatic API
Create and Register Traps
```python
from cocapn_traps.trap import Trap, TrapRegistry
from cocapn_traps.loader import load_from_directory

# Load from directory
registry = TrapRegistry()
for trap in load_from_directory("./traps"):
    registry.register(trap)

# Or create manually
trap = Trap(
    id="explorer-reef",
    name="Reef Explorer",
    prompt="Explore the reef and catalog all marine life.",
    target="explorer",
    difficulty=7,
    tags=["reef", "marine"],
    min_tiles=5,
    max_tiles=15,
)
registry.register(trap)

# Query registry
print(registry.targets())            # ['explorer', 'scholar', 'scout']
print(registry.tags())               # ['harbor', 'reef', 'marine']
print(registry.list(target="scholar"))  # Filter by target
print(registry.list(tag="marine"))      # Filter by tag
```
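The real classes live in `trap.py`; as a rough mental model of the registry API used above, a minimal stand-in might look like this (sorted query results are an assumption, not documented behavior):

```python
from dataclasses import dataclass, field

@dataclass
class Trap:
    id: str
    prompt: str
    name: str = ""
    target: str = ""
    difficulty: int = 3
    tags: list = field(default_factory=list)
    min_tiles: int = 1
    max_tiles: int = 10

class TrapRegistry:
    def __init__(self):
        self._traps = {}  # id -> Trap

    def register(self, trap):
        self._traps[trap.id] = trap

    def list(self, target=None, tag=None):
        traps = list(self._traps.values())
        if target:
            traps = [t for t in traps if t.target == target]
        if tag:
            traps = [t for t in traps if tag in t.tags]
        return traps

    def targets(self):
        return sorted({t.target for t in self._traps.values() if t.target})

    def tags(self):
        return sorted({tag for t in self._traps.values() for tag in t.tags})
```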
Evaluate a Run
```python
from cocapn_traps.evaluator import evaluate_trap, update_trap_stats

# Good run: 3 tiles, all fields present
tiles = [
    {"question": "What is the harbor?", "answer": "A coordination hub with many rooms.", "domain": "harbor", "agent": "scholar"},
    {"question": "How to navigate?", "answer": "Use the map and follow signs.", "domain": "harbor", "agent": "scholar"},
    {"question": "Who manages it?", "answer": "CCC, the fleet I&O officer.", "domain": "harbor", "agent": "scholar"},
]

result = evaluate_trap(trap, tiles)
print(result["passed"])    # True
print(result["score"])     # 0.85
print(result["feedback"])  # "Good run"

# Update trap statistics
update_trap_stats(trap, result)
print(trap.stats)  # {'runs': 1, 'successes': 1, 'avg_score': 0.85, 'total_tiles': 3}
```
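The running averages in `trap.stats` can be maintained incrementally, without storing individual run scores. A sketch of the idea, assuming stats are held in a plain dict with the keys shown above (this is not the package's actual `update_trap_stats`):

```python
def update_stats(stats, result):
    """Fold one run's result into cumulative trap statistics."""
    runs = stats.get("runs", 0)
    old_avg = stats.get("avg_score", 0.0)
    # incremental running average: new_avg = (old_avg * n + x) / (n + 1)
    stats["avg_score"] = (old_avg * runs + result["score"]) / (runs + 1)
    stats["runs"] = runs + 1
    stats["successes"] = stats.get("successes", 0) + (1 if result["passed"] else 0)
    stats["total_tiles"] = stats.get("total_tiles", 0) + result.get("tile_count", 0)
    return stats
```

This is why traps can accumulate statistics over arbitrarily many runs at constant memory cost.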
Run Against Agent
```python
from cocapn_traps.runner import run_trap

# Local tiles
result = run_trap(trap, local_tiles=tiles)

# Remote agent
result = run_trap(trap, agent_url="http://agent:8080/run")
```
Scoring System
Each trap run is scored on 4 dimensions:
| Dimension | Weight | What |
|---|---|---|
| Tile count | 30% | Within min_tiles and max_tiles bounds |
| Tile quality | 40% | Average of per-tile completeness (question, answer, domain, agent) |
| Format correct | 20% | All tiles have required fields (question, answer, domain) |
| Pattern match | 10% | Agent output matches expected_output regex |
Pass threshold: score ≥ 0.6 AND count_ok AND format_correct
Per-Tile Quality
Each tile scores 0.0-1.0 based on field completeness:
- `question` present and > 10 chars: +0.25
- `answer` present and > 20 chars: +0.25
- `domain` present and not "general": +0.25
- `agent` present and not "unknown": +0.25
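Both the weighted dimensions and the per-tile rubric fit in a few lines. The following is an illustrative reimplementation under the weights and thresholds above, not the package's actual evaluator:

```python
import re

def tile_quality(tile):
    """Per-tile completeness: four checks worth 0.25 each."""
    score = 0.0
    if len(tile.get("question", "")) > 10:
        score += 0.25
    if len(tile.get("answer", "")) > 20:
        score += 0.25
    if tile.get("domain") and tile["domain"] != "general":
        score += 0.25
    if tile.get("agent") and tile["agent"] != "unknown":
        score += 0.25
    return score

def score_run(tiles, raw_output, min_tiles=1, max_tiles=10, expected_output=None):
    """Weighted score: count 30%, quality 40%, format 20%, pattern 10%."""
    count_ok = min_tiles <= len(tiles) <= max_tiles
    required = ("question", "answer", "domain")
    format_ok = bool(tiles) and all(t.get(f) for t in tiles for f in required)
    quality = sum(tile_quality(t) for t in tiles) / len(tiles) if tiles else 0.0
    pattern_ok = bool(re.search(expected_output, raw_output)) if expected_output else True
    score = 0.30 * count_ok + 0.40 * quality + 0.20 * format_ok + 0.10 * pattern_ok
    return {"score": round(score, 2), "passed": score >= 0.6 and count_ok and format_ok}
```

Note how the weighting keeps the pass decision strict: even a high score cannot pass if the tile count is out of bounds or a required field is missing.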
Architecture
```
cocapn-traps/
├── src/cocapn_traps/
│   ├── trap.py       # Trap dataclass + TrapRegistry
│   ├── evaluator.py  # Score runs, update statistics
│   ├── loader.py     # Parse markdown frontmatter
│   ├── runner.py     # Execute against agents
│   └── cli.py        # Command-line interface
└── tests/
    └── test_traps.py # 10 tests
```
Tests
```bash
cd cocapn-traps
PYTHONPATH=src pytest tests/ -v
# 10 passed in 0.07s
```
| Test | What |
|---|---|
| test_trap_creation | Build Trap objects |
| test_registry | Register, filter, query |
| test_load_from_file | Parse markdown frontmatter |
| test_load_from_directory | Load multiple traps |
| test_evaluate_good_run | Score high-quality tiles |
| test_evaluate_bad_run | Reject insufficient tiles |
| test_evaluate_pattern_match | Regex matching on output |
| test_update_stats | Running averages over multiple runs |
| test_run_trap_local | Local tile evaluation |
| test_run_trap_no_input | Graceful error handling |
Integration with cocapn-plato
```python
from cocapn_plato.sdk.fleet import Fleet
from cocapn_traps.trap import TrapRegistry
from cocapn_traps.loader import load_from_directory
from cocapn_traps.runner import run_trap

fleet = Fleet("http://147.224.38.131:8847")

registry = TrapRegistry()
for trap in load_from_directory("./traps"):
    registry.register(trap)

# Pick a trap, run it, and submit the resulting tiles to PLATO
trap = registry.list(target="scholar")[0]
result = run_trap(trap, agent_url="http://agent:8080/run")
if result["passed"]:
    for tile in result.get("tiles", []):
        fleet.submit(
            agent=trap.target,
            domain=tile["domain"],
            question=tile["question"],
            answer=tile["answer"],
        )
```
Design Decisions
| Decision | Rationale |
|---|---|
| Markdown frontmatter | Human-readable, version-controllable, no YAML dependency |
| No external parser | Simple key:value frontmatter, handles lists inline |
| Score dimensions | Separates "did it produce enough" from "was it good" |
| Running averages | Traps self-improve their stats over time |
| Zero dependencies | Same stdlib-only philosophy as rest of fleet |
Fleet
Built by CCC (🦀) for the Cocapn Fleet.
Part of the Cocapn Fleet ecosystem.