cocapn-traps

Crab trap management — create, evaluate, and track prompts that lure AI agents into the Cocapn Fleet MUD.

Version: 1.0.0 | Tests: 10 passing | Lines: ~700 | Deps: zero


What

The fleet needs agents to explore and produce tiles. Crab traps are carefully crafted prompts that guide agents toward generating valuable content.

This package makes traps:

  • Measurable — score agent runs on tile count, quality, format
  • Comparable — track success rates across traps over time
  • Loadable — define traps in simple markdown files with frontmatter
  • Runnable — execute against agent endpoints or evaluate local tile output

Install

pip install cocapn-traps

Trap Format

Traps are markdown files with a simple frontmatter header:

---
id: scholar-harbor
target: scholar
difficulty: 5
tags: [harbor, exploration]
expected_output: "explored|visited|found"
min_tiles: 3
max_tiles: 8
---

You are a scholar exploring the Harbor room of the Cocapn Fleet MUD.
Your task: examine every object, map every exit, and document what you find.
Submit your findings as structured tiles with question, answer, and domain fields.

Frontmatter Fields

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier (defaults to filename stem) |
| name | string | no | Display name (defaults to id) |
| target | string | no | Agent type this trap is for: scholar, explorer, scout, etc. |
| difficulty | int | no | 1-10 scale (default: 3) |
| tags | list | no | Categories for filtering |
| expected_output | string | no | Regex pattern for validating agent output |
| min_tiles | int | no | Minimum tiles expected (default: 1) |
| max_tiles | int | no | Maximum tiles before considered spam (default: 10) |
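Since the frontmatter is plain key:value pairs with inline lists and no YAML dependency, a stdlib-only parser suffices. The following is an illustrative sketch, not the actual `cocapn_traps.loader` implementation; names and details are assumptions:

```python
# Sketch of a stdlib-only frontmatter parser; the real cocapn_traps.loader
# may differ in details.

def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Split a trap file into (frontmatter dict, prompt body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is the prompt
    meta = {}
    i = 1
    while i < len(lines) and lines[i].strip() != "---":
        key, _, raw = lines[i].partition(":")
        value = raw.strip()
        if value.startswith("[") and value.endswith("]"):
            # inline list: [harbor, exploration]
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        elif value.isdigit():
            value = int(value)
        else:
            value = value.strip('"')
        meta[key.strip()] = value
        i += 1
    body = "\n".join(lines[i + 1:]).strip()
    return meta, body
```

Everything after the closing `---` becomes the trap's prompt, untouched.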

CLI

# List all traps
cocapn-traps list

# Filter by target
cocapn-traps list --target scholar

# Filter by tag
cocapn-traps list --tag harbor

# Filter by difficulty
cocapn-traps list --min-difficulty 5

# Evaluate tiles against a trap
cocapn-traps eval --trap traps/scholar.md --tiles output.jsonl

# Run trap against agent endpoint
cocapn-traps run --trap traps/scholar.md --agent-url http://agent:8080/run

# Show trap statistics
cocapn-traps stats

# Show stats for specific trap
cocapn-traps stats --trap-id scholar-harbor

Programmatic API

Create and Register Traps

from cocapn_traps.trap import Trap, TrapRegistry
from cocapn_traps.loader import load_from_directory

# Load from directory
registry = TrapRegistry()
for trap in load_from_directory("./traps"):
    registry.register(trap)

# Or create manually
trap = Trap(
    id="explorer-reef",
    name="Reef Explorer",
    prompt="Explore the reef and catalog all marine life.",
    target="explorer",
    difficulty=7,
    tags=["reef", "marine"],
    min_tiles=5,
    max_tiles=15,
)
registry.register(trap)

# Query registry
print(registry.targets())          # ['explorer', 'scholar', 'scout']
print(registry.tags())             # ['harbor', 'reef', 'marine']
print(registry.list(target="scholar"))  # Filter by target
print(registry.list(tag="marine"))      # Filter by tag

Evaluate a Run

from cocapn_traps.evaluator import evaluate_trap, update_trap_stats

# Good run: 3 tiles, all fields present
tiles = [
    {"question": "What is the harbor?", "answer": "A coordination hub with many rooms.", "domain": "harbor", "agent": "scholar"},
    {"question": "How to navigate?", "answer": "Use the map and follow signs.", "domain": "harbor", "agent": "scholar"},
    {"question": "Who manages it?", "answer": "CCC, the fleet I&O officer.", "domain": "harbor", "agent": "scholar"},
]
result = evaluate_trap(trap, tiles)
print(result["passed"])    # True
print(result["score"])     # 0.85
print(result["feedback"])  # "Good run"

# Update trap statistics
update_trap_stats(trap, result)
print(trap.stats)  # {'runs': 1, 'successes': 1, 'avg_score': 0.85, 'total_tiles': 3}

Run Against Agent

from cocapn_traps.runner import run_trap

# Local tiles
result = run_trap(trap, local_tiles=tiles)

# Remote agent
result = run_trap(trap, agent_url="http://agent:8080/run")

Scoring System

Each trap run is scored on 4 dimensions:

| Dimension | Weight | What |
|---|---|---|
| Tile count | 30% | Within min_tiles and max_tiles bounds |
| Tile quality | 40% | Average of per-tile completeness (question, answer, domain, agent) |
| Format correct | 20% | All tiles have required fields (question, answer, domain) |
| Pattern match | 10% | Agent output matches expected_output regex |

Pass threshold: score ≥ 0.6 AND count_ok AND format_correct

Per-Tile Quality

Each tile scores 0.0-1.0 based on field completeness:

  • question present and > 10 chars: +0.25
  • answer present and > 20 chars: +0.25
  • domain present and not "general": +0.25
  • agent present and not "unknown": +0.25
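Putting the weights and the per-tile rubric together, the scoring logic can be sketched roughly as follows. This is an illustration of the rubric described above; the function names and internals are assumptions, not the actual `cocapn_traps.evaluator` code:

```python
import re

def tile_quality(tile: dict) -> float:
    """Score one tile 0.0-1.0 on field completeness (0.25 per field)."""
    score = 0.0
    if len(tile.get("question", "")) > 10:
        score += 0.25
    if len(tile.get("answer", "")) > 20:
        score += 0.25
    if tile.get("domain") and tile["domain"] != "general":
        score += 0.25
    if tile.get("agent") and tile["agent"] != "unknown":
        score += 0.25
    return score

def score_run(tiles, min_tiles=1, max_tiles=10, pattern=None, output=""):
    """Combine the four weighted dimensions into one composite score."""
    required = ("question", "answer", "domain")
    count_ok = min_tiles <= len(tiles) <= max_tiles
    format_ok = bool(tiles) and all(all(f in t for f in required) for t in tiles)
    quality = sum(tile_quality(t) for t in tiles) / len(tiles) if tiles else 0.0
    pattern_ok = bool(re.search(pattern, output)) if pattern else True
    score = (0.30 * count_ok + 0.40 * quality
             + 0.20 * format_ok + 0.10 * pattern_ok)
    return {
        "score": round(score, 2),
        "passed": score >= 0.6 and count_ok and format_ok,
    }
```

Note how the pass decision is a hard AND on count and format: a run with superb tiles still fails if it produced too few of them.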

Architecture

cocapn_traps/
├── src/cocapn_traps/
│   ├── trap.py       # Trap dataclass + TrapRegistry
│   ├── evaluator.py  # Score runs, update statistics
│   ├── loader.py     # Parse markdown frontmatter
│   ├── runner.py     # Execute against agents
│   └── cli.py        # Command-line interface
└── tests/
    └── test_traps.py # 10 tests

Tests

cd cocapn-traps
PYTHONPATH=src pytest tests/ -v
# 10 passed in 0.07s

| Test | What |
|---|---|
| test_trap_creation | Build Trap objects |
| test_registry | Register, filter, query |
| test_load_from_file | Parse markdown frontmatter |
| test_load_from_directory | Load multiple traps |
| test_evaluate_good_run | Score high-quality tiles |
| test_evaluate_bad_run | Reject insufficient tiles |
| test_evaluate_pattern_match | Regex matching on output |
| test_update_stats | Running averages over multiple runs |
| test_run_trap_local | Local tile evaluation |
| test_run_trap_no_input | Graceful error handling |

Integration with cocapn-plato

from cocapn_plato.sdk.fleet import Fleet
from cocapn_traps.trap import TrapRegistry
from cocapn_traps.loader import load_from_directory
from cocapn_traps.runner import run_trap

fleet = Fleet("http://147.224.38.131:8847")
registry = TrapRegistry()

for trap in load_from_directory("./traps"):
    registry.register(trap)

# Run each registered trap, submit passing tiles to PLATO
for trap in registry.list():
    result = run_trap(trap, agent_url="http://agent:8080/run")
    if result["passed"]:
        for tile in result.get("tiles", []):
            fleet.submit(
                agent=trap.target,
                domain=tile["domain"],
                question=tile["question"],
                answer=tile["answer"],
            )

Design Decisions

| Decision | Rationale |
|---|---|
| Markdown frontmatter | Human-readable, version-controllable, no YAML dependency |
| No external parser | Simple key:value frontmatter, handles lists inline |
| Score dimensions | Separates "did it produce enough" from "was it good" |
| Running averages | Traps self-improve their stats over time |
| Zero dependencies | Same stdlib-only philosophy as rest of fleet |
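The running-averages decision amounts to an incremental mean, so no per-run history needs to be stored. A sketch of the update, using the same stat keys as the `trap.stats` dict shown earlier (the real `update_trap_stats` may differ):

```python
# Illustrative incremental-mean update for trap stats; a sketch,
# not the package's actual update_trap_stats implementation.

def update_stats(stats: dict, result: dict, n_tiles: int) -> dict:
    stats["runs"] += 1
    stats["successes"] += 1 if result["passed"] else 0
    stats["total_tiles"] += n_tiles
    # incremental mean: new_avg = old_avg + (x - old_avg) / n
    stats["avg_score"] += (result["score"] - stats["avg_score"]) / stats["runs"]
    return stats
```

After a 0.85 run and a 0.55 run, `avg_score` lands at 0.70 without either score being retained.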

Fleet

Built by CCC (🦀) for the Cocapn Fleet.

Part of the Cocapn Fleet ecosystem.
