tp — non-invasive CLI testing framework for AI workflows. Submits results to trapstreet.run.

These details have not been verified by PyPI

Project links

Project description

trap — CLI

Lives at trapstreet-mvp/cli/ — part of the trapstreet monorepo. The standalone repo AntiNoise-ai/trap is retained for history only; all active development happens here.

Install (from PyPI — recommended):

uv tool install trapstreet-cli
# also works via pipx / pip

From git (latest main, no PyPI release needed):

uv tool install "git+https://github.com/AntiNoise-ai/trapstreet-mvp.git#subdirectory=cli"

The command name is tp. Releases are git-tagged cli-vX.Y.Z; tagging triggers PyPI publish via GitHub Actions.

trap is a non-invasive CLI testing framework for AI prompts, agents, and workflows. It treats the program under test (the "solution") as a black box: it invokes the solution as a subprocess, captures stdout/stderr/files, then optionally pipes that output through a "judge" (per-case scorer) and a "grader" (overall aggregator) — also subprocesses, also language-agnostic.

The framework knows nothing about how the solution is implemented. Python, shell scripts, compiled binaries, agentic pipelines — anything that can be invoked from a shell works.

Core idea: solution and task are decoupled

Two roles, two repos (or two directories), connected only by a small IO contract:

Role	Owns	Configures
Solution author	`trap.yaml`, the solution code	how to invoke the solution, which inputs to feed it, which outputs it produces
Task author	`traptask.yaml`, `judge.py`, `grader.py`, `inputs/`, `expected/`	the test cases, scoring logic, expected outputs

The solution doesn't need to import trap or know it exists. It reads paths from two environment variables (INPUTS, OUTPUTS) and runs.

inputs/{case_id}/   ──[INPUTS env var]──▶  solution  ──[OUTPUTS env var]──▶  .trap/{task}/{ts}/{case_id}/
expected/{case_id}/                                                                 │
       │                                                                            │
       └──────────────────────── judge  ◀──────────────────────────────────────────┘
                                   │
                            {metrics: any JSON}
                                   │
                  [collect all cases, hand to grader]
                                   │
                                grader
                                   │
                            {passed, score, ...}

Install

Requires Python >=3.14 and uv.

git clone https://github.com/AntiNoise-ai/trap
cd trap
uv sync

The installed entry point is tp (not trap), declared in pyproject.toml:

uv run tp --help

There is no PyPI release yet, so use uv run tp … from a clone, or run it from a wheel you build locally (uv build).

Quick start — the echo example

The repo ships two complete worked examples under examples/. Walk through examples/echo/ to see the moving pieces.

1. Solution side — `examples/echo/solution/`

echo.py reads JSON from stdin and prints message to stdout (or errors with exit 1 if message is missing):

import json, sys
data = json.load(sys.stdin)
if "message" not in data:
    print("error: missing 'message' field", file=sys.stderr)
    sys.exit(1)
print(data["message"])

trap.yaml tells trap how to run it and where the task lives:

tasks:
  test:
    description: Echo solution — reads stdin JSON, writes it back to stdout
    cmd: uv run python echo.py
    traptask: ../task          # path to the task directory (relative to trap.yaml)
    inputs:
      stdin: input.json        # pipe inputs/{case_id}/input.json into stdin

2. Task side — `examples/echo/task/`

traptask.yaml lists the cases and points at the judge/grader:

dirs:
  inputs: inputs/         # optional; this is the default
  expected: expected/     # optional; this is the default

cases:
  - id: contains_basic
    description: stdout contains the substring (case-insensitive)
    tags: [smoke]
  - id: exit_code_failure
    description: exit code is 1 when message field is missing
  - id: skipped_example
    skip: true
    tags: [wip]

judge:
  cmd: .venv/bin/python judge.py     # optional — omit for output-only mode

grader:
  cmd: .venv/bin/python grader.py    # optional — omit to skip aggregation

Each case has a directory under inputs/{id}/ (and optionally expected/{id}/) holding whatever files that case needs.

judge.py reads the payload from TRAPTASK_PAYLOAD, evaluates one case, prints a JSON metric to stdout:

import json, os, re
from pathlib import Path

data = json.loads(os.environ["TRAPTASK_PAYLOAD"])
stdout = Path(data["outputs"]["case_stdout"]).read_text().strip()
exit_code = json.loads(Path(data["outputs"]["case_meta.json"]).read_text())["exit_code"]
expected = json.loads(Path(data["expected"]["expected.json"]).read_text())

# … compute results …
print(json.dumps({"score": score}))

grader.py receives the list of all case results and emits the overall verdict:

import json, os
results = json.loads(os.environ["TRAPTASK_PAYLOAD"])
passed = all(r["metrics"]["score"] == 1.0 for r in results)
print(json.dumps({"passed": passed, "score": avg_score}))

3. Run it

From examples/echo/solution/:

uv run tp run                    # run the first task in trap.yaml
uv run tp run test               # run the named task
uv run tp run -t smoke           # only run cases tagged `smoke`
uv run tp run --output json      # print machine-readable JSON instead of a rich table
uv run tp run --fail-fast        # stop on first case whose judge score < 1.0

Trap writes per-run artifacts under .trap/{task}/{timestamp}/ and updates a latest symlink alongside it.

CLI reference

tp run [TASK] [OPTIONS]      # execute a task
tp report [TASK] [RUN]       # re-print the report for a stored run (defaults to `latest`)
tp init                      # scaffold trap.yaml + traptask.yaml — NOT YET IMPLEMENTED

`tp run` options

Flag	Default	Purpose
`TASK` (positional)	first task in `trap.yaml`	which task to run
`--config / -c`	`trap.yaml`	path to the trap config
`--tag / -t`	(none)	filter cases by tag; repeatable
`--output / -o`	`rich`	report renderer: `rich` or `json`
`--fail-fast`	`false`	stop after the first case whose judge `score < 1.0`
`--workspace / -w`	`.trap`	where to write run artifacts

`tp report` options

Re-renders a previously stored run from disk. Same --config / --output / --workspace flags; the RUN argument is the timestamp directory name, or latest (default).

Exit codes

0 — every case exited 0, and (if a grader ran) metrics.passed is not False.
1 — at least one case had a non-zero exit code, or the grader returned {"passed": false}.

Configuration reference

`trap.yaml` (solution author)

tasks:
  test:                        # task name; arbitrary
    description: optional      # shown in the report
    cmd: uv run python solution.py
    traptask: ../task          # path to the task dir (contains traptask.yaml)
    inputs:                    # optional
      stdin: input.txt         # filename in inputs/{case_id}/ to pipe as stdin
      files:                   # optional: filenames to assert exist before running
        - config.json
    file_outputs:              # files the solution promises to write
      - result.json
    timeout: 30                # seconds; default 30
    inputs_envvar: INPUTS      # override the env var name if you want
    outputs_envvar: OUTPUTS

  run:                         # second task; same traptask, different cmd or inputs
    cmd: uv run python solution.py
    traptask: ../task
    inputs:
      stdin: input.txt
    file_outputs:
      - result.json

tasks: is a mapping; each key is a task name you can pass to tp run.
traptask is required for every task — it points at the directory containing traptask.yaml.
cmd is parsed via shlex.split, run with the trap.yaml's directory as cwd.

`traptask.yaml` (task author)

The entire file is optional. If traptask.yaml is absent, trap scans inputs/ and treats each subdirectory as a case in output-only mode (no judge, no grader, no expected). With it:

dirs:
  inputs: inputs/                # default
  expected: expected/            # default

cases:
  - id: contains_basic           # must match an inputs/<id>/ directory
    description: optional
    tags: [smoke]                # for `tp run -t smoke`
  - id: skipped_example
    skip: true                   # case is not executed

judge:                           # optional; omit for output-only mode
  cmd: .venv/bin/python judge.py
  payload_envvar: TRAPTASK_PAYLOAD   # default; override if you must

grader:                          # optional; omit to skip aggregation
  cmd: .venv/bin/python grader.py

judge.cmd and grader.cmd run with cwd set to the task directory.

The IO contract

Trap injects environment variables at three points. Values are always JSON strings (not file paths) so consumers can json.loads(os.environ[…]) directly.

Solution-side: `INPUTS` and `OUTPUTS`

Before running each case, trap injects:

// INPUTS — every file in inputs/{case_id}/
{
  "input.json":  "/abs/path/task/inputs/contains_basic/input.json",
  "config.json": "/abs/path/task/inputs/contains_basic/config.json"
}

// OUTPUTS — every filename declared in trap.yaml `file_outputs`
{
  "result.json": "/abs/path/.trap/test/2026-05-09T14:30:00/contains_basic/result.json"
}

Keys are full filenames with extension. Values are absolute paths. The solution reads INPUTS["foo.json"], writes to OUTPUTS["result.json"]. If you have nothing to read or write via files you can still receive content on stdin via inputs.stdin in trap.yaml.

stdout, stderr, and meta.json are captured automatically — the solution never writes to OUTPUTS["case_stdout"] itself.

Judge-side: `TRAPTASK_PAYLOAD`

For each case, the judge receives a JSON string with three namespaces:

{
  "inputs":   { "input.json":      "/abs/path/task/inputs/case1/input.json"      },
  "outputs":  {
    "case_stdout":     "/abs/path/.trap/test/.../case1/case_stdout",
    "case_stderr":     "/abs/path/.trap/test/.../case1/case_stderr",
    "case_meta.json":  "/abs/path/.trap/test/.../case1/case_meta.json"
  },
  "expected": { "expected.json":   "/abs/path/task/expected/case1/expected.json" }
}

Note the captured outputs are keyed case_stdout, case_stderr, case_meta.json (prefixed) — that's what the runner writes to disk and what the example judge.py reads. case_meta.json contains {"exit_code": N, "duration": seconds}.

The judge prints free-form JSON to stdout. Trap stores it verbatim as CaseResult.metrics. Convention: include a numeric score field if you want --fail-fast to be meaningful (it checks metrics.score < 1.0).

Grader-side: `TRAPTASK_PAYLOAD`

The grader receives the full list of per-case results as JSON:

[
  {"case_id": "contains_basic", "exit_code": 0, "duration": 0.12, "metrics": {"score": 1.0}, "skipped": false},
  {"case_id": "exact_match",    "exit_code": 0, "duration": 0.11, "metrics": {"score": 0.5}, "skipped": false}
]

It prints free-form JSON to stdout. Convention: include passed: bool (used for the exit-code check) and score: float (used in the report header).

The `.trap/` workspace

.trap/
└── {task_name}/
    ├── latest -> 2026-05-09T14:30:00/
    └── 2026-05-09T14:30:00/
        ├── {case_id}/
        │   ├── case_stdout
        │   ├── case_stderr
        │   ├── case_meta.json           # {"exit_code": 0, "duration": 0.12}
        │   ├── judge_stdout             # raw judge output (if judge ran)
        │   ├── judge_stderr
        │   ├── judge_meta.json
        │   └── {any declared file_outputs}
        ├── grader_stdout                # raw grader output (if grader ran)
        ├── grader_stderr
        ├── grader_meta.json
        └── report.json                  # the rendered/JSON report for this run

Use tp report to re-display a stored run without re-executing the solution.

Three running modes

Choose by what you put (or don't put) in traptask.yaml:

Mode	`judge`	`grader`	Pass/fail signal
Output-only	absent	absent	any case `exit_code != 0` → exit 1
Per-case scoring	present	absent	same as above
Full evaluation	present	present	grader's `metrics.passed == false` → exit 1

In output-only mode you can even omit traptask.yaml entirely — trap will discover cases by scanning inputs/ subdirectories.

Current limitations

tp init is a stub; scaffolding is not implemented yet.
Cases run sequentially (the TaskRunner._iter generator is deliberately left as a seam for future parallelization).
No PyPI release; install from source.
Python 3.14 minimum.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.4.0

May 28, 2026

0.3.0

May 27, 2026

0.2.0

May 15, 2026

This version

0.1.0

May 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trapstreet_cli-0.1.0.tar.gz (16.6 kB view details)

Uploaded May 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

trapstreet_cli-0.1.0-py3-none-any.whl (21.7 kB view details)

Uploaded May 14, 2026 Python 3

File details

Details for the file trapstreet_cli-0.1.0.tar.gz.

File metadata

Download URL: trapstreet_cli-0.1.0.tar.gz
Upload date: May 14, 2026
Size: 16.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for trapstreet_cli-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f8d5f7695fcc226beac93d62c89067b6ad78a85014c36114b1a289cf5fde6424`
MD5	`fa12c968b7125b62de10bb0addb5a70e`
BLAKE2b-256	`4ac45596c6aa0bfc62957a1739ada93244ff7e2a1e187d2ffce7844d56ae0d63`

See more details on using hashes here.

File details

Details for the file trapstreet_cli-0.1.0-py3-none-any.whl.

File metadata

Download URL: trapstreet_cli-0.1.0-py3-none-any.whl
Upload date: May 14, 2026
Size: 21.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for trapstreet_cli-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9258aa1654e57996b4dfff985b9b4ef5c4ce1a2bd3b837ad34a8d4f77399d678`
MD5	`389621927f7ee56724c6a3946697b75c`
BLAKE2b-256	`c9b76a40f66db225ed907fe34d0525fdf948f1f0fbb238bc33f06b0c961141ed`

See more details on using hashes here.

trapstreet-cli 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

trap — CLI

Core idea: solution and task are decoupled

Install

Quick start — the echo example

1. Solution side — examples/echo/solution/

2. Task side — examples/echo/task/

3. Run it

CLI reference

tp run options

tp report options

Exit codes

Configuration reference

trap.yaml (solution author)

traptask.yaml (task author)

The IO contract

Solution-side: INPUTS and OUTPUTS

Judge-side: TRAPTASK_PAYLOAD

Grader-side: TRAPTASK_PAYLOAD

The .trap/ workspace

Three running modes

Current limitations

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

1. Solution side — `examples/echo/solution/`

2. Task side — `examples/echo/task/`

`tp run` options

`tp report` options

`trap.yaml` (solution author)

`traptask.yaml` (task author)

Solution-side: `INPUTS` and `OUTPUTS`

Judge-side: `TRAPTASK_PAYLOAD`

Grader-side: `TRAPTASK_PAYLOAD`

The `.trap/` workspace