Eval-driven test/fix/improve harness for orchestrator-based apps


Tinkerloop

Tinkerloop to the rescue. If the communication between your orchestrator and your MCP-backed app is hard to trust, Tinkerloop gives you a scenario-based loop: reproduce the failure, diagnose it with deterministic checks, patch the target, and rerun until the behavior matches what you expect.

It is an eval-driven harness for testing and improving orchestrator-based apps through repeatable test -> diagnose -> patch -> rerun loops.

Release Status

Tinkerloop is in alpha.

The package, CLI, adapters, and report artifacts are usable now.
The supported v0.x surface is documented in docs/STABILITY.md.
The project is intended for technically strong early adopters who can own a target adapter.
It is not yet positioned as a benchmark suite or production-assurance layer.

What It Is

Tinkerloop is not another app-specific bot framework. It is a reusable outer loop for systems that already have:

an inner orchestrator model
tool or MCP integrations
a conversational or API-facing entrypoint

Tinkerloop plays the role of:

user simulator
integration tester
trajectory recorder
deterministic judge
developer feedback loop driver

Actor Model

There are two distinct roles in a Tinkerloop workflow:

inner target orchestrator: the model and tool path inside the app under test
outer coding model: the developer tool model using Tinkerloop artifacts to patch and rerun

The outer coding model may analyze results and edit code between runs. It must not replace the inner target orchestrator during a measured run. See docs/ACTOR_MODEL.md.

Who It Is For

teams that already have a target app and want deterministic scenario-based regression loops
teams that can keep target-specific logic in a target-owned adapter and scenario library
teams that want report-driven reruns rather than broad benchmark claims

Who It Is Not For

users looking for a zero-config app framework
teams that need remote secure-driver support today
users who want Tinkerloop to measure general model quality

MVP Scope

Current MVP:

load multi-turn scenario files
run them against a target app adapter
preflight the target app before scenario execution
resolve the target app's inner runtime from the target repo boundary
trace tool calls by patching configured execution points
trace tool calls from target-owned runner commands
evaluate deterministic checks
write JSON reports for failures and regressions
rerun only failed scenarios from report artifacts
separate repair-loop and confirmation-loop runs
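The "trace tool calls by patching configured execution points" item can be pictured as plain monkeypatching. The sketch below is not Tinkerloop's tracer. It assumes a hypothetical target object exposing a tool entry point with the call shape documented for PythonAppAdapter patch targets, (tool_name, user_id, arguments, correlation_id=None), and wraps it so each call and its result are recorded. The names patch_and_trace, execute_tool, fake_app, and search_inbox are illustrative only.

```python
import functools
import types
from typing import Any, Callable


def patch_and_trace(owner: Any, attr: str, trace: list) -> Callable[[], None]:
    """Replace owner.attr with a wrapper that records each tool call; return an undo function."""
    original = getattr(owner, attr)

    @functools.wraps(original)
    def traced(tool_name, user_id, arguments, correlation_id=None):
        # Forward to the real tool first, then record the call and its result.
        result = original(tool_name, user_id, arguments, correlation_id=correlation_id)
        trace.append({
            "tool_name": tool_name,
            "user_id": user_id,
            "arguments": dict(arguments),
            "correlation_id": correlation_id,
            "result": result,
        })
        return result

    setattr(owner, attr, traced)
    return lambda: setattr(owner, attr, original)


# Hypothetical stand-in for a target module that exposes one tool entry point.
def execute_tool(tool_name, user_id, arguments, correlation_id=None):
    return {"ok": True, "tool": tool_name}


fake_app = types.SimpleNamespace(execute_tool=execute_tool)

calls: list = []
undo = patch_and_trace(fake_app, "execute_tool", calls)
fake_app.execute_tool("search_inbox", "demo-user", {"query": "invoices"})
undo()  # restore the original callable after the measured run
print(calls[0]["tool_name"], len(calls))  # search_inbox 1
```

Returning an undo function keeps the patch explicit and reversible, in line with the "no silent magic around tracing, patching, or scenario selection" design rule.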

Not in scope yet:

automatic patch generation
automatic deploys
autonomous code changes without a human gate
benchmark claims beyond the configured scenario set
secure non-prod target-driver contracts

Quick Start

Tinkerloop supports Python 3.10+. This repo pins 3.12.9 in .python-version for local development with pyenv.

The PyPI distribution name is tinkerloop-ai. Install it with:

python3 -m pip install tinkerloop-ai

If you need to install directly from a GitHub release asset instead:

python3 -m pip install https://github.com/bostoneco/tinkerloop/releases/download/<tag>/tinkerloop_ai-<version>-py3-none-any.whl

Then run it against a target-owned adapter and scenario directory:

tinkerloop \
  run \
  --adapter /path/to/target_adapter.py:create_adapter \
  --user-id <user-id> \
  --scenarios /path/to/scenarios

tinkerloop run exits with code 3 when the repair loop passes. That is intentional: run tinkerloop confirm ... before treating the result as final.

When a candidate fix looks good, run the external confirmation loop:

tinkerloop \
  confirm \
  --adapter /path/to/target_adapter.py:create_adapter \
  --user-id <user-id> \
  --scenarios /path/to/scenarios \
  --non-interactive

If your target repo exposes a more realistic runner or adapter for real-agent validation, use that boundary for confirm instead of the faster repair-loop boundary.
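In a script or CI job, the exit-code convention above means a repair run that exits with code 3 should be followed by a confirm run rather than treated as success. This is a minimal sketch that assumes the tinkerloop executable is on PATH and uses only the flags shown above. The helper names are hypothetical, and because exit codes other than 3 are not described here, the sketch treats them conservatively as "not passed yet" (an assumption).

```python
import subprocess

REPAIR_PASSED = 3  # documented: `tinkerloop run` exits 3 when the repair loop passes


def tinkerloop_cmd(command: str, adapter: str, user_id: str, scenarios: str,
                   non_interactive: bool = False) -> list:
    """Build argv for `tinkerloop run` or `tinkerloop confirm` from the documented flags."""
    argv = ["tinkerloop", command, "--adapter", adapter,
            "--user-id", user_id, "--scenarios", scenarios]
    if non_interactive:
        argv.append("--non-interactive")
    return argv


def next_step(run_exit_code: int) -> str:
    """Decide what to do after a repair-loop run."""
    if run_exit_code == REPAIR_PASSED:
        return "confirm"  # repair loop passed; the confirmation loop still has to run
    return "inspect-and-patch"  # assumption: any other code means "not passed yet"


def repair_then_confirm(adapter: str, user_id: str, scenarios: str) -> int:
    """Run the repair loop once and, only if it passed, the confirmation loop."""
    run = subprocess.run(tinkerloop_cmd("run", adapter, user_id, scenarios))
    if next_step(run.returncode) != "confirm":
        return run.returncode
    confirm = subprocess.run(
        tinkerloop_cmd("confirm", adapter, user_id, scenarios, non_interactive=True))
    return confirm.returncode  # confirm exit-code meanings are not documented here
```

For example, repair_then_confirm("examples/starter_target/adapter.py:create_adapter", "demo-user", "examples/starter_target/scenarios") mirrors the two commands above. If your target exposes a more realistic boundary for confirmation, give the confirm step its own adapter argument.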

For local development from a source checkout:

pyenv local 3.12.9
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pytest -q
tinkerloop \
  run \
  --adapter examples/starter_target/adapter.py:create_adapter \
  --user-id demo-user \
  --scenarios examples/starter_target/scenarios

# fuller demo target
tinkerloop \
  run \
  --adapter examples/demo_app/adapter.py:create_adapter \
  --user-id demo-user \
  --scenarios examples/demo_app/scenarios

For real projects, the target repo should own its adapter and scenarios. --adapter accepts either an import path such as your_project.tinkerloop_adapter:create_adapter or a file path such as /path/to/target_adapter.py:create_adapter.
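Both --adapter forms, module:attribute and file.py:attribute, can be resolved with the standard importlib machinery. This sketch shows one way to do it. It illustrates the technique and is not Tinkerloop's loader, and the helper name load_adapter_factory is hypothetical.

```python
import importlib
import importlib.util
from pathlib import Path
from typing import Any, Callable


def load_adapter_factory(spec: str) -> Callable[..., Any]:
    """Resolve 'pkg.module:factory' or '/path/to/file.py:factory' to a callable."""
    # Split on the last colon so Windows drive letters (C:\...) stay in the path.
    target, sep, attr = spec.rpartition(":")
    if not sep or not target or not attr:
        raise ValueError(f"expected '<module-or-file>:<callable>', got {spec!r}")

    if target.endswith(".py"):
        # File form: load the module directly from its path.
        path = Path(target).resolve()
        module_spec = importlib.util.spec_from_file_location(path.stem, path)
        if module_spec is None or module_spec.loader is None:
            raise ImportError(f"cannot load adapter file {path}")
        module = importlib.util.module_from_spec(module_spec)
        module_spec.loader.exec_module(module)
    else:
        # Import-path form: the module must be importable from sys.path.
        module = importlib.import_module(target)

    factory = getattr(module, attr)
    if not callable(factory):
        raise TypeError(f"{spec!r} does not name a callable")
    return factory
```

With this helper, load_adapter_factory("your_project.tinkerloop_adapter:create_adapter")() would return whatever adapter object that factory builds.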

For PythonAppAdapter, each patch_targets entry should point at a callable with the standard tool-call shape (tool_name, user_id, arguments, correlation_id=None). Scenario files must contain at least one turn, and each turn must define a non-empty user prompt.
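The scenario rule above (at least one turn, and a non-empty user prompt on every turn) is easy to check before a run. This sketch assumes a hypothetical scenario shape: a mapping with a "turns" list whose items carry a "user" prompt string. Tinkerloop's actual scenario format defines the real field names, so adjust them to match. Treating a whitespace-only prompt as empty is also an assumption.

```python
from typing import Any


def scenario_errors(scenario: dict) -> list:
    """List rule violations for one scenario; an empty list means it is valid."""
    turns: Any = scenario.get("turns")  # hypothetical key name
    if not isinstance(turns, list) or not turns:
        return ["scenario must contain at least one turn"]
    errors = []
    for index, turn in enumerate(turns):
        # Hypothetical key "user" holds the simulated user prompt for the turn;
        # whitespace-only prompts are treated as empty (assumption).
        prompt = turn.get("user") if isinstance(turn, dict) else None
        if not isinstance(prompt, str) or not prompt.strip():
            errors.append(f"turn {index} must define a non-empty user prompt")
    return errors


ok = {"name": "preview-cleanup", "turns": [{"user": "Preview which drafts would be deleted."}]}
bad = {"name": "empty-prompt", "turns": [{"user": "   "}]}
print(scenario_errors(ok))   # []
print(scenario_errors(bad))  # ['turn 0 must define a non-empty user prompt']
```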

If the adapter cannot resolve one inner model confidently, Tinkerloop will prompt for a repo-derived candidate in interactive mode. In non-interactive mode, pass explicit overrides:

tinkerloop \
  run \
  --adapter examples/demo_app/adapter.py:create_adapter \
  --user-id demo-user \
  --scenarios examples/demo_app/scenarios \
  --inner-provider <provider> \
  --inner-model <model>

Rerun only failed scenarios from report artifacts:

tinkerloop \
  run \
  --adapter examples/demo_app/adapter.py:create_adapter \
  --user-id demo-user \
  --scenarios examples/demo_app/scenarios \
  --failed-from artifacts/reports

Run only a tagged feature slice:

tinkerloop \
  run \
  --adapter examples/demo_app/adapter.py:create_adapter \
  --user-id demo-user \
  --scenarios examples/demo_app/scenarios \
  --tag cleanup \
  --tag preview

Artifacts written by each run (the confirm-prefixed files come from tinkerloop confirm):

timestamped report: tinkerloop-<timestamp>.json
stable latest report: latest.json
stable failure summary: latest-failures.json
stable diagnosis payload: latest-diagnosis.json, which includes confirmation_status so you can tell repair-loop results from confirmation-loop results
confirmation timestamped report: confirm-tinkerloop-<timestamp>.json
confirmation latest report: confirm-latest.json
confirmation failure summary: confirm-latest-failures.json
confirmation diagnosis payload: confirm-latest-diagnosis.json

When a repair run passes, Tinkerloop exits with code 3 and tells you to run tinkerloop confirm .... Repair-only results do not prove agent quality. If confirmation is blocked, Tinkerloop still writes confirm-latest-diagnosis.json with confirmation_status: "blocked" and the preflight error so the attempt is visible in artifacts.
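A CI step can surface a blocked confirmation by reading the diagnosis payload. This sketch assumes that confirmation_status is a top-level key in confirm-latest-diagnosis.json and that reports land in artifacts/reports, the directory used in the --failed-from example. Neither detail is confirmed here, and the rest of the tinkerloop.diagnosis.v1 schema is left untouched. The helper name is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def confirmation_status(report_dir: str | Path = "artifacts/reports") -> str | None:
    """Read confirmation_status from confirm-latest-diagnosis.json, if it exists."""
    path = Path(report_dir) / "confirm-latest-diagnosis.json"
    if not path.is_file():
        return None  # no confirmation attempt recorded in this directory
    payload = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: confirmation_status is a top-level key of the diagnosis payload.
    return payload.get("confirmation_status")
```

A CI job could fail loudly when confirmation_status() returns "blocked" and point the developer at the preflight error recorded in the same file.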

Docs Map

docs/STABILITY.md: supported v0.x surface and experimental boundaries
docs/ACTOR_MODEL.md: inner target orchestrator vs outer coding model roles
docs/QUICKSTART_TARGET_REPO.md: minimal target-owned integration path
docs/ADAPTER_GUIDE.md: when to use PythonAppAdapter vs CommandAppAdapter
docs/TRUST_MODEL.md: what a pass/fail result does and does not mean
docs/TROUBLESHOOTING.md: first-run failure modes
docs/WORKED_EXAMPLE.md: failure -> diagnosis -> rerun example
docs/WORKING_AGREEMENT.md: day-to-day run discipline
docs/TARGET_CONTRACT.md: public integration boundary

Support Matrix

Python: 3.10+
Commands: run, confirm
Adapter shapes: PythonAppAdapter, CommandAppAdapter
Report schemas: tinkerloop.report.v1, tinkerloop.failures.v1, tinkerloop.diagnosis.v1
Check types: assistant_contains_all, assistant_contains_any, assistant_not_contains, tool_used, tool_call_count_at_most, tool_call_matches
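To make the check types concrete, here is a sketch of how deterministic checks like these could be evaluated against one recorded turn. The semantics are inferred from the check names and are assumptions. For example, this sketch assumes assistant_contains_any passes if at least one listed substring appears, and that tool_call_matches means a call to the named tool whose arguments include the given key/value pairs. The Turn structure, run_check, and the parameter names (values, tool, max_calls, arguments) are hypothetical, not Tinkerloop's schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Turn:
    """Hypothetical record of one scenario turn: reply text plus traced tool calls."""
    assistant_text: str
    tool_calls: list = field(default_factory=list)


def run_check(turn: Turn, check: str, **params: Any) -> bool:
    text = turn.assistant_text.lower()  # case-insensitive matching is an assumption
    names = [call["tool_name"] for call in turn.tool_calls]
    if check == "assistant_contains_all":
        return all(s.lower() in text for s in params["values"])
    if check == "assistant_contains_any":
        return any(s.lower() in text for s in params["values"])
    if check == "assistant_not_contains":
        return all(s.lower() not in text for s in params["values"])
    if check == "tool_used":
        return params["tool"] in names
    if check == "tool_call_count_at_most":
        # Count all calls, or only calls to one tool if "tool" is given (assumption).
        tool = params.get("tool")
        count = len(names) if tool is None else names.count(tool)
        return count <= params["max_calls"]
    if check == "tool_call_matches":
        return any(
            call["tool_name"] == params["tool"]
            and all(call["arguments"].get(k) == v for k, v in params["arguments"].items())
            for call in turn.tool_calls
        )
    raise ValueError(f"unknown check type: {check}")


turn = Turn(
    "Here is a preview of 2 drafts; nothing was deleted.",
    [{"tool_name": "list_drafts", "arguments": {"limit": 10}}],
)
print(run_check(turn, "assistant_contains_all", values=["preview", "drafts"]))  # True
print(run_check(turn, "tool_call_count_at_most", max_calls=0))  # False
```

Every check is a pure function of the recorded text and tool calls, so the same trajectory always gets the same verdict. This is the property the "prefer deterministic checks before LLM judges" rule relies on.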

Repo Layout

src/tinkerloop/: reusable harness engine and adapter interfaces
examples/: optional example and transition fixtures
docs/: charter, architecture, target contract, MVP plan, implementation handoff, and working agreement
tests/: Tinkerloop unit tests

Design Rules

keep the core small and inspectable
prefer deterministic checks before LLM judges
keep target-app integration behind adapters
no silent magic around tracing, patching, or scenario selection
no automatic production actions
future target-driver integrations must be non-prod only and secure by default

License

Apache License 2.0. See LICENSE. Business-friendly: use, modify, and distribute with minimal conditions; includes a patent grant.

Contributing

PRs are accepted from maintainers and invited contributors only. For bugs or ideas, open an issue. See CONTRIBUTING.md.

Project details

Maintainer: bostoneco

Release history

0.1.7: Apr 4, 2026
0.1.6: Apr 4, 2026
0.1.5 (this version): Apr 3, 2026
0.1.4: Apr 3, 2026

Download files

Source distribution: tinkerloop_ai-0.1.5.tar.gz (20.4 kB), uploaded Apr 3, 2026

Built distribution: tinkerloop_ai-0.1.5-py3-none-any.whl (24.8 kB), uploaded Apr 3, 2026, Python 3

File details

Details for the file tinkerloop_ai-0.1.5.tar.gz.

File metadata

Download URL: tinkerloop_ai-0.1.5.tar.gz
Upload date: Apr 3, 2026
Size: 20.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tinkerloop_ai-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`02602de5387b08c647df938d2bf09232b7a8f32e58fea17ebb88697c07acb9c4`
MD5	`6284ed7e61318775bd7611d33ea6f60f`
BLAKE2b-256	`1ec84c5ec2dfcea58624f51cd5cc37eefe2e1521878e88561f3bbb6db40d12f2`


Provenance

The following attestation bundles were made for tinkerloop_ai-0.1.5.tar.gz:

Publisher: release-wheel.yml on bostoneco/tinkerloop

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tinkerloop_ai-0.1.5.tar.gz
- Subject digest: 02602de5387b08c647df938d2bf09232b7a8f32e58fea17ebb88697c07acb9c4
- Sigstore transparency entry: 1229850025
- Sigstore integration time: Apr 3, 2026
Source repository:
- Permalink: bostoneco/tinkerloop@35b163204ebb3829da6bd3e6ae15ffd79f76b0b2
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/bostoneco
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-wheel.yml@35b163204ebb3829da6bd3e6ae15ffd79f76b0b2
- Trigger Event: release

File details

Details for the file tinkerloop_ai-0.1.5-py3-none-any.whl.

File metadata

Download URL: tinkerloop_ai-0.1.5-py3-none-any.whl
Upload date: Apr 3, 2026
Size: 24.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tinkerloop_ai-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ed86ca84c2e44f8f7d55be5630b9ea8b86bb9eae86841b118b84dfed73db6f2`
MD5	`06cf43c63e268208e5331a19f098fabf`
BLAKE2b-256	`cef17a9e5baad8a670eea81b2db026ddc73c8974fbe0a8e74a456fcebcfec32f`


Provenance

The following attestation bundles were made for tinkerloop_ai-0.1.5-py3-none-any.whl:

Publisher: release-wheel.yml on bostoneco/tinkerloop

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tinkerloop_ai-0.1.5-py3-none-any.whl
- Subject digest: 0ed86ca84c2e44f8f7d55be5630b9ea8b86bb9eae86841b118b84dfed73db6f2
- Sigstore transparency entry: 1229850060
- Sigstore integration time: Apr 3, 2026
Source repository:
- Permalink: bostoneco/tinkerloop@35b163204ebb3829da6bd3e6ae15ffd79f76b0b2
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/bostoneco
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-wheel.yml@35b163204ebb3829da6bd3e6ae15ffd79f76b0b2
- Trigger Event: release
