Eval-driven test/fix/improve harness for orchestrator-based apps
Tinkerloop
Tinkerloop to the rescue. If your orchestrator-to-MCP app communication is hard to trust, Tinkerloop gives you a scenario-based loop to reproduce the failure, diagnose it with deterministic checks, patch the target, and rerun until the behavior matches what you expect.
It is an eval-driven harness for testing and improving orchestrator-based apps through repeatable test -> diagnose -> patch -> rerun loops.
Release Status
Tinkerloop is in alpha.
- The package, CLI, adapters, and report artifacts are usable now.
- The supported v0.x surface is documented in docs/STABILITY.md.
- The project is intended for technically strong early adopters who can own a target adapter.
- It is not yet positioned as a benchmark suite or production-assurance layer.
What It Is
Tinkerloop is not another app-specific bot framework. It is a reusable outer loop for systems that already have:
- an inner orchestrator model
- tool or MCP integrations
- a conversational or API-facing entrypoint
Tinkerloop plays the role of:
- user simulator
- integration tester
- trajectory recorder
- deterministic judge
- developer feedback loop driver
Actor Model
There are two distinct roles in a Tinkerloop workflow:
- inner target orchestrator: the model and tool path inside the app under test
- outer coding model: the developer tool model using Tinkerloop artifacts to patch and rerun
The outer coding model may analyze results and edit code between runs.
It must not replace the inner target orchestrator during a measured run.
See docs/ACTOR_MODEL.md.
Who It Is For
- teams that already have a target app and want deterministic scenario-based regression loops
- teams that can keep target-specific logic in a target-owned adapter and scenario library
- teams that want report-driven reruns rather than broad benchmark claims
Who It Is Not For
- users looking for a zero-config app framework
- teams that need remote secure-driver support today
- users who want Tinkerloop to measure general model quality
MVP Scope
Current MVP:
- load multi-turn scenario files
- run them against a target app adapter
- preflight the target app before scenario execution
- resolve the target app's inner runtime from the target repo boundary
- trace tool calls by patching configured execution points
- trace tool calls from target-owned runner commands
- evaluate deterministic checks
- write JSON reports for failures and regressions
- rerun only failed scenarios from report artifacts
- separate repair-loop and confirmation-loop runs
Not in scope yet:
- automatic patch generation
- automatic deploys
- autonomous code changes without a human gate
- benchmark claims beyond the configured scenario set
- secure non-prod target-driver contracts
Quick Start
Tinkerloop supports Python 3.10+. This repo pins 3.12.9 in .python-version for local development with pyenv.
The PyPI distribution name is tinkerloop-ai. Install it with:
python3 -m pip install tinkerloop-ai
If you need to install directly from a GitHub release asset instead:
python3 -m pip install https://github.com/bostoneco/tinkerloop/releases/download/<tag>/tinkerloop_ai-<version>-py3-none-any.whl
Then run it against a target-owned adapter and scenario directory:
tinkerloop \
run \
--adapter /path/to/target_adapter.py:create_adapter \
--user-id <user-id> \
--scenarios /path/to/scenarios
When a candidate fix looks good, run the external confirmation loop:
tinkerloop \
confirm \
--adapter /path/to/target_adapter.py:create_adapter \
--user-id <user-id> \
--scenarios /path/to/scenarios \
--non-interactive
If your target repo exposes a more realistic runner or adapter for real-agent validation,
use that boundary for confirm instead of the faster repair-loop boundary.
For local development from a source checkout:
pyenv local 3.12.9
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pytest -q
tinkerloop \
run \
--adapter examples/starter_target/adapter.py:create_adapter \
--user-id demo-user \
--scenarios examples/starter_target/scenarios
# fuller demo target
tinkerloop \
run \
--adapter examples/demo_app/adapter.py:create_adapter \
--user-id demo-user \
--scenarios examples/demo_app/scenarios
For real projects, the target repo should own its adapter and scenarios.
--adapter accepts either an import path such as your_project.tinkerloop_adapter:create_adapter or a file path such as /path/to/target_adapter.py:create_adapter.
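An adapter spec has the form <target>:<callable>, where <target> is either an importable module path or a .py file path. The sketch below shows one way such a spec could be resolved with the standard library's importlib. It illustrates that assumption only: it is not Tinkerloop's own loader, and resolve_adapter_factory is a hypothetical name.

```python
import importlib
import importlib.util
from pathlib import Path
from typing import Any, Callable


def resolve_adapter_factory(spec: str) -> Callable[..., Any]:
    """Resolve 'pkg.module:attr' or '/path/to/file.py:attr' to a callable (hypothetical helper)."""
    # Split on the LAST colon so a Windows drive letter such as C: is not taken as the separator.
    target, sep, attr = spec.rpartition(":")
    if not sep or not target or not attr:
        raise ValueError(f"expected '<module-or-file>:<callable>', got {spec!r}")

    if target.endswith(".py"):
        # File path form: load the file as a standalone module.
        path = Path(target)
        module_spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if module_spec is None or module_spec.loader is None:
            raise ImportError(f"cannot load adapter file {path}")
        module = importlib.util.module_from_spec(module_spec)
        module_spec.loader.exec_module(module)
    else:
        # Import path form: the module must be importable from sys.path.
        module = importlib.import_module(target)

    factory = getattr(module, attr)
    if not callable(factory):
        raise TypeError(f"{spec!r} does not name a callable")
    return factory
```

Splitting on the last colon keeps Windows paths such as C:\work\target_adapter.py:create_adapter intact, while the import path form behaves like any other Python import.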
For PythonAppAdapter, each patch_targets entry should point at a callable with
the standard tool-call shape
(tool_name, user_id, arguments, correlation_id=None).
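Tracing a configured execution point amounts to patching it. As a rough sketch of what that can mean for a callable with the standard tool-call shape, the hypothetical execute_tool dispatcher below is temporarily replaced by a wrapper that records each completed call, then restored. The names execute_tool, ToolCallRecord, and trace_patch_target are illustrative assumptions, not part of Tinkerloop's API.

```python
import contextlib
import functools
import types
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class ToolCallRecord:
    tool_name: str
    user_id: str
    arguments: dict
    correlation_id: Optional[str]
    result: Any


def execute_tool(tool_name: str, user_id: str, arguments: dict, correlation_id: Optional[str] = None) -> Any:
    """Hypothetical target-side dispatcher with the standard tool-call shape."""
    if tool_name == "search":
        return {"hits": [arguments.get("query", "")]}
    raise KeyError(f"unknown tool {tool_name!r}")


@contextlib.contextmanager
def trace_patch_target(owner: Any, attr: str, log: list) -> Iterator[None]:
    """Temporarily replace owner.<attr> with a wrapper that logs each call that returns."""
    original = getattr(owner, attr)

    @functools.wraps(original)
    def traced(tool_name, user_id, arguments, correlation_id=None):
        result = original(tool_name, user_id, arguments, correlation_id=correlation_id)
        log.append(ToolCallRecord(tool_name, user_id, dict(arguments), correlation_id, result))
        return result

    setattr(owner, attr, traced)
    try:
        yield
    finally:
        setattr(owner, attr, original)  # always restore the real callable


# Usage: a SimpleNamespace stands in for the target module that owns the patch target.
app = types.SimpleNamespace(execute_tool=execute_tool)
calls: list = []
with trace_patch_target(app, "execute_tool", calls):
    app.execute_tool("search", "demo-user", {"query": "invoices"}, correlation_id="c1")
print(calls[0].tool_name, calls[0].result)
```

Patching only affects code that looks the callable up through the patched owner (for example module.execute_tool) at call time; a caller that imported the function object directly keeps a reference to the unpatched original.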
Scenario files must contain at least one turn, and each turn must define a
non-empty user prompt.
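Both scenario rules can be enforced before any model call. The sketch assumes a hypothetical in-memory scenario shape with a turns list whose items carry the prompt under a user key; Tinkerloop's actual scenario file schema may use different field names.

```python
from typing import Any


class ScenarioError(ValueError):
    """Raised when a scenario breaks the minimum structural rules."""


def validate_scenario(scenario: dict[str, Any]) -> None:
    # Hypothetical shape: {"name": str, "turns": [{"user": str, ...}, ...]}
    name = scenario.get("name", "<unnamed>")
    turns = scenario.get("turns")
    if not isinstance(turns, list) or not turns:
        raise ScenarioError(f"scenario {name!r} must contain at least one turn")
    for index, turn in enumerate(turns):
        prompt = turn.get("user") if isinstance(turn, dict) else None
        # Whitespace-only prompts are treated as empty.
        if not isinstance(prompt, str) or not prompt.strip():
            raise ScenarioError(f"scenario {name!r} turn {index} needs a non-empty user prompt")
```

Failing fast here keeps structural mistakes in the scenario library from showing up later as confusing orchestrator failures.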
If the adapter cannot resolve one inner model confidently, Tinkerloop will prompt for a repo-derived candidate in interactive mode. In non-interactive mode, pass explicit overrides:
tinkerloop \
run \
--adapter examples/demo_app/adapter.py:create_adapter \
--user-id demo-user \
--scenarios examples/demo_app/scenarios \
--inner-provider <provider> \
--inner-model <model>
Rerun only failed scenarios from report artifacts:
tinkerloop \
run \
--adapter examples/demo_app/adapter.py:create_adapter \
--user-id demo-user \
--scenarios examples/demo_app/scenarios \
--failed-from artifacts/reports
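When scripting repair and confirmation loops, it can help to assemble the command lines in one place. This sketch uses only the subcommands and flags shown in this README. build_tinkerloop_cmd and run_cmd are hypothetical helpers: the builder does not check which flags each subcommand accepts, and run_cmd simply returns the exit code without assuming what it means.

```python
import shlex
import subprocess
from typing import Optional, Sequence


def build_tinkerloop_cmd(
    command: str,
    adapter: str,
    user_id: str,
    scenarios: str,
    *,
    failed_from: Optional[str] = None,
    tags: Sequence[str] = (),
    non_interactive: bool = False,
) -> list[str]:
    """Assemble argv for `tinkerloop run` or `tinkerloop confirm` from flags shown in this README."""
    if command not in ("run", "confirm"):
        raise ValueError("command must be 'run' or 'confirm'")
    argv = ["tinkerloop", command, "--adapter", adapter, "--user-id", user_id, "--scenarios", scenarios]
    if failed_from:
        argv += ["--failed-from", failed_from]
    for tag in tags:
        argv += ["--tag", tag]  # each tag is passed as its own --tag flag
    if non_interactive:
        argv.append("--non-interactive")
    return argv


def run_cmd(argv: list[str]) -> int:
    """Echo the command, run it, and return the raw exit code."""
    print("$", shlex.join(argv))
    return subprocess.run(argv, check=False).returncode
```

For example, build_tinkerloop_cmd("run", "examples/demo_app/adapter.py:create_adapter", "demo-user", "examples/demo_app/scenarios", failed_from="artifacts/reports") reproduces the rerun command above as an argument list.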
Run only a tagged feature slice:
tinkerloop \
run \
--adapter examples/demo_app/adapter.py:create_adapter \
--user-id demo-user \
--scenarios examples/demo_app/scenarios \
--tag cleanup \
--tag preview
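This README does not state whether repeated --tag flags select scenarios carrying any of the tags or all of them. The sketch below implements both readings behind a mode parameter so the difference is explicit. The tags field on each scenario dict and the select_by_tags name are assumptions.

```python
from typing import Any, Iterable, Literal


def select_by_tags(
    scenarios: Iterable[dict[str, Any]],
    tags: Iterable[str],
    mode: Literal["any", "all"] = "any",
) -> list[dict[str, Any]]:
    """Filter scenarios by tag: 'any' keeps a scenario sharing at least one tag, 'all' requires every tag."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    wanted = set(tags)
    if not wanted:
        return list(scenarios)  # no --tag flags given: keep every scenario
    selected = []
    for scenario in scenarios:
        have = set(scenario.get("tags", []))  # hypothetical per-scenario tag list
        matched = bool(wanted & have) if mode == "any" else wanted <= have
        if matched:
            selected.append(scenario)
    return selected
```

With tags cleanup and preview, a scenario tagged only cleanup is kept under "any" and dropped under "all".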
Artifacts written on each run:
- timestamped report: tinkerloop-<timestamp>.json
- stable latest report: latest.json
- stable failure summary: latest-failures.json
- stable diagnosis payload: latest-diagnosis.json (includes confirmation_status for repair-loop vs confirmation-loop visibility)
- confirmation timestamped report: confirm-tinkerloop-<timestamp>.json
- confirmation latest report: confirm-latest.json
- confirmation failure summary: confirm-latest-failures.json
- confirmation diagnosis payload: confirm-latest-diagnosis.json
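The artifact naming is regular: confirmation-loop artifacts reuse the repair-loop names with a confirm- prefix. A small sketch of that mapping, plus a reader for the confirmation_status field, follows. The timestamp format, the report directory, and the assumption that confirmation_status is a top-level key in latest-diagnosis.json are unverified, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def artifact_names(timestamp: str, confirm: bool = False) -> dict[str, str]:
    """Map each artifact role to its file name; confirmation runs add a 'confirm-' prefix."""
    prefix = "confirm-" if confirm else ""
    return {
        "timestamped_report": f"{prefix}tinkerloop-{timestamp}.json",
        "latest_report": f"{prefix}latest.json",
        "failure_summary": f"{prefix}latest-failures.json",
        "diagnosis": f"{prefix}latest-diagnosis.json",
    }


def read_confirmation_status(report_dir: Path) -> Any:
    """Return confirmation_status from latest-diagnosis.json (assumed to be a top-level key)."""
    payload = json.loads((report_dir / "latest-diagnosis.json").read_text(encoding="utf-8"))
    return payload.get("confirmation_status")
```

Because the latest-* names are stable, an outer coding model or script can always read the same paths after each run instead of searching for the newest timestamped report.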
When a repair run passes without a fresh confirmation run, Tinkerloop prints a warning and marks the repair results as provisional.
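One way to model the provisional rule is to compare when the repair run and the most recent confirmation run finished. The sketch treats a confirmation run as fresh only if it finished at or after the repair run; that reading of "fresh", the status strings, and the repair_status name are assumptions for illustration.

```python
import sys
from datetime import datetime
from typing import Optional


def repair_status(
    repair_passed: bool,
    repair_finished: datetime,
    confirm_passed: Optional[bool] = None,
    confirm_finished: Optional[datetime] = None,
) -> str:
    """Classify a repair run; the status strings here are illustrative, not Tinkerloop's."""
    if not repair_passed:
        return "failed"
    # Assumption: a confirmation run is "fresh" only if it finished at or after the repair run.
    fresh = confirm_finished is not None and confirm_finished >= repair_finished
    if not fresh:
        print("warning: repair run passed without a fresh confirmation run; results are provisional",
              file=sys.stderr)
        return "provisional"
    return "confirmed" if confirm_passed else "confirmation_failed"
```

A stale confirmation (one that predates the latest repair run) is treated the same as no confirmation, because the code it exercised may no longer be the code under test.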
Docs Map
- docs/STABILITY.md: supported v0.x surface and experimental boundaries
- docs/ACTOR_MODEL.md: inner target orchestrator vs outer coding model roles
- docs/QUICKSTART_TARGET_REPO.md: minimal target-owned integration path
- docs/ADAPTER_GUIDE.md: when to use PythonAppAdapter vs CommandAppAdapter
- docs/TRUST_MODEL.md: what a pass/fail result does and does not mean
- docs/TROUBLESHOOTING.md: first-run failure modes
- docs/WORKED_EXAMPLE.md: failure -> diagnosis -> rerun example
- docs/WORKING_AGREEMENT.md: day-to-day run discipline
- docs/TARGET_CONTRACT.md: public integration boundary
Support Matrix
- Python: 3.10+
- Commands: run, confirm
- Adapter shapes: PythonAppAdapter, CommandAppAdapter
- Report schemas: tinkerloop.report.v1, tinkerloop.failures.v1, tinkerloop.diagnosis.v1
- Check types: assistant_contains_all, assistant_contains_any, assistant_not_contains, tool_used, tool_call_count_at_most, tool_call_matches
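The check type names suggest simple deterministic predicates over the assistant's reply and the recorded tool calls. The sketch below is one plausible reading of each. The check dictionary keys (type, values, tool, max, arguments), case-insensitive matching, and subset matching for tool_call_matches are assumptions, not Tinkerloop's documented semantics; ToolCall, TurnResult, and run_check are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class TurnResult:
    assistant_text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def run_check(check: dict[str, Any], turn: TurnResult) -> bool:
    """Evaluate one deterministic check against a single turn (one plausible reading of each type)."""
    kind = check["type"]
    text = turn.assistant_text.lower()  # assumption: case-insensitive substring matching
    names = [call.name for call in turn.tool_calls]

    if kind == "assistant_contains_all":
        return all(value.lower() in text for value in check["values"])
    if kind == "assistant_contains_any":
        return any(value.lower() in text for value in check["values"])
    if kind == "assistant_not_contains":
        return not any(value.lower() in text for value in check["values"])
    if kind == "tool_used":
        return check["tool"] in names
    if kind == "tool_call_count_at_most":
        # Assumption: count calls to one tool if "tool" is given, otherwise all calls.
        tool = check.get("tool")
        count = names.count(tool) if tool else len(names)
        return count <= check["max"]
    if kind == "tool_call_matches":
        # Assumption: pass if any call to the tool contains every expected argument key/value pair.
        expected = check["arguments"]
        return any(
            call.name == check["tool"]
            and all(key in call.arguments and call.arguments[key] == value for key, value in expected.items())
            for call in turn.tool_calls
        )
    raise ValueError(f"unknown check type {kind!r}")
```

Keeping every check a pure function of recorded data is what makes a rerun comparable to the previous run: the same trajectory always yields the same verdict.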
Repo Layout
- src/tinkerloop/: reusable harness engine and adapter interfaces
- examples/: optional example and transition fixtures
- docs/: charter, architecture, target contract, MVP plan, implementation handoff, and working agreement
- tests/: Tinkerloop unit tests
Design Rules
- keep the core small and inspectable
- prefer deterministic checks before LLM judges
- keep target-app integration behind adapters
- no silent magic around tracing, patching, or scenario selection
- no automatic production actions
- future target-driver integrations must be non-prod only and secure by default
License
Apache License 2.0. See LICENSE. Business-friendly: use, modify, and distribute with minimal conditions; includes a patent grant.
Contributing
PRs are accepted from maintainers and invited contributors only. For bugs or ideas, open an issue. See CONTRIBUTING.md.
Download files
File details
Details for the file tinkerloop_ai-0.1.4.tar.gz.
File metadata
- Download URL: tinkerloop_ai-0.1.4.tar.gz
- Upload date:
- Size: 20.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c69a3fe532fe9aae3f75b9ba3fa21d7a481d3bd6792bbd88eeb0805818c97c16 |
| MD5 | d7084e083e3c60fa239edc23105c00fa |
| BLAKE2b-256 | 70189a61300b1d1678581eadfd1a2c30d8cbeb2120a3cb4af9b0f7649a522214 |
Provenance
The following attestation bundles were made for tinkerloop_ai-0.1.4.tar.gz:
Publisher: release-wheel.yml on bostoneco/tinkerloop
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tinkerloop_ai-0.1.4.tar.gz
- Subject digest: c69a3fe532fe9aae3f75b9ba3fa21d7a481d3bd6792bbd88eeb0805818c97c16
- Sigstore transparency entry: 1229637802
- Permalink: bostoneco/tinkerloop@dc10ec8023e938025c6c51f5b55d73a04b225f32
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/bostoneco
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-wheel.yml@dc10ec8023e938025c6c51f5b55d73a04b225f32
- Trigger Event: release
File details
Details for the file tinkerloop_ai-0.1.4-py3-none-any.whl.
File metadata
- Download URL: tinkerloop_ai-0.1.4-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae24527724d04da56e5ff0b910c980c4b6f897270c7e966fde99825e02e29965 |
| MD5 | abed7ce2e993090a4f22d20f7d6abd77 |
| BLAKE2b-256 | 0163bb6ebeee3fe578855cda8005cd93732a32613dfbd4bf7b7d31c6f27b80b9 |
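To check a downloaded file against the SHA256 digests listed for the two release files, hashing it in chunks with hashlib is enough; pip can also enforce digests itself through its hash-checking mode (--require-hashes). The helper names below are illustrative, and the digests are copied from the file details on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for release 0.1.4.
EXPECTED_SHA256 = {
    "tinkerloop_ai-0.1.4.tar.gz": "c69a3fe532fe9aae3f75b9ba3fa21d7a481d3bd6792bbd88eeb0805818c97c16",
    "tinkerloop_ai-0.1.4-py3-none-any.whl": "ae24527724d04da56e5ff0b910c980c4b6f897270c7e966fde99825e02e29965",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True only if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, after downloading the wheel, verify_download(Path("tinkerloop_ai-0.1.4-py3-none-any.whl")) should return True for an untampered file.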
Provenance
The following attestation bundles were made for tinkerloop_ai-0.1.4-py3-none-any.whl:
Publisher: release-wheel.yml on bostoneco/tinkerloop
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tinkerloop_ai-0.1.4-py3-none-any.whl
- Subject digest: ae24527724d04da56e5ff0b910c980c4b6f897270c7e966fde99825e02e29965
- Sigstore transparency entry: 1229637830
- Permalink: bostoneco/tinkerloop@dc10ec8023e938025c6c51f5b55d73a04b225f32
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/bostoneco
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-wheel.yml@dc10ec8023e938025c6c51f5b55d73a04b225f32
- Trigger Event: release