Local-first repair loop for debugging and improving AI agents.
Kyoko
Kyoko is the all-in-one, fully local tool for debugging and improving your AI agents.
Point it at any agent you're building (instrument it with OpenTelemetry or the SDKs), or plug straight into CLI agents you already run like Codex, Claude Code, OpenClaw, and Hermes. Kyoko captures what your agent actually does and runs a closed repair loop over it: it analyses real runs into a living state reflection of the system, files recurring and generalised failures as issues, drafts concrete fixes, and proves them with replay and evals before anything ships. Everything runs on your machine (traces, database, and dashboard), and any model or external call is opt-in.
Most agent tooling stops at showing you traces; you still have to read them, guess what went wrong, write the fix, and hope it didn't break something else. Kyoko closes that gap end to end, in one place.
That state reflection is cumulative: Kyoko keeps learning from traces, issues, fixes, replays, and evals, so it can surface the problems humans would not think to measure by hand while still respecting the detectors and judges you explicitly choose.
Why Kyoko
- OpenTelemetry-native. Ingests OTLP/GenAI spans; SDKs and importers for the rest.
- Runs on your coding agent. Codex, Claude Code, OpenClaw, or Hermes does the analysis and authors fixes through its own CLI login, so there are no API keys to manage and no extra spend.
- Fully local. SQLite + loopback UI. Nothing leaves your machine; external calls are opt-in.
- Cumulative analysis. Builds a state reflection from traces, issues, evals, and fixes, so repeated behavior leads to more accurate fixes over time.
- Measured, not guessed. Failure rate from real evals, not status flags.
- Safe by default. No change ships without passing the gate. No shortcuts, anywhere.
- Zero-fuss. One kyoko CLI, near-zero deps, --json everywhere. No server, no cloud.
The loop
┌─────────────────┐           ┌─────────────────┐
│  1. Analyse     │ ────────▶ │  2. Issues      │
│  traces in      │           │  recurring      │
│                 │           │  failures       │
└─────────────────┘           └─────────────────┘
         ▲                             │
         │ measure                     │ accept
         │                             ▼
┌─────────────────┐  ┌──────┐ ┌─────────────────┐
│  4. Evals       │◀─┤ gate ├─│  3. Proposals   │
│  failure rate   │  └──────┘ │  candidate      │
│                 │   apply   │  fixes          │
└─────────────────┘           └─────────────────┘
Gate = checks · replay · policy · locks; a fix applies only if it passes.
Evals score the result and feed the next analysis; the loop tightens.
- Analyse: Kyoko reads your agent's traces for you, diagnoses what went wrong, and updates a state reflection of how the system behaves over time. No manual log-digging.
- Issues: it surfaces the failures to you automatically as first-class, evidence-backed issues, grouped by category and severity so you fix the pattern, not the symptom, including problems you did not predefine as a metric.
- Proposals: each accepted issue becomes a concrete fix (to context/skills or the agent's harness), then runs the gate: generated checks, bounded replay, autonomy policy, and human locks. It applies only if it passes.
- Evals: a measurement plane of deterministic detectors and LLM judges scores runs into a failure rate, before vs after. Failure is decided by evals, never by a status flag on a trace.
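To make "failure is decided by evals" concrete, here is a minimal sketch of a before-vs-after failure rate computed from eval verdicts. The `EvalResult` shape, the eval names, and the rule "a run fails if any of its evals fails" are assumptions for illustration, not Kyoko's actual schema or scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One eval verdict for one run (hypothetical shape, not Kyoko's schema)."""
    run_id: str
    eval_name: str
    passed: bool


def failure_rate(results: list[EvalResult]) -> float:
    """Fraction of runs that have at least one failing eval verdict."""
    runs = {r.run_id for r in results}
    if not runs:
        return 0.0
    failed = {r.run_id for r in results if not r.passed}
    return len(failed) / len(runs)


# Hypothetical verdicts collected before and after a fix was applied.
before = [
    EvalResult("run-1", "no_repeated_tool_call", False),
    EvalResult("run-2", "no_repeated_tool_call", True),
    EvalResult("run-3", "answer_grounded", False),
    EvalResult("run-4", "answer_grounded", True),
]
after = [
    EvalResult("run-5", "no_repeated_tool_call", True),
    EvalResult("run-6", "no_repeated_tool_call", True),
    EvalResult("run-7", "answer_grounded", False),
    EvalResult("run-8", "answer_grounded", True),
]

print(f"before={failure_rate(before):.2f} after={failure_rate(after):.2f}")
# prints: before=0.50 after=0.25
```

Note that no trace status field appears anywhere in the calculation; only eval verdicts count.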
Run it your way. The same loop, the same gate. You pick the autonomy level:
- Human-in-the-loop: Kyoko surfaces issues and drafts fixes, and you review and approve each change before it applies.
- Fully autonomous: the policy auto-applies any change that clears replay, evals, and human locks, and parks anything that doesn't for you to look at.
Either way, nothing behavior-changing ships without passing the gate.
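The two autonomy levels can be read as one small decision table. The sketch below is an illustration only: the `Mode` enum, the `decide` function, and the outcome names are hypothetical, and parking a gate failure in human-in-the-loop mode is an assumption (the text only describes parking for the autonomous mode).

```python
from enum import Enum


class Mode(Enum):
    HUMAN_IN_THE_LOOP = "human-in-the-loop"
    FULLY_AUTONOMOUS = "fully-autonomous"


def decide(mode: Mode, gate_passed: bool, human_approved: bool = False) -> str:
    """Return 'apply', 'await-review', or 'park' for one proposed change."""
    if not gate_passed:
        # In either mode, a change that fails the gate never applies.
        return "park"
    if mode is Mode.FULLY_AUTONOMOUS:
        return "apply"
    # Human-in-the-loop: the change waits until a person approves it.
    return "apply" if human_approved else "await-review"


print(decide(Mode.FULLY_AUTONOMOUS, gate_passed=True))    # apply
print(decide(Mode.HUMAN_IN_THE_LOOP, gate_passed=True))   # await-review
print(decide(Mode.FULLY_AUTONOMOUS, gate_passed=False))   # park
```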
Quick demo
Kyoko requires Python 3.12 or newer. From this checkout:
python3 -m pip install .
kyoko demo --db /tmp/kyoko-demo.db --json
kyoko serve --db /tmp/kyoko-demo.db
Open http://127.0.0.1:8765.
The demo runs the full loop against bundled fixture data, so it needs no live model, framework adapter, or replay server.
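Because the CLI takes --json, its output can be consumed from a script. The helper below is a minimal sketch that assumes the command prints a single JSON document on stdout; that output shape is an assumption, not something this README specifies. A stand-in command is used so the sketch runs without Kyoko installed.

```python
import json
import subprocess
import sys


def run_json(argv: list[str]) -> object:
    """Run a command and parse its stdout as one JSON document.

    Raises CalledProcessError on a non-zero exit code and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


# Stand-in command that prints a small JSON object.
stand_in = [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]
print(run_json(stand_in))  # prints: {'ok': True}
```

With Kyoko installed, the same helper could be called as `run_json(["kyoko", "demo", "--db", "/tmp/kyoko-demo.db", "--json"])`, provided the single-document assumption holds.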
Install
git clone https://github.com/kayba-ai/kyoko.git
cd kyoko
python3 -m pip install .
After the package is published, prefer an isolated CLI install:
pipx install kyoko
See docs/INSTALL.md for uv, editable installs, the
installer script, upgrades, and common setup fixes.
Use it in your project
Run this from the root of an agent project:
kyoko project-bootstrap \
--project-dir . \
--profile-name my-agent \
--source-framework generic-python \
--replay-framework generic-python \
--mcp-target codex
project-bootstrap writes .kyoko/kyoko.db, source/replay scaffolds, MCP
config, operator presets, and .kyoko/NEXT_STEPS.md. Then check readiness and
start the dashboard:
kyoko doctor --db .kyoko/kyoko.db --safe-smokes --json
kyoko serve --db .kyoko/kyoko.db
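Since the project state lives in a plain SQLite file, it can be inspected with standard tooling. The sketch below lists table names from a database opened read-only, so it cannot modify anything. It makes no assumption about Kyoko's schema; the demo creates a throwaway database with a hypothetical `example` table so the code runs on its own.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return table names from a SQLite file, opened in read-only mode."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo against a temporary database rather than a real Kyoko file.
with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "demo.db"
    conn = sqlite3.connect(demo_db)
    conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()
    print(list_tables(str(demo_db)))  # prints: ['example']
```

In a bootstrapped project the call would be `list_tables(".kyoko/kyoko.db")`.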
Point telemetry at Kyoko with the Python or TypeScript SDK, a generated adapter, or an importer. See Getting Started for the end-to-end walkthrough.
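For orientation, this is roughly what a single span looks like in OTLP/JSON with OpenTelemetry GenAI attributes. It is a hand-built sketch using only the standard library: the service name, scope name, and timing are placeholders, and which attributes Kyoko actually reads, or how it accepts the payload, is not stated here, so treat those details as assumptions.

```python
import json
import secrets
import time


def genai_span(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build one OTLP/JSON payload holding a single GenAI chat span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 250_000_000  # placeholder: pretend the call took 250 ms
    attributes = [
        {"key": "gen_ai.operation.name", "value": {"stringValue": "chat"}},
        {"key": "gen_ai.request.model", "value": {"stringValue": model}},
        # OTLP/JSON encodes 64-bit integers as decimal strings.
        {"key": "gen_ai.usage.input_tokens", "value": {"intValue": str(input_tokens)}},
        {"key": "gen_ai.usage.output_tokens", "value": {"intValue": str(output_tokens)}},
    ]
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, hex-encoded
        "spanId": secrets.token_hex(8),    # 8 random bytes, hex-encoded
        "name": f"chat {model}",
        "kind": 3,  # SPAN_KIND_CLIENT
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": attributes,
    }
    resource = {
        "attributes": [{"key": "service.name", "value": {"stringValue": "my-agent"}}]
    }
    return {
        "resourceSpans": [
            {
                "resource": resource,
                "scopeSpans": [{"scope": {"name": "example"}, "spans": [span]}],
            }
        ]
    }


print(json.dumps(genai_span("example-model", 120, 48), indent=2))
```

In practice the SDKs or an OpenTelemetry exporter would produce this for you; the sketch only shows the shape of the data.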
What you get
- Telemetry in: Python SDK, TypeScript SDK, generated source adapters, OTLP/GenAI JSON, Hermes import, OpenClaw import.
- Diagnosis: per-trace and cumulative analysis that folds behavior into a state reflection, then turns recurring or generalised weaknesses into evidence-backed issues with category, severity, and the spans where they happened.
- Fixes out: issues become validated LearningProposal records, authored by you or an operator agent (Codex, Claude, or a generic command).
- Verification: generated checks plus bounded replay against external commands or managed loopback replay servers.
- Measurement: an evidence-only eval plane (deterministic detectors and LLM-judge evals) for what you choose to measure, alongside analysis that surfaces unmeasured patterns from observed behavior.
- Surfaces: a local dashboard, a JSON-everywhere CLI, and a stdio MCP server for coding agents, all sharing the same gated apply path.
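As an illustration of what a deterministic detector in the measurement plane might look like, the sketch below flags tool calls repeated with identical arguments within one trace. The span dictionaries, the `repeated_tool_calls` name, and the threshold are all hypothetical; Kyoko's real detector interface may differ.

```python
import json
from collections import Counter


def repeated_tool_calls(spans: list[dict], threshold: int = 3) -> list[str]:
    """Describe tool calls that repeat with identical arguments.

    Deterministic: the same spans always yield the same findings.
    """
    counts = Counter(
        # Serialize args with sorted keys so key order does not matter.
        (s["tool"], json.dumps(s["args"], sort_keys=True))
        for s in spans
        if s.get("kind") == "tool"
    )
    return [f"{tool} x{n}" for (tool, _), n in counts.items() if n >= threshold]


# Hypothetical trace: the agent issues the same search three times.
trace = [
    {"kind": "tool", "tool": "search", "args": {"q": "refund policy"}},
    {"kind": "llm"},
    {"kind": "tool", "tool": "search", "args": {"q": "refund policy"}},
    {"kind": "tool", "tool": "read", "args": {"path": "faq.md"}},
    {"kind": "tool", "tool": "search", "args": {"q": "refund policy"}},
]

print(repeated_tool_calls(trace))  # prints: ['search x3']
```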
| Area | Supported paths |
|---|---|
| Source telemetry | Python SDK, TypeScript SDK, generated source adapters, OTLP/GenAI JSON, Hermes import, OpenClaw import |
| Replay | External replay commands, managed HTTP replay servers, generated replay scaffolds |
| Operator agents | Codex, Claude, generic command adapters, local presets |
| Agent clients | Dashboard, JSON CLI, stdio MCP server |
| Framework scaffolds | Generic Python/TypeScript, LangGraph, Pydantic AI, OpenAI Agents, CrewAI, Hermes, OpenClaw, AI SDK |
See docs/INTEGRATIONS.md and examples/README.md.
How safety works
Every behavior-changing path (operator output, imports, MCP tools, and
kyoko improve) flows through one gate:
- Validate the proposal against its schema.
- Resolve the evidence it references.
- Generate or select checks.
- Run bounded replay and the checks.
- Evaluate the autonomy policy.
- Enforce human locks on protected targets.
- Apply context or harness changes only if the gate allows it.
Context writes update Kyoko-managed skills and delivery rules; harness writes
create reviewable patch transactions against an explicit workspace root.
Replay server URLs are loopback-only unless you pass --allow-remote-server,
and evidence exported to prompts, MCP, API, or bundles is redacted by default.
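A loopback-only rule like the one for replay server URLs can be checked without any network access. The sketch below is one way to do it with the standard library, not Kyoko's implementation; the function names are hypothetical, and treating the literal name localhost as loopback without resolving it is an assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is a loopback IP literal or 'localhost'."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False  # malformed URL, for example an unclosed IPv6 bracket
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name; refuse rather than resolve it


def allow_replay_server(url: str, allow_remote: bool = False) -> bool:
    """Mirror the documented rule: loopback only unless remote is allowed."""
    return allow_remote or is_loopback_url(url)


print(allow_replay_server("http://127.0.0.1:8765"))                     # True
print(allow_replay_server("http://[::1]:9000/replay"))                  # True
print(allow_replay_server("http://192.168.1.10:9000"))                  # False
print(allow_replay_server("http://example.com", allow_remote=True))     # True
```

Refusing DNS names by default avoids a class of mistakes where a hostname resolves to a non-local address.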
See docs/SECURITY.md and docs/ARCHITECTURE.md.
Documentation
- Getting Started: demo, project bootstrap, telemetry, inspection, and the repair loop.
- Install: install paths, verification, data location, and common setup fixes.
- Integrations: source adapters, replay adapters, operator agents, MCP, and SDKs.
- CLI Reference: grouped command reference.
- Architecture: runtime model, data model, and the gate.
- Security: local data, loopback serving, tokens, redaction, and write boundaries.
- Scope: what v0 is and is not.
- Development: tests, dashboard bundle, release smoke, and contract artifacts.
Specs, schemas, fixtures, and design decisions live under docs/ as reference
contracts.
Contributing
Issues and pull requests are welcome. See CONTRIBUTING.md for local setup, the test and validation gates, and how to submit a change. To report a security vulnerability, follow SECURITY.md rather than opening a public issue.
Repository layout
kyoko/ Python import package, CLI runtime, dashboard/API, bundled assets
frontend/ React/Vite dashboard source
sdk/typescript/ Dependency-free TypeScript telemetry SDK
examples/ Source and replay hook examples
scripts/ Installer, release smoke, fixture and artifact helpers
tests/ Python unittest suite and CLI contract tests
docs/ User docs plus specs, schemas, fixtures, and decisions
License
Apache-2.0. See LICENSE.
Built by Kayba and the open-source community.