Local-first repair loop for debugging and improving AI agents.

Kyoko

Kyoko is the all-in-one, fully local tool for debugging and improving your AI agents.

Point it at any agent you're building (instrumented with OpenTelemetry or the SDKs), or plug it straight into CLI agents you already run, such as Codex, Claude Code, OpenClaw, and Hermes. Kyoko captures what your agent actually does and runs a closed repair loop over it: it distils real runs into a living state reflection of the system, files recurring and generalised failures as issues, drafts concrete fixes, and proves them with replay and evals before anything ships. Everything runs on your machine (traces, database, and dashboard), and any model or external call is opt-in.

Most agent tooling stops at showing you traces; you still have to read them, guess what went wrong, write the fix, and hope it didn't break something else. Kyoko closes that gap end to end, in one place.

That state reflection is cumulative: Kyoko keeps learning from traces, issues, fixes, replays, and evals, so it can surface the problems humans would not think to measure by hand while still respecting the detectors and judges you explicitly choose.

[Image: Kyoko dashboard overview]

Why Kyoko

  • OpenTelemetry-native. Ingests OTLP/GenAI spans; SDKs and importers for the rest.
  • Runs on your coding agent. Codex, Claude Code, OpenClaw, and Hermes do the analysis and author fixes through their own CLI login, so there are no API keys to manage and no extra spend.
  • Fully local. SQLite + loopback UI. Nothing leaves your machine unless you opt in to an external call.
  • Cumulative analysis. Builds a state reflection from traces, issues, evals, and fixes, so fixes get more accurate as behavior repeats.
  • Measured, not guessed. Failure rate from real evals, not status flags.
  • Safe by default. No change ships without passing the gate. No shortcuts, anywhere.
  • Zero-fuss. One kyoko CLI, near-zero deps, --json everywhere. No server, no cloud.

The loop

        ┌─────────────────┐           ┌─────────────────┐
        │  1. Analyse     │ ────────▶ │  2. Issues      │
        │  traces in      │           │  recurring      │
        │                 │           │  failures       │
        └─────────────────┘           └─────────────────┘
                 ▲                            │
                 │ measure                    │ accept
                 │                            ▼
        ┌─────────────────┐  ┌──────┐ ┌─────────────────┐
        │  4. Evals       │◀─┤ gate ├─│  3. Proposals   │
        │  failure rate   │  └──────┘ │  candidate      │
        │                 │   apply   │  fixes          │
        └─────────────────┘           └─────────────────┘

   Gate = checks · replay · policy · locks; a fix applies only if it passes.
   Evals score the result and feed the next analysis; the loop tightens.

  1. Analyse: Kyoko reads your agent's traces for you, diagnoses what went wrong, and updates a state reflection of how the system behaves over time. No manual log-digging.
  2. Issues: it automatically surfaces failures as first-class, evidence-backed issues, including problems you did not predefine as a metric. Issues are grouped by category and severity so you fix the pattern, not the symptom.
  3. Proposals: each accepted issue becomes a concrete fix (to context/skills or the agent's harness), which then runs the gate: generated checks, bounded replay, autonomy policy, and human locks. The fix applies only if it passes.
  4. Evals: a measurement plane of deterministic detectors and LLM judges scores runs into a failure rate, before vs after. Failure is decided by evals, never by a status flag on a trace.
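
To make the "failure rate, before vs after" idea in step 4 concrete, here is a minimal sketch. The `EvalResult` shape, the evaluator names, and the rule that a run counts as failed when any eval verdict fails are all assumptions for illustration; they are not Kyoko's schema or scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One eval verdict for one run (hypothetical shape, not Kyoko's schema)."""
    run_id: str
    evaluator: str  # e.g. a deterministic detector or an LLM judge
    passed: bool


def failure_rate(results: list[EvalResult]) -> float:
    """Fraction of runs with at least one failing eval verdict (assumed rule)."""
    runs = {r.run_id for r in results}
    if not runs:
        return 0.0
    failed = {r.run_id for r in results if not r.passed}
    return len(failed) / len(runs)


# Made-up verdicts for runs before and after a fix was applied.
before = [
    EvalResult("run-1", "tool_error_detector", passed=False),
    EvalResult("run-2", "tool_error_detector", passed=True),
    EvalResult("run-2", "llm_judge", passed=False),
    EvalResult("run-3", "tool_error_detector", passed=True),
    EvalResult("run-4", "tool_error_detector", passed=True),
]
after = [
    EvalResult("run-5", "tool_error_detector", passed=True),
    EvalResult("run-6", "llm_judge", passed=True),
    EvalResult("run-7", "tool_error_detector", passed=False),
    EvalResult("run-8", "tool_error_detector", passed=True),
]

print(f"before={failure_rate(before):.2f} after={failure_rate(after):.2f}")
# prints: before=0.50 after=0.25
```

Note that the trace's own status never enters the calculation; only eval verdicts do, which mirrors the "never by a status flag" rule above.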

Run it your way. The same loop, the same gate. You pick the autonomy level:

  • Human-in-the-loop: Kyoko surfaces issues and drafts fixes, and you review and approve each change before it applies.
  • Fully autonomous: the policy auto-applies any change that clears replay, evals, and human locks, and parks anything that doesn't for you to look at.

Either way, nothing behavior-changing ships without passing the gate.
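
The difference between the two modes can be sketched as a single decision function. This is an illustration only: `Mode`, `GateResult`, `decide`, and the returned status strings are hypothetical names, not Kyoko's policy API, and treating a failed gate as "parked" in both modes is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HUMAN_IN_THE_LOOP = "human-in-the-loop"
    AUTONOMOUS = "autonomous"


@dataclass(frozen=True)
class GateResult:
    """Outcome of the gate for one proposed change (hypothetical shape)."""
    replay_passed: bool
    evals_passed: bool
    target_locked: bool  # True when a human lock protects the target


def decide(mode: Mode, gate: GateResult, approved_by_human: bool = False) -> str:
    """Return what happens to a change under the chosen autonomy level."""
    cleared = gate.replay_passed and gate.evals_passed and not gate.target_locked
    if not cleared:
        # Nothing behavior-changing ships without passing the gate.
        return "parked"
    if mode is Mode.AUTONOMOUS:
        return "applied"
    # Human-in-the-loop: a cleared change still waits for explicit approval.
    return "applied" if approved_by_human else "awaiting-review"


clean = GateResult(replay_passed=True, evals_passed=True, target_locked=False)
print(decide(Mode.AUTONOMOUS, clean))         # applied
print(decide(Mode.HUMAN_IN_THE_LOOP, clean))  # awaiting-review
```

The gate result is computed the same way in both modes; only the last step, who presses "apply", differs.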

[Image: Kyoko issues review queue]

Quick demo

Kyoko requires Python 3.12 or newer. From this checkout:

python3 -m pip install .
kyoko demo --db /tmp/kyoko-demo.db --json
kyoko serve --db /tmp/kyoko-demo.db

Open http://127.0.0.1:8765.

The demo runs the full loop against bundled fixture data, so it needs no live model, framework adapter, or replay server.
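
Because the CLI offers --json, its output can be consumed from a script. The helper below is a minimal sketch that assumes a command prints a single JSON document to stdout and exits non-zero on failure; the README does not document the output schema, so the result is left untyped and nothing is read from it. `run_json` is a hypothetical helper name.

```python
import json
import subprocess
import sys


def run_json(command: list[str]) -> object:
    """Run a command and parse its stdout as one JSON document.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in command so the sketch runs without Kyoko installed.
    stand_in = [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]
    print(run_json(stand_in))
    # With Kyoko installed, the demo command from above would slot in instead:
    # run_json(["kyoko", "demo", "--db", "/tmp/kyoko-demo.db", "--json"])
```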

Install

git clone https://github.com/kayba-ai/kyoko.git
cd kyoko
python3 -m pip install .

After the package is published, prefer an isolated CLI install:

pipx install kyoko

See docs/INSTALL.md for uv, editable installs, the installer script, upgrades, and common setup fixes.

Use it in your project

Run this from the root of an agent project:

kyoko project-bootstrap \
  --project-dir . \
  --profile-name my-agent \
  --source-framework generic-python \
  --replay-framework generic-python \
  --mcp-target codex

project-bootstrap writes .kyoko/kyoko.db, source/replay scaffolds, MCP config, operator presets, and .kyoko/NEXT_STEPS.md. Then check readiness and start the dashboard:

kyoko doctor --db .kyoko/kyoko.db --safe-smokes --json
kyoko serve --db .kyoko/kyoko.db

Point telemetry at Kyoko with the Python or TypeScript SDK, a generated adapter, or an importer. See Getting Started for the end-to-end walkthrough.
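
For orientation, this is roughly what a single GenAI span looks like in OTLP/JSON form. It is a sketch built by hand with the standard library: the envelope follows the OTLP JSON encoding and the two `gen_ai.*` attribute keys come from the OpenTelemetry GenAI semantic conventions, but the span name, service name, and model name are made up, and which fields Kyoko's importers actually require is not stated here. How the payload reaches Kyoko (SDK, adapter, or importer) is covered in the linked docs and is not shown.

```python
import json
import secrets
import time


def attr(key: str, value: str) -> dict:
    """Encode one string attribute in OTLP/JSON key-value form."""
    return {"key": key, "value": {"stringValue": value}}


def build_otlp_span(name: str, model: str, duration_ms: int = 120) -> dict:
    """Build a one-span OTLP/JSON payload with illustrative values."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        "traceId": secrets.token_hex(16),  # 16 bytes, 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 bytes, 16 hex characters
        "name": name,
        "kind": 3,  # SPAN_KIND_CLIENT
        # 64-bit integers are encoded as strings in OTLP/JSON.
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            attr("gen_ai.operation.name", "chat"),
            attr("gen_ai.request.model", model),
        ],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [attr("service.name", "my-agent")]},
            "scopeSpans": [{"scope": {"name": "example"}, "spans": [span]}],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_otlp_span("chat example-model", "example-model"), indent=2))
```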

What you get

  • Telemetry in: Python SDK, TypeScript SDK, generated source adapters, OTLP/GenAI JSON, Hermes import, OpenClaw import.
  • Diagnosis: per-trace and cumulative analysis that folds behavior into a state reflection, then turns recurring or generalised weaknesses into evidence-backed issues with category, severity, and the spans where they happened.
  • Fixes out: issues become validated LearningProposal records, authored by you or an operator agent (Codex, Claude, or a generic command).
  • Verification: generated checks plus bounded replay against external commands or managed loopback replay servers.
  • Measurement: an evidence-only eval plane (deterministic detectors and LLM-judge evals) for what you choose to measure, alongside analysis that surfaces unmeasured patterns from observed behavior.
  • Surfaces: a local dashboard, a JSON-everywhere CLI, and a stdio MCP server for coding agents, all sharing the same gated apply path.

Area                 Supported paths
Source telemetry     Python SDK, TypeScript SDK, generated source adapters, OTLP/GenAI JSON, Hermes import, OpenClaw import
Replay               External replay commands, managed HTTP replay servers, generated replay scaffolds
Operator agents      Codex, Claude, generic command adapters, local presets
Agent clients        Dashboard, JSON CLI, stdio MCP server
Framework scaffolds  Generic Python/TypeScript, LangGraph, Pydantic AI, OpenAI Agents, CrewAI, Hermes, OpenClaw, AI SDK

See docs/INTEGRATIONS.md and examples/README.md.

How safety works

Every behavior-changing path (operator output, imports, MCP tools, and kyoko improve) flows through one gate:

  1. Validate the proposal against its schema.
  2. Resolve the evidence it references.
  3. Generate or select checks.
  4. Run bounded replay and the checks.
  5. Evaluate the autonomy policy.
  6. Enforce human locks on protected targets.
  7. Apply context or harness changes only if the gate allows it.

Context writes update Kyoko-managed skills and delivery rules; harness writes create reviewable patch transactions against an explicit workspace root. Replay server URLs are loopback-only unless you pass --allow-remote-server, and evidence exported to prompts, MCP, API, or bundles is redacted by default. See docs/SECURITY.md and docs/ARCHITECTURE.md.
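
A loopback-only rule like the one described for replay server URLs can be sketched with the standard library. `is_loopback_url` and `check_replay_url` are hypothetical helpers, the `allow_remote` parameter merely mirrors the --allow-remote-server flag, and accepting the literal hostname "localhost" without resolving it is an assumption; Kyoko's actual check may differ.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is 'localhost' or a loopback IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True  # accepted by name only; no DNS lookup is performed
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote.
        return False


def check_replay_url(url: str, allow_remote: bool = False) -> str:
    """Return the URL if it is permitted, otherwise raise ValueError."""
    if allow_remote or is_loopback_url(url):
        return url
    raise ValueError(f"replay server URL is not loopback: {url}")


print(is_loopback_url("http://127.0.0.1:8765"))     # True
print(is_loopback_url("http://[::1]:9000/replay"))  # True
print(is_loopback_url("https://example.com"))       # False
```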

Documentation

  • Getting Started: demo, project bootstrap, telemetry, inspection, and the repair loop.
  • Install: install paths, verification, data location, and common setup fixes.
  • Integrations: source adapters, replay adapters, operator agents, MCP, and SDKs.
  • CLI Reference: grouped command reference.
  • Architecture: runtime model, data model, and the gate.
  • Security: local data, loopback serving, tokens, redaction, and write boundaries.
  • Scope: what v0 is and is not.
  • Development: tests, dashboard bundle, release smoke, and contract artifacts.

Specs, schemas, fixtures, and design decisions live under docs/ as reference contracts.

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for local setup, the test and validation gates, and how to submit a change. To report a security vulnerability, follow SECURITY.md rather than opening a public issue.

Repository layout

kyoko/              Python import package, CLI runtime, dashboard/API, bundled assets
frontend/           React/Vite dashboard source
sdk/typescript/     Dependency-free TypeScript telemetry SDK
examples/           Source and replay hook examples
scripts/            Installer, release smoke, fixture and artifact helpers
tests/              Python unittest suite and CLI contract tests
docs/               User docs plus specs, schemas, fixtures, and decisions

License

Apache-2.0. See LICENSE.


Built by Kayba and the open-source community.
