kensa

The open source agent evals harness

Maintainers

satyaborg

Project description

kensa - the open source agent evals harness

Kensa is the open source harness for evaluating agents.

Agents are non-deterministic. Prompts drift. Tools change. Models behave differently. Any of these changes can make an agent slower, more expensive, or simply unreliable.

kensa gives coding agents such as Claude Code a repeatable loop for evaluating your agents and catching regressions every time you make a change.

Installation

Run these from the root of your Python agent repo.

Paste this into your coding agent

Open your coding agent and paste:

Run `uvx kensa init --cli --agent all`, then use the audit-evals skill and
follow the eval lifecycle.

The agent installs the CLI, scaffolds .kensa/, drops in the five skills, and runs your first eval. Works with Claude Code, Codex, Cursor, OpenCode, and Gemini CLI.

Or install yourself, then ask your agent

If you want to control the install step but still let your coding agent drive the eval workflow:

uvx kensa init --cli --agent all

Then in Claude Code, Codex, Cursor, OpenCode, or Gemini CLI:

> /audit-evals

The skill captures a real run, generates scenarios, runs evals, and reports back.

Or CLI-only

If you want to skip the coding-agent loop entirely and drive kensa as a regular CLI:

uvx kensa init                                       # dev dep + bare .kensa/
kensa capture -i "<example input>" -- <your agent>   # record one real run as a trace
kensa generate                                       # synthesize scenarios from the capture
kensa eval                                           # run + judge + report
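
If you would rather script the CLI-only loop than type it, the three commands after `init` can be chained from Python. This is a minimal sketch, not part of kensa: `build_pipeline` and `run_pipeline` are hypothetical helper names, and it assumes each kensa command exits non-zero on failure, which the source does not state.

```python
import subprocess
from typing import Callable, List, Sequence


def build_pipeline(example_input: str, agent_cmd: Sequence[str]) -> List[List[str]]:
    """Return the capture, generate, and eval commands in the order shown above."""
    return [
        # Everything after "--" is the command that starts your agent.
        ["kensa", "capture", "-i", example_input, "--", *agent_cmd],
        ["kensa", "generate"],
        ["kensa", "eval"],
    ]


def run_pipeline(commands: List[List[str]],
                 runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so later steps never run after a failure.
        runner(command, check=True)
```

For example, `run_pipeline(build_pipeline("<example input>", ["python", "agent.py"]))` would run the same sequence as the shell lines above. The `runner` parameter exists only so the sequence can be exercised without invoking the real CLI.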

Or Claude Code plugin

If you primarily use Claude Code, install via the plugin marketplace:

/plugin marketplace add satyaborg/kensa
/plugin install kensa

Quickstart

Tell your coding agent what you want:

| You say | Kensa does |
| --- | --- |
| "Evaluate this agent" | Audit setup, create or reuse scenarios, and run evals. |
| "Why are evals failing?" | Inspect results and traces, then diagnose the root cause. |
| "Add coverage for tool use" | Write scenario YAML with tool or trajectory checks. |
| "The judge seems wrong" | Create or validate structured judge prompts. |

How it works

Zero to eval: your coding agent drafts scenarios; you review them.
Runs become traces: each scenario runs in a subprocess with LLM calls, tool use, tokens, cost, and latency captured.
Checks gate judges: deterministic checks run before any LLM judge call.
Ship with evidence: reports show verdicts, traces, cost, latency, and failure details.
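
The "checks gate judges" step can be pictured as a short-circuit: cheap deterministic checks run first, and the LLM judge is called only if all of them pass. The sketch below illustrates that idea only. It is not kensa's implementation, and `Verdict`, `output_matches`, and `evaluate` are hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    passed: bool
    reason: str
    judge_called: bool


def output_matches(pattern: str) -> Callable[[str], bool]:
    """Build a deterministic check that tests the output against a regex."""
    compiled = re.compile(pattern)
    return lambda output: compiled.search(output) is not None


def evaluate(output: str,
             checks: List[Callable[[str], bool]],
             judge: Callable[[str], bool]) -> Verdict:
    """Run deterministic checks first; call the judge only if all pass."""
    for index, check in enumerate(checks):
        if not check(output):
            # Short-circuit: a failed check means the judge is never called.
            return Verdict(False, f"check {index} failed", judge_called=False)
    # Only outputs that pass every check reach the judge.
    ok = judge(output)
    return Verdict(ok, "judge passed" if ok else "judge failed", judge_called=True)
```

With `checks = [output_matches(r"^P[123]$")]`, an output of `P1` reaches the judge, while an output of `urgent` fails the regex and returns without a judge call.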

Instrumentation

Zero code changes. kensa captures LLM calls, tool use, tokens, cost, and latency without modifying your agent. OpenTelemetry (OTel) compatible.

Provider extras

uv add "kensa[anthropic]"
uv add "kensa[openai]"
uv add "kensa[langchain]"
uv add "kensa[all]"

Core commands

| Command | What it does |
| --- | --- |
| `kensa init` | Scaffold `.kensa/` (bare; pass `--example` for a demo agent + scenario) |
| `kensa capture -i "<input>" -- <cmd>` | Record one real agent run as a trace |
| `kensa doctor` | Check instrumentation, config, and environment readiness |
| `kensa generate` | Synthesize scenario YAMLs from captured traces via an LLM |
| `kensa eval` | Run + judge + report in one command |
| `kensa report` | Show the latest results in terminal, Markdown, JSON, or HTML |

See the CLI docs for run, judge, analyze, mcp, and the full command reference.

MCP server

One-liner for Claude Code (run from your project root):

claude mcp add kensa -- uvx kensa-mcp

For other JSON-based MCP clients, add to your project's .mcp.json or .cursor/mcp.json:

{
  "mcpServers": {
    "kensa": {
      "command": "uvx",
      "args": ["kensa-mcp"]
    }
  }
}
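
If the config file already lists other MCP servers, the kensa entry has to be merged in rather than pasted over the file. Here is a minimal sketch of that merge using only the standard library. `add_kensa_server` is a hypothetical helper, not something kensa ships, and it assumes any existing file contains valid JSON.

```python
import json
from pathlib import Path

# Server entry copied from the JSON snippet above.
KENSA_SERVER = {"command": "uvx", "args": ["kensa-mcp"]}


def add_kensa_server(config_path: Path) -> dict:
    """Merge the kensa entry into an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        # Assumption: an existing file holds a valid JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    # Copy the entry so later edits to the result cannot mutate the constant.
    servers["kensa"] = {"command": KENSA_SERVER["command"],
                        "args": list(KENSA_SERVER["args"])}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_kensa_server(Path(".mcp.json"))` or `add_kensa_server(Path(".cursor/mcp.json"))` from the project root would write the same `mcpServers.kensa` entry shown above.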

For Codex, add to your project-scoped .codex/config.toml:

[mcp_servers.kensa]
command = "uvx"
args = ["kensa-mcp"]

See the MCP server docs for tools, resources, and manual config.

Manual workflow

If you want to author evals yourself:

kensa init
kensa doctor

Scenarios live in .kensa/scenarios/*.yaml and point at your agent entrypoint with run_command.

id: classify_ticket
input: "Our entire team can't log in. SSO has returned 502 since 7am."
run_command: [python, agent.py]   # input is appended as the final argv element

checks:
  - type: trajectory
    params:
      steps:
        - tool: classify_ticket
      max_steps: 1
      max_tokens: 2000
  - type: output_matches
    params: { pattern: "^P[123]$" }

criteria: |
  P1 is for outages or data loss affecting multiple users.

For complete examples, see examples/. See the scenario docs and checks docs for the full field and check reference.
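
To make the `run_command` contract concrete, here is a toy `agent.py` that reads the scenario input from the final argv element and prints a priority label. It is a sketch under loud assumptions: the keyword rule stands in for a real model call, `OUTAGE_WORDS` and `classify` are hypothetical, and because it makes no tool calls it would only address the `output_matches` check, not the `trajectory` check in the scenario above.

```python
import sys

# Hypothetical keyword rules standing in for a real model or tool call.
OUTAGE_WORDS = ("can't log in", "outage", "502", "data loss")


def classify(ticket: str) -> str:
    """Return P1 for outage-like tickets, otherwise P3 (toy heuristic)."""
    text = ticket.lower()
    return "P1" if any(word in text for word in OUTAGE_WORDS) else "P3"


def main(argv: list) -> int:
    if len(argv) < 2:
        print("usage: agent.py <input>", file=sys.stderr)
        return 2
    # The scenario input arrives as the final argv element.
    print(classify(argv[-1]))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run by hand, `python agent.py "<ticket text>"` prints a single label of the form the `^P[123]$` pattern expects.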

CI

- name: Run evals
  run: uv run kensa eval --format markdown

If you only use deterministic checks, you do not need API keys. If a scenario uses criteria or a judge, add the judge's LLM provider API keys as CI secrets.

Need more?

Docs
examples/ has sample agents and scenarios
CONTRIBUTING.md covers local development
Homepage

License

MIT License

Release history

0.8.0 (this version): May 5, 2026
0.7.0: May 1, 2026
0.6.2: Apr 27, 2026
0.6.1: Apr 24, 2026
0.6.0: Apr 24, 2026
0.5.2: Apr 18, 2026
0.5.1: Apr 18, 2026
0.5.0: Apr 15, 2026
0.4.0: Apr 13, 2026
0.3.0: Apr 10, 2026
0.2.0: Apr 8, 2026
0.1.0: Apr 7, 2026

Download files

Source Distribution

kensa-0.8.0.tar.gz (105.6 kB)

Uploaded May 5, 2026 Source

Built Distribution

kensa-0.8.0-py3-none-any.whl (120.6 kB)

Uploaded May 5, 2026 Python 3

File details

Details for the file kensa-0.8.0.tar.gz.

File metadata

Download URL: kensa-0.8.0.tar.gz
Upload date: May 5, 2026
Size: 105.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kensa-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`f78ff911bdaf369688f1c93a3d84cbc66114869185ca95411f3813f2235158b2`
MD5	`4fcbb1e0a4bd72bee97c6eceba754b47`
BLAKE2b-256	`d6f1ed293156756f9aaf0a54917a5f72fe49e448b9fcbff0909f3662741bf292`
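
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for kensa-0.8.0.tar.gz.
EXPECTED_SHA256 = "f78ff911bdaf369688f1c93a3d84cbc66114869185ca95411f3813f2235158b2"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the published one, ignoring letter case."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_published_hash(Path("kensa-0.8.0.tar.gz"))` returns `True` only if the local file hashes to the digest in the table.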

Provenance

The following attestation bundles were made for kensa-0.8.0.tar.gz:

Publisher: release.yml on satyaborg/kensa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kensa-0.8.0.tar.gz
- Subject digest: f78ff911bdaf369688f1c93a3d84cbc66114869185ca95411f3813f2235158b2
- Sigstore transparency entry: 1439096590
- Sigstore integration time: May 5, 2026
Source repository:
- Permalink: satyaborg/kensa@dc1cadf19b86505ba80178acf71db74e395083d6
- Branch / Tag: refs/tags/v0.8.0
- Owner: https://github.com/satyaborg
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@dc1cadf19b86505ba80178acf71db74e395083d6
- Trigger Event: push

File details

Details for the file kensa-0.8.0-py3-none-any.whl.

File metadata

Download URL: kensa-0.8.0-py3-none-any.whl
Upload date: May 5, 2026
Size: 120.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kensa-0.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b5a600caa35d3125530fd47ef61f572fb5667684974920e60052bd4f52eba511`
MD5	`50e11d74901475a344869064db62d72d`
BLAKE2b-256	`840010c23d36746bbde16b16a33b2d1bce9e41caf9dbd65b7aa70452981509d8`

Provenance

The following attestation bundles were made for kensa-0.8.0-py3-none-any.whl:

Publisher: release.yml on satyaborg/kensa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kensa-0.8.0-py3-none-any.whl
- Subject digest: b5a600caa35d3125530fd47ef61f572fb5667684974920e60052bd4f52eba511
- Sigstore transparency entry: 1439096614
- Sigstore integration time: May 5, 2026
Source repository:
- Permalink: satyaborg/kensa@dc1cadf19b86505ba80178acf71db74e395083d6
- Branch / Tag: refs/tags/v0.8.0
- Owner: https://github.com/satyaborg
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@dc1cadf19b86505ba80178acf71db74e395083d6
- Trigger Event: push
