kensa - the open source agent evals harness

Tell your coding agent to evaluate an agent. Get a working eval suite in minutes.


kensa is an open source eval harness for agent codebases. It gives coding agents an opinionated CLI and bundled skills to generate scenarios, run them in subprocesses, judge results, and report failures.

Installation

Skills + CLI (recommended)

npx skills add satyaborg/kensa
uv add kensa

Works with Claude Code, Codex, Cursor, OpenCode, Gemini CLI, and similar coding agents.

Claude Code plugin

If you primarily use Claude Code, you can install kensa as a plugin instead:

/plugin marketplace add satyaborg/kensa
/plugin install kensa

Quickstart

Tell your coding agent:

evaluate this agent

That gives you the basic loop:

  • your coding agent inspects the repo, sets up instrumentation, and writes evals
  • it runs kensa to execute scenarios and capture traces
  • deterministic checks run first
  • the LLM judge only runs when those pass
  • reports show what failed and why
  • you review changes, approve fixes, and iterate
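
The ordering in the loop above (deterministic checks first, LLM judge only when they pass) can be sketched in plain Python. This is an illustration of the gating idea, not kensa's implementation: `run_pipeline`, the check tuples, and the `judge` callable are hypothetical names introduced here.

```python
import re
from typing import Callable, Iterable, Tuple

Check = Tuple[str, Callable[[str], bool]]


def run_pipeline(output: str, checks: Iterable[Check], judge: Callable[[str], bool]) -> dict:
    """Run cheap deterministic checks first; call the judge only if all pass."""
    failed = [name for name, check in checks if not check(output)]
    if failed:
        # Short-circuit: the (hypothetical) judge is never called, so no LLM cost is incurred.
        return {"passed": False, "failed_checks": failed, "judged": False}
    return {"passed": bool(judge(output)), "failed_checks": [], "judged": True}


# Hypothetical check mirroring an output_matches-style pattern.
checks = [("output_matches", lambda out: re.search(r"^P[123]$", out) is not None)]

judge_calls = []


def fake_judge(out: str) -> bool:
    # Stand-in for an LLM judge; records each call so the gating is observable.
    judge_calls.append(out)
    return True


bad = run_pipeline("urgent", checks, fake_judge)
good = run_pipeline("P1", checks, fake_judge)
print(bad)
print(good)
```

Because the failing output never reaches `fake_judge`, only the second call is recorded in `judge_calls`.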

If instrumentation is missing

Call instrument() before importing your LLM SDK:

from kensa import instrument

instrument()

If you use the bundled skills, your coding agent will usually add this for you.

Provider extras

uv add "kensa[anthropic]"
uv add "kensa[openai]"
uv add "kensa[langchain]"
uv add "kensa[all]"

Core commands

Command             What it does
kensa init --blank  Scaffold .kensa/ without example content
kensa doctor        Check instrumentation, config, and environment readiness
kensa eval          Run + judge + report in one command
kensa report        Show the latest results in terminal, Markdown, JSON, or HTML
kensa analyze       Flag slow, expensive, flaky, or error-prone traces
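
If you want to chain these commands from a script, a small wrapper along these lines may help. It is a sketch under assumptions: `run_steps` is a hypothetical helper, and it assumes each command signals failure with a non-zero exit status, which is not confirmed here. The `--format markdown` flag is the one used in the CI section.

```python
import shutil
import subprocess

# Commands taken from the table above; the order is a suggestion.
STEPS = [
    ["kensa", "doctor"],
    ["kensa", "eval", "--format", "markdown"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first non-zero exit status."""
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    if shutil.which("kensa") is None:
        print("kensa is not on PATH; activate the environment where it is installed")
    else:
        print("exit status:", run_steps(STEPS))
```

The `runner` parameter exists only so the helper can be exercised without invoking the real CLI.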

Manual workflow

If you want to author evals yourself:

kensa init --blank
kensa doctor

Scenarios live in .kensa/scenarios/*.yaml and point at your agent entrypoint with run_command.

id: classify_ticket
input: "Our entire team can't log in. SSO has returned 502 since 7am."
run_command: [python, agent.py]   # input is appended as the final argv element

checks:
  - type: output_matches
    params: { pattern: "^P[123]$" }

criteria: |
  P1 is for outages or data loss affecting multiple users.

For complete examples, see examples/.
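
For orientation, here is a minimal sketch of an agent.py that the scenario above could point at. It assumes the harness reads the label from standard output; this README only states that the input arrives as the final argv element. The keyword rules in `classify` are a hypothetical stand-in for a real LLM call.

```python
import re
import sys

# Same pattern as the output_matches check in the scenario.
PATTERN = re.compile(r"^P[123]$")


def classify(ticket: str) -> str:
    """Toy priority classifier; a real agent would call an LLM here."""
    text = ticket.lower()
    if any(phrase in text for phrase in ("entire team", "outage", "data loss")):
        return "P1"
    if any(phrase in text for phrase in ("error", "fail", "can't", "cannot")):
        return "P2"
    return "P3"


def main(argv: list) -> None:
    # The scenario input is appended as the final argv element.
    ticket = argv[-1] if len(argv) > 1 else ""
    print(classify(ticket))


if __name__ == "__main__":
    main(sys.argv)
```

Run by hand, `python agent.py "Our entire team can't log in."` prints a single label that the pattern accepts.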

CI

- name: Run evals
  run: uv run kensa eval --format markdown

If you only use deterministic checks, you do not need API keys. If a scenario uses criteria or judge, add the judge provider's API keys as CI secrets.
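
The rule above can be expressed as a small predicate over already-parsed scenarios. This is a sketch, not part of kensa: `needs_judge_secrets` is a hypothetical helper, and it assumes `criteria` and `judge` appear as top-level keys of a scenario.

```python
def needs_judge_secrets(scenario: dict) -> bool:
    """Return True when a scenario would invoke the LLM judge (assumed key names)."""
    return bool(scenario.get("criteria")) or bool(scenario.get("judge"))


# Scenarios as plain dicts, e.g. after loading the YAML files yourself.
deterministic_only = {
    "id": "classify_ticket",
    "checks": [{"type": "output_matches", "params": {"pattern": "^P[123]$"}}],
}
with_criteria = dict(
    deterministic_only,
    criteria="P1 is for outages or data loss affecting multiple users.",
)

scenarios = [deterministic_only, with_criteria]
secrets_required = any(needs_judge_secrets(s) for s in scenarios)
print("judge secrets required:", secrets_required)
```

A check like this could run before the eval step to fail fast when the secrets are missing.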
