kalibra

Regression detection and CI quality gates for AI agents.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

khan5v

These details have not been verified by PyPI

Project description

Kalibra
The diff tool for AI agent runs.
The CLI that catches what the dashboard misses.

Screenshot: Kalibra catching a hidden regression (success rate flat at 80%, but 2 task types regressed)

Success rate: 80% → 80%. Duration: flat. Tokens: flat. Everything looks the same — but 2 task types that always passed started failing, and 2 that always failed started passing. The aggregate hid it. The per-task breakdown caught it.
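A minimal sketch of why a per-task breakdown catches what the aggregate hides. It assumes each run is a dict with hypothetical `task_id` and `success` keys and compares raw per-task rates; it is not Kalibra's internal implementation, which applies statistical tests rather than a plain comparison.

```python
from collections import defaultdict


def success_rate(runs):
    """Aggregate success rate: fraction of runs with success=True."""
    return sum(r["success"] for r in runs) / len(runs)


def per_task_rates(runs):
    """Success rate for each task_id."""
    by_task = defaultdict(list)
    for r in runs:
        by_task[r["task_id"]].append(r["success"])
    return {task: sum(v) / len(v) for task, v in by_task.items()}


def flipped_tasks(baseline, current):
    """Tasks present in both runs whose success rate fell (regressed) or rose (improved)."""
    b, c = per_task_rates(baseline), per_task_rates(current)
    shared = b.keys() & c.keys()
    regressed = sorted(t for t in shared if c[t] < b[t])
    improved = sorted(t for t in shared if c[t] > b[t])
    return regressed, improved


# Hypothetical runs: 10 task types, 8 pass each time, but not the same 8.
baseline = [{"task_id": f"t{i}", "success": i < 8} for i in range(10)]
current = [{"task_id": f"t{i}", "success": i >= 2} for i in range(10)]

print(success_rate(baseline), success_rate(current))  # 0.8 0.8
print(flipped_tasks(baseline, current))  # (['t0', 't1'], ['t8', 't9'])
```

Both aggregates are 0.8, yet t0 and t1 regressed while t8 and t9 improved, which is exactly the pattern the aggregate cancels out.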

"Unsuccessful AI products almost always share a common root cause: a failure to create robust evaluation systems." — Hamel Husain, Your AI Product Needs Evals

pip install kalibra
kalibra compare baseline.jsonl current.jsonl -v
kalibra demo    # try it with sample data

Who this is for

Teams running agent evals in CI who want a regression gate without adopting a dashboard
Anyone who's been burned by averages hiding regressions
Developers who prefer a CLI and a config file over another UI to log into

What it doesn't do

Not a tracing backend. It reads Phoenix, OTel GenAI, Langfuse, and flat JSONL exports.
Not a dashboard. Output is terminal text, markdown, or JSON.
Not an LLM judge. No model calls, no API keys, no evaluator prompts.
Doesn't replace Phoenix or Langfuse. It compares the traces they produce.

What it does

Statistically transparent — two-proportion z-test on rates, percentile bootstrap (n=1000) on continuous metrics. Every number has a named method behind it.
Quality gates: declare thresholds such as regressions <= 2; when any threshold is violated, kalibra exits 1 and fails your CI pipeline
Per-task and per-span breakdown — catches regressions that cancel out in the aggregate
Two dependencies — click + pyyaml. No ML frameworks, no API keys, no LLM calls
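The two methods named above can be sketched with the standard library alone. This assumes the pooled-variance form of the z-test and the difference of means as the bootstrap statistic at a 95% level; this page does not say which pooling, statistic, or confidence level Kalibra actually uses, so treat those as assumptions.

```python
import math
import random
import statistics


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test: x successes out of n runs per side.

    Returns (z, two-sided p-value); z < 0 means the current rate is lower.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both sides all-pass or all-fail: the test is undefined, report no difference.
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written via the complementary error function.
    return z, math.erfc(abs(z) / math.sqrt(2))


def bootstrap_mean_delta_ci(baseline, current, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(current) - mean(baseline)."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_resamples):
        # Resample each population independently, with replacement.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(current, k=len(current))
        deltas.append(statistics.fmean(c) - statistics.fmean(b))
    deltas.sort()
    lo = deltas[int(alpha / 2 * n_resamples)]
    hi = deltas[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# 80/100 vs 70/100 successes: a 10pp drop, z about -1.63, p about 0.10.
z, p = two_proportion_z_test(80, 100, 70, 100)

# Hypothetical durations in seconds; every current run is 10 s slower.
lo, hi = bootstrap_mean_delta_ci([1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 12.0, 13.0, 14.0, 15.0])
```

With 100 runs per side, a 10pp drop is not significant at the 0.05 level, which is why a gate on the raw delta and a significance test can disagree.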

Quality gates for CI

# kalibra.yml
baseline:
  path: ./baselines/production.jsonl
current:
  path: ./eval-output/canary.jsonl

require:
  - success_rate_delta >= -2     # max 2pp success rate drop
  - regressions <= 5             # max 5 tasks regressed
  - cost_delta_pct <= 20         # max 20% cost increase

kalibra compare        # reads kalibra.yml, exits 1 on failure
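A minimal sketch of how `require:` expressions like the ones above could be evaluated, assuming every gate has the shape `metric operator number` and the metrics are already computed with the units given in the config comments (percentage points, task counts, percent). The parser, `check_gates`, and `pct_change` are hypothetical illustrations, not Kalibra's API.

```python
import operator
import re

# Comparison operators a gate may use (assumed set).
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "!=": operator.ne,
}
# "metric op number", e.g. "success_rate_delta >= -2".
GATE_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def pct_change(before, after):
    """Relative change in percent, as in cost_delta_pct."""
    return (after - before) / before * 100


def check_gates(require, metrics):
    """Return the gate expressions that fail; an empty list means every gate passed."""
    failures = []
    for expr in require:
        m = GATE_RE.match(expr)
        if m is None:
            raise ValueError(f"cannot parse gate: {expr!r}")
        name, op, threshold = m.group(1), m.group(2), float(m.group(3))
        if not OPS[op](metrics[name], threshold):
            failures.append(expr)
    return failures


require = ["success_rate_delta >= -2", "regressions <= 5", "cost_delta_pct <= 20"]
metrics = {
    "success_rate_delta": -1.0,                # percentage points
    "regressions": 3,                          # tasks whose success rate dropped
    "cost_delta_pct": pct_change(2.00, 2.50),  # 25.0, a 25% cost increase
}
failures = check_gates(require, metrics)  # ["cost_delta_pct <= 20"]
exit_code = 1 if failures else 0          # a CI step would pass this to sys.exit()
```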

GitHub Actions

- uses: khan5v/kalibra-action@v1
  with:
    baseline: baselines/production.jsonl
    current: current.jsonl
    config: kalibra.yml

Posts a markdown report as a PR comment. Exits 1 on gate failure.

Full workflow example

name: Agent Quality Gate
on: [pull_request]

jobs:
  kalibra:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v5
      - run: python eval.py --output current.jsonl
      - uses: khan5v/kalibra-action@v1
        with:
          baseline: baselines/production.jsonl
          current: current.jsonl
          config: kalibra.yml

Integrations

Kalibra auto-detects trace formats. Each tutorial works without an API key.

Integration	Trace format	Demo scenario
Phoenix / OpenInference	`llm.`, `openinference.`	Multi-step agent with span tree aggregation
OTel GenAI	`gen_ai.*`	Truncation regression hidden by aggregate improvement
CrewAI	Flat JSONL	Failure redistribution and cost explosion
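Auto-detection by attribute prefix can be sketched as below. This assumes each record carries an `attributes` dict of span attributes and that the prefixes in the table are the only signal; Kalibra's real detection rules are not documented here, and `detect_format` and its return labels are hypothetical.

```python
def detect_format(record):
    """Guess the trace format from span attribute key prefixes (hypothetical rules)."""
    keys = list(record.get("attributes", {}).keys())
    # Checked first, so OpenInference spans win if both prefixes appear.
    if any(k.startswith(("openinference.", "llm.")) for k in keys):
        return "openinference"
    if any(k.startswith("gen_ai.") for k in keys):
        return "otel_genai"
    # No recognised span attributes: treat as a flat JSONL row.
    return "flat"


print(detect_format({"attributes": {"llm.token_count.total": 812}}))       # openinference
print(detect_format({"attributes": {"gen_ai.usage.input_tokens": 640}}))  # otel_genai
print(detect_format({"task_id": "refund", "outcome": "success"}))          # flat
```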

Filtering with where

Split a single trace file into populations using Prometheus-style matchers:

sources:
  baseline:
    path: ./traces.jsonl
    where:
      - variant == baseline
  current:
    path: ./traces.jsonl
    where:
      - variant == current

Operators: == (equal), != (not equal), =~ (regex match), !~ (regex not match). Multiple matchers are ANDed. Traces missing the field are excluded.
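A minimal sketch of the matcher semantics described above: the four operators, AND across matchers, and exclusion of traces missing the field. It assumes top-level field lookup, string comparison, and fully anchored regexes as in Prometheus; `parse_matcher` and `matches` are hypothetical names, not Kalibra's API.

```python
import re

# "<field> <op> <value>", e.g. "variant == baseline" or "model =~ gpt-4.*".
MATCHER_RE = re.compile(r"^\s*(\w+)\s*(==|!=|=~|!~)\s*(.*?)\s*$")


def parse_matcher(text):
    m = MATCHER_RE.match(text)
    if m is None:
        raise ValueError(f"cannot parse matcher: {text!r}")
    return m.groups()  # (field, op, value)


def matches(trace, matchers):
    """True if the trace satisfies every matcher (matchers are ANDed)."""
    for field, op, value in map(parse_matcher, matchers):
        if field not in trace:
            return False  # traces missing the field are excluded
        actual = str(trace[field])
        if op == "==":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # anchored, as in Prometheus
        else:  # "!~"
            ok = re.fullmatch(value, actual) is None
        if not ok:
            return False
    return True


traces = [
    {"id": 1, "variant": "baseline"},
    {"id": 2, "variant": "current"},
    {"id": 3},  # no variant field
]
baseline = [t["id"] for t in traces if matches(t, ["variant == baseline"])]     # [1]
current = [t["id"] for t in traces if matches(t, ["variant =~ cur.*"])]          # [2]
not_baseline = [t["id"] for t in traces if matches(t, ["variant != baseline"])]  # [2]
```

Trace 3 is dropped even by `!=`, following the rule stated above. Prometheus itself differs here: a missing label behaves like an empty string, so `!=` would match it.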

Field mapping

Kalibra works with any JSONL shape. Map your fields in config or on the command line:

fields:
  outcome: metadata.result
  cost: agent_cost.total_cost
  task_id: metadata.task_name

kalibra compare a.jsonl b.jsonl --outcome metadata.result --cost usage.total_cost

Override fields per source for different schemas:

baseline:
  path: ./langfuse.jsonl
  fields: { outcome: metadata.result, cost: usage.total_cost }
current:
  path: ./braintrust.jsonl
  fields: { outcome: scores.correctness, cost: metrics.cost }
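Field mapping and per-source overrides can be sketched together, assuming dotted paths walk nested JSON objects key by key (no list indexing) and that per-source entries override the shared `fields:` mapping one key at a time. Both helpers are hypothetical, and how Kalibra interprets values such as `"success"` versus `1` as an outcome is not shown here.

```python
def get_path(record, path, default=None):
    """Follow a dotted path such as 'metadata.result' through nested dicts."""
    node = record
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def normalize(record, shared_fields, source_fields=None):
    """Map a raw trace onto canonical names; per-source entries override shared ones."""
    fields = {**shared_fields, **(source_fields or {})}
    return {name: get_path(record, path) for name, path in fields.items()}


shared = {"task_id": "metadata.task_name"}
langfuse_row = {"metadata": {"result": "success", "task_name": "refund"},
                "usage": {"total_cost": 0.04}}
braintrust_row = {"metadata": {"task_name": "refund"},
                  "scores": {"correctness": 1}, "metrics": {"cost": 0.05}}

print(normalize(langfuse_row, shared, {"outcome": "metadata.result", "cost": "usage.total_cost"}))
# {'task_id': 'refund', 'outcome': 'success', 'cost': 0.04}
print(normalize(braintrust_row, shared, {"outcome": "scores.correctness", "cost": "metrics.cost"}))
# {'task_id': 'refund', 'outcome': 1, 'cost': 0.05}
```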

Python API

from kalibra.loader import load_traces
from kalibra.engine import compare
from kalibra.renderers import render

# Load the two trace populations to compare
baseline = load_traces("baseline.jsonl")
current = load_traces("current.jsonl")

# Gates use the same syntax as the require: list in kalibra.yml
result = compare(baseline, current, require=["success_rate_delta >= -5"])
print(render(result, "terminal", verbose=True))  # human-readable report
print("passed:", result.passed)                  # whether every gate passed

Development

git clone https://github.com/khan5v/kalibra.git
cd kalibra
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Release history

0.2.4

Jun 10, 2026

0.2.3

May 20, 2026

This version

0.2.2

Mar 24, 2026

0.2.1

Mar 21, 2026

0.2.0

Mar 21, 2026

0.1.2

Mar 19, 2026

0.1.1

Mar 19, 2026

0.0.1

Mar 13, 2026

Download files

Source Distribution

kalibra-0.2.3.tar.gz (2.1 MB)

Uploaded May 20, 2026 Source

Built Distribution

kalibra-0.2.3-py3-none-any.whl (93.0 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file kalibra-0.2.3.tar.gz.

File metadata

Download URL: kalibra-0.2.3.tar.gz
Upload date: May 20, 2026
Size: 2.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kalibra-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`b58d8e04a368338b7e9089b1a276a0e2af1f9359d57f9e555d570996fc12fa69`
MD5	`48c1d2539388f0de3cbb9899060c86a3`
BLAKE2b-256	`9fea8405c52b5c8526a21d0ed2af9112d029423ff719c66119f61560425aceca`
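To check a downloaded sdist against the SHA256 digest listed above, a streaming hash with the standard library is enough. This sketch assumes the file was saved as `kalibra-0.2.3.tar.gz` in the current directory; `sha256_of` and `verify` are illustrative helpers.

```python
import hashlib
from pathlib import Path

# SHA256 digest for kalibra-0.2.3.tar.gz, copied from the table above.
EXPECTED_SHA256 = "b58d8e04a368338b7e9089b1a276a0e2af1f9359d57f9e555d570996fc12fa69"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected):
    """True if the file exists and its SHA256 matches the expected hex digest."""
    return Path(path).is_file() and sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print("kalibra-0.2.3.tar.gz OK:", verify("kalibra-0.2.3.tar.gz", EXPECTED_SHA256))
```

pip can enforce the same check at install time with `--require-hashes` in a requirements file, which then requires a hash for every dependency as well.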

Provenance

The following attestation bundles were made for kalibra-0.2.3.tar.gz:

Publisher: release.yml on khan5v/kalibra

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kalibra-0.2.3.tar.gz
- Subject digest: b58d8e04a368338b7e9089b1a276a0e2af1f9359d57f9e555d570996fc12fa69
- Sigstore transparency entry: 1585592684
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: khan5v/kalibra@f3bc515d8d3b072c39d323db67e92d3014566d64
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/khan5v
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f3bc515d8d3b072c39d323db67e92d3014566d64
- Trigger Event: push

File details

Details for the file kalibra-0.2.3-py3-none-any.whl.

File metadata

Download URL: kalibra-0.2.3-py3-none-any.whl
Upload date: May 20, 2026
Size: 93.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kalibra-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5396a4809516006814515018e9d5d87f3f64e7b1fd1785b97155e43296a93db8`
MD5	`f52c824eb70f0093cafc91ad7624fed6`
BLAKE2b-256	`343b3205b62906f8d3627d786ebb5d8f04a2e60797de3919b104f54ddc530f94`

Provenance

The following attestation bundles were made for kalibra-0.2.3-py3-none-any.whl:

Publisher: release.yml on khan5v/kalibra

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kalibra-0.2.3-py3-none-any.whl
- Subject digest: 5396a4809516006814515018e9d5d87f3f64e7b1fd1785b97155e43296a93db8
- Sigstore transparency entry: 1585592752
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: khan5v/kalibra@f3bc515d8d3b072c39d323db67e92d3014566d64
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/khan5v
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f3bc515d8d3b072c39d323db67e92d3014566d64
- Trigger Event: push

kalibra 0.2.3