Multi-axis bisect engine for finding LLM-agent regressions across prompt, model, tool-schema, and RAG-corpus changes

Sornaris

git bisect for LLM-agent regressions. When your agent's success rate drops, sornaris binary-searches for the change that broke it (a prompt edit, a silent model upgrade, a tool-schema diff, or a RAG-corpus refresh) in roughly log₂(N) eval runs instead of N.

Why this matters

Eval frameworks can tell you that your agent regressed; they generally can't tell you which of the many things you changed last week caused it. Real agents move on four axes at once (the prompt, the model, the tool schema, and the retrieval corpus), and bisecting them by hand means re-running your eval set over and over. sornaris runs the binary search for you and names the culprit version.

Zero runtime dependencies — pure standard library (with an optional sqlite response cache). Bring your own provider, or use the built-in OpenAI / Anthropic adapters (stdlib urllib, API key from the environment).

Install

pip install sornaris

Quickstart (offline, no API key)

import re

from sornaris import (
    BaseProvider,
    EvalExample,
    ExactMatchScorer,
    ModelVersion,
    PromptVersion,
    bisect_single_axis,
    run_eval,
)

# Eight prompt versions in time order; a regression was introduced at v5.
prompts = [
    PromptVersion(version_id=f"v{i}", content=f"build {i}: answer the user")
    for i in range(8)
]
examples = [EvalExample(example_id=f"e{i}", input=f"q{i}", expected="ok") for i in range(5)]


class DemoProvider(BaseProvider):  # stand-in for a real LLM call
    def generate(self, prompt: str, model_id: str) -> str:
        build = int(re.search(r"build (\d+)", prompt).group(1))
        return "ok" if build < 5 else "BROKEN"  # broke at build 5


model, provider, scorer = ModelVersion(model_id="demo"), DemoProvider(), ExactMatchScorer()


def evaluate(pv):
    _, mean = run_eval(pv, model, examples, provider, scorer)
    return mean


report = bisect_single_axis(prompts, evaluate, baseline_idx=0, current_idx=7, threshold=0.5)
print(report.version_id)     # -> "v5"
print(len(report.steps))     # -> ~3 probe rounds, not 8

Runnable versions (single-axis and multi-axis) live in examples/.

CLI

Bisect prompt versions against a real model (the provider reads its key from the environment):

export OPENAI_API_KEY=sk-...
sornaris run \
  --prompts examples/prompt_versions.jsonl \
  --evals examples/eval.jsonl \
  --provider openai \
  --model-id gpt-4o-mini \
  --scorer contains \
  --threshold 0.75 \
  --report bisect_report.json

--provider — fake (offline, for wiring checks), openai, or anthropic.
--scorer — exact or contains.
--cache PATH — sqlite response cache; repeated runs over the same eval set get cheaper.
--models models.jsonl — also bisect the model axis (prompt + model).
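The `--cache` option is described only as a sqlite response cache. Here is a minimal sketch of how such a cache makes repeated runs cheaper. It assumes responses are keyed by a hash of the model id and the full prompt text. `ResponseCache` and `cached_generate` are hypothetical names, not the `BisectCache` API.

```python
import hashlib
import sqlite3
from typing import Callable, List, Optional


class ResponseCache:
    """Hypothetical sqlite cache mapping (model_id, prompt) to a response."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)  # ":memory:" or an on-disk file
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, response TEXT NOT NULL)"
        )

    @staticmethod
    def _key(model_id: str, prompt: str) -> str:
        # Hash both parts so a new model or an edited prompt never hits an old entry.
        return hashlib.sha256(f"{model_id}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model_id: str, prompt: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT response FROM responses WHERE key = ?",
            (self._key(model_id, prompt),),
        ).fetchone()
        return row[0] if row else None

    def put(self, model_id: str, prompt: str, response: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO responses (key, response) VALUES (?, ?)",
            (self._key(model_id, prompt), response),
        )
        self.conn.commit()


def cached_generate(
    cache: ResponseCache,
    generate: Callable[[str, str], str],
    prompt: str,
    model_id: str,
) -> str:
    """Call the provider only on a cache miss."""
    hit = cache.get(model_id, prompt)
    if hit is not None:
        return hit
    response = generate(prompt, model_id)
    cache.put(model_id, prompt, response)
    return response


# Usage: the second identical call is served from sqlite, not the provider.
calls: List[str] = []


def fake_generate(prompt: str, model_id: str) -> str:
    calls.append(prompt)
    return "ok"


cache = ResponseCache()
cached_generate(cache, fake_generate, "q1", "demo")
cached_generate(cache, fake_generate, "q1", "demo")
print(len(calls))  # 1
```

Keying on the full prompt text keeps the cache safe across prompt edits. With a non-zero sampling temperature, a cache like this freezes the first sampled answer. That makes repeated bisects deterministic, but it also hides run-to-run variance.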

The --prompts, --models, and --evals files are JSONL, one object per line (the // lines below only name each file and are not part of its contents):

// prompt_versions.jsonl
{"version_id": "v1", "content": "Be concise.", "parent_id": null, "timestamp": 1.0}
// models.jsonl
{"model_id": "gpt-4o-mini", "provider": "openai"}
// eval.jsonl
{"example_id": "e1", "input": "what is 2+2?", "expected": "4"}
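Here is a stdlib sketch of reading these files and checking the keys each record needs. The required-key sets are inferred from the sample lines. They treat `parent_id` and `timestamp` as optional, which is an assumption. `load_jsonl` is hypothetical and not part of sornaris.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical: required keys inferred from the sample lines above.
REQUIRED_KEYS = {
    "prompts": {"version_id", "content"},
    "models": {"model_id"},
    "evals": {"example_id", "input", "expected"},
}


def load_jsonl(path: Path, kind: str) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-blank line and check its required keys."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. at the end of the file
            obj = json.loads(line)
            if not isinstance(obj, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            missing = REQUIRED_KEYS[kind] - set(obj)
            if missing:
                raise ValueError(f"{path}:{lineno}: missing {sorted(missing)}")
            records.append(obj)
    return records


# Usage: write a one-line eval file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    evals_path = Path(tmp) / "eval.jsonl"
    evals_path.write_text(
        '{"example_id": "e1", "input": "what is 2+2?", "expected": "4"}\n',
        encoding="utf-8",
    )
    rows = load_jsonl(evals_path, "evals")
print(rows[0]["expected"])  # 4
```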

How it works

Binary search rests on one assumption: across the ordered list of N versions, the regression is monotonic. Every version before the culprit passes the threshold, and every version from the culprit on fails it. When that holds, sornaris localizes the culprit in about log₂(N) evaluations. Non-deterministic scores, or a change that was broken and later fixed, can violate the assumption and point the search at the wrong version. The multi-axis orchestrator pins every other axis at its latest value and walks one axis at a time, so it can say "the model axis is the cause, the prompt axis is innocent." With the sqlite cache, repeated bisects on the same eval set reuse earlier results instead of calling the provider again.
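The binary search described above fits in a few lines of stdlib Python. This is a minimal sketch, not sornaris's `bisect_single_axis`. `first_bad_index` is a hypothetical helper that assumes three things: `score(i)` is deterministic, index `good` passes and index `bad` fails, and "passes" means `score >= threshold`. The library's exact comparison is not documented here.

```python
from typing import Callable, List, Tuple


def first_bad_index(
    score: Callable[[int], float], good: int, bad: int, threshold: float
) -> Tuple[int, List[int]]:
    """Return (culprit index, probed indices), assuming a monotonic regression."""
    probes: List[int] = []
    # Invariant: version `good` passes and version `bad` fails.
    while bad - good > 1:
        mid = (good + bad) // 2
        probes.append(mid)
        if score(mid) >= threshold:
            good = mid  # the regression lands after mid
        else:
            bad = mid  # mid already shows the regression
    return bad, probes


# Mirrors the quickstart: 8 versions, regression introduced at index 5.
scores = [1.0] * 5 + [0.0] * 3
culprit, probes = first_bad_index(lambda i: scores[i], good=0, bad=7, threshold=0.5)
print(culprit, probes)  # 5 [3, 5, 4]
```

With both endpoints already evaluated, N versions leave N − 1 candidates. The loop therefore makes at most ⌈log₂(N − 1)⌉ probes: three for the eight versions here. Each probe is one full eval run over the example set.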

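Multi-axis search deliberately moves one axis at a time (other axes pinned at their current versions) rather than searching a full grid. It finds the single axis that, rolled back on its own, recovers the score. That covers the common "what did I change?" case cheaply. The trade-off: if two axes each broke the agent independently, no single rollback recovers the score, so no one axis is named.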
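The one-axis-at-a-time strategy can be sketched the same way. The following assumptions are hypothetical and do not reflect the signature of `bisect_multi_axis`:

- Each axis is an ordered list of version ids.
- `evaluate` takes a dict mapping axis name to version id.
- An axis is blamed when rolling only that axis back to its first (baseline) version, with the others left at their last (current) version, makes the score pass.

`find_culprit_axis` then binary-searches the blamed axis.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def find_culprit_axis(
    axes: Dict[str, Sequence[str]],
    evaluate: Callable[[Dict[str, str]], float],
    threshold: float,
) -> Optional[Tuple[str, str]]:
    """Return (axis, culprit version), or None if no single-axis rollback recovers."""
    current = {name: versions[-1] for name, versions in axes.items()}
    if evaluate(current) >= threshold:
        return None  # nothing to bisect: the current configuration passes
    for name, versions in axes.items():

        def score_at(i: int) -> float:
            # Move only this axis; every other axis stays pinned at its current version.
            return evaluate({**current, name: versions[i]})

        if score_at(0) < threshold:
            continue  # rolling this axis back alone does not recover the score
        good, bad = 0, len(versions) - 1
        while bad - good > 1:  # same binary search as the single-axis case
            mid = (good + bad) // 2
            if score_at(mid) >= threshold:
                good = mid
            else:
                bad = mid
        return name, versions[bad]
    return None


# Toy history: the m2 -> m3 model upgrade broke things; prompt edits were harmless.
axes = {"prompt": ["p1", "p2", "p3"], "model": ["m1", "m2", "m3"]}


def toy_score(config: Dict[str, str]) -> float:
    return 0.0 if config["model"] == "m3" else 1.0


result = find_culprit_axis(axes, toy_score, threshold=0.5)
print(result)  # ('model', 'm3')
```

Axes are tried in dictionary order, and the first one whose rollback recovers the score wins. Order therefore matters when more than one axis would recover it. The library's `priority` argument plausibly controls that order, but that is an assumption, so check its docstring.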

Modules

models — value objects: PromptVersion, ModelVersion, EvalExample, EvalResult, BisectStep, BisectReport, AxisType.
scoring — ExactMatchScorer, ContainsScorer, RegexScorer, CallableScorer.
cache — BisectCache (sqlite-backed, on-disk or in-memory).
providers — BaseProvider, offline FakeProvider / ScriptedProvider, real OpenAIProvider / AnthropicProvider, and build_provider(name, ...).
runner — run_eval(prompt, model, examples, provider, scorer, cache=None).
search — bisect_single_axis(versions, evaluate_fn, baseline_idx, current_idx, threshold).
multi — bisect_multi_axis(axes, evaluate_fn, threshold, priority=None).
cli — the sornaris command-line entry point.
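The scorer names suggest simple per-example checks. Below is a minimal sketch of what exact, contains, and regex scoring typically mean. It assumes each scorer returns 1.0 or 0.0 per example and that a run is summarized by the mean. The function names are hypothetical. They apply no normalization (case, whitespace) because the README does not say whether sornaris does.

```python
import re
from statistics import mean
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]  # (model output, expected) -> 0.0 or 1.0


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


def contains(output: str, expected: str) -> float:
    return 1.0 if expected in output else 0.0


def regex_match(output: str, pattern: str) -> float:
    # Treat `expected` as a pattern that may match anywhere in the output.
    return 1.0 if re.search(pattern, output) else 0.0


def mean_score(pairs: List[Tuple[str, str]], scorer: Scorer) -> float:
    """Average per-example scores; assumed to be what the threshold is compared to."""
    return mean(scorer(output, expected) for output, expected in pairs)


pairs = [("4", "4"), ("The answer is 4.", "4"), ("four", "4")]
print(round(mean_score(pairs, exact_match), 3))  # 0.333
print(round(mean_score(pairs, contains), 3))  # 0.667
```

A chatty answer such as "The answer is 4." fails exact matching but passes `contains`. That difference tends to matter with real models, which often wrap answers in prose.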

Roadmap

v0.1 — single- and multi-axis bisect, OpenAI/Anthropic adapters, sqlite cache, CLI.
v0.2 — async providers, tool-schema and RAG-corpus axes wired into the CLI, richer scorers (LLM-judge), JSON-schema for reports.
v1.0 — hosted dashboard (track regressions over time), CI action, and optional signed bisect reports.

Verifying a release

Every release is built and signed in CI via PyPI Trusted Publishing — no long-lived tokens, no hand-uploaded files. You can confirm an artifact is exactly what the workflow produced:

# 1. PyPI provenance (PEP 740 attestations): shown under "Provenance" on the
#    project's PyPI file pages. Download the exact artifacts to check them in
#    the steps below:
pip download sornaris==0.1.0 --no-deps --dest .

# 2. Sigstore signatures: each wheel/sdist is signed (keyless, OIDC), and the
#    .sigstore.json bundles are attached to the GitHub Release. Save the wheel's
#    bundle in the same directory as the wheel, then verify:
python -m pip install sigstore
python -m sigstore verify identity \
  --cert-identity "https://github.com/Sergiipis/sornaris/.github/workflows/publish.yml@refs/tags/v0.1.0" \
  --cert-oidc-issuer "https://token.actions.githubusercontent.com" \
  sornaris-0.1.0-py3-none-any.whl

# 3. Checksums: SHA256SUMS is attached to each GitHub Release.
#    --ignore-missing skips listed files you did not download (GNU coreutils).
sha256sum -c --ignore-missing SHA256SUMS
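On systems without GNU `sha256sum` (Windows, for instance), you can run the checksum step with Python's `hashlib`. This minimal sketch assumes `SHA256SUMS` uses the usual `sha256sum` output format of one `<hex digest>  <filename>` pair per line. `verify_sha256sums` is a hypothetical helper that skips listed files not present locally.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Stream the file in 64 KiB chunks so large artifacts are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256sums(sums_file: Path) -> Dict[str, bool]:
    """Map each listed file that exists locally to whether its digest matches."""
    results = {}
    for line in sums_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary-mode entries with '*'
        target = sums_file.parent / name
        if target.exists():  # like --ignore-missing: skip files not downloaded
            results[name] = sha256_of(target) == expected.lower()
    return results


# Usage with throwaway files standing in for a downloaded release.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "demo.whl"
    artifact.write_bytes(b"demo bytes")
    good = hashlib.sha256(b"demo bytes").hexdigest()
    sums = Path(tmp) / "SHA256SUMS"
    sums.write_text(f"{good}  demo.whl\n{'0' * 64}  not-downloaded.tar.gz\n", encoding="utf-8")
    results = verify_sha256sums(sums)
print(results)  # {'demo.whl': True}
```

Every value in the result should be True. An empty result means nothing was checked, so treat it as a failure rather than a pass.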

A CycloneDX SBOM (sbom.cdx.json / .xml) is attached to every release. Builds set SOURCE_DATE_EPOCH from the tag commit, which pins the timestamps written into the archives, so rebuilding the same tag is intended to produce a byte-identical wheel.

License

MIT — see LICENSE. Free for any use, including commercial.

For paid consulting, custom features or integrations, contact @Sergiipis on GitHub.

Project details

Maintainers

sergiipis

This version

0.1.0

May 30, 2026

Download files

Source Distribution

sornaris-0.1.0.tar.gz (23.3 kB)

Uploaded May 30, 2026 Source

Built Distribution

sornaris-0.1.0-py3-none-any.whl (16.0 kB)

Uploaded May 30, 2026 Python 3

File details

Details for the file sornaris-0.1.0.tar.gz.

File metadata

Download URL: sornaris-0.1.0.tar.gz
Upload date: May 30, 2026
Size: 23.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sornaris-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b62d5772e46e20d37d6997bb5278129076fcef2b695f614ddec58cceb9a27594`
MD5	`d962fc4cbeff80a974f3b183a020633d`
BLAKE2b-256	`8e38dea06eacfab4a1f70245a6858c17f97b5091e84357f7f0435f177ed3ecc4`

Provenance

The following attestation bundles were made for sornaris-0.1.0.tar.gz:

Publisher: publish.yml on Sergiipis/sornaris

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sornaris-0.1.0.tar.gz
- Subject digest: b62d5772e46e20d37d6997bb5278129076fcef2b695f614ddec58cceb9a27594
- Sigstore transparency entry: 1676740820
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: Sergiipis/sornaris@af8918609e1e02c1081651bcc506644cc22f3a55
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Sergiipis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@af8918609e1e02c1081651bcc506644cc22f3a55
- Trigger Event: push

File details

Details for the file sornaris-0.1.0-py3-none-any.whl.

File metadata

Download URL: sornaris-0.1.0-py3-none-any.whl
Upload date: May 30, 2026
Size: 16.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sornaris-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3c88ab926ad25a8653e99ebac6b2acf949818ea32b7ec2ae1429cb0c8c5d5348`
MD5	`bf97d0336c2ccea6d917eb5b48053ad0`
BLAKE2b-256	`14720c1fbe062d000a8d0bc34b6e01ea736e45c4cb47c06335756385a8efc0e0`

Provenance

The following attestation bundles were made for sornaris-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Sergiipis/sornaris

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sornaris-0.1.0-py3-none-any.whl
- Subject digest: 3c88ab926ad25a8653e99ebac6b2acf949818ea32b7ec2ae1429cb0c8c5d5348
- Sigstore transparency entry: 1676740825
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: Sergiipis/sornaris@af8918609e1e02c1081651bcc506644cc22f3a55
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Sergiipis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@af8918609e1e02c1081651bcc506644cc22f3a55
- Trigger Event: push
