
evals-viewer-io

Pydantic schemas and a writer for the evals-viewer on-disk format. This is the Python writer side of the framework — it produces the JSON tree that @ideonate/evals-viewer-server reads and the Vue frontend @ideonate/evals-viewer-core renders.

Install

pip install evals-viewer-io

Requires Python 3.10+ and Pydantic 2.

What's in the box

  • RunMetadata, EvalSummary, CaseSummary, AggregateStats: Pydantic models matching the on-disk format
  • TokenUsage: token/cost model with addition, a from_pydantic_ai adapter, and a per-model breakdown
  • save_run_metadata, save_eval_results: filesystem writers; given models and dicts, they write JSON in the layout the viewer expects
  • compute_aggregates(cases): groups case.scores[evaluator] across cases into {evaluator: {mean, min, max}}
  • compute_token_totals(cases): sums token usage, cost, and per-model breakdown across cases
  • eval_run_dir (pytest fixture): optional fixture creating a fresh run directory under EVALS_RESULTS_DIR

Quickstart: minimal end-to-end

from evals_viewer_io import (
    RunMetadata, EvalSummary, CaseSummary, TokenUsage,
    compute_aggregates, compute_token_totals,
    save_eval_results,
)

# 1. Build per-case rows. The output_summary dict is a free-form bag of
#    fields the viewer can show in the eval-detail table; token fields
#    use the canonical input_tokens / output_tokens / cost_usd / usage_by_model.
cases = [
    CaseSummary(
        name="case_001",
        scores={"Accuracy": 0.9, "Coverage": 0.8},
        judge_reasons={"Accuracy": "All key facts present."},
        output_summary={
            "input_tokens": 1234,
            "output_tokens": 567,
            "cost_usd": 0.012,
        },
    ),
    CaseSummary(
        name="case_002",
        scores={"Accuracy": 0.7, "Coverage": 0.9},
        output_summary={"input_tokens": 980, "output_tokens": 440, "cost_usd": 0.009},
    ),
    CaseSummary(name="case_003", success=False, error="Timeout"),
]

# 2. Compute the per-eval aggregates and write the run.
summary = EvalSummary(
    timestamp="2026-04-07T10:30:00Z",
    aggregates=compute_aggregates(cases),
    cases=cases,
)

save_eval_results(
    results_dir="./tests/test-results/evals",
    run_id="2026-04-07_103000",
    eval_name="my_eval",
    summary=summary,
    outputs={
        "case_001": {"answer": "...", "input_tokens": 1234, "output_tokens": 567, "cost_usd": 0.012},
        "case_002": {"answer": "...", "input_tokens": 980, "output_tokens": 440, "cost_usd": 0.009},
    },
    run=RunMetadata(timestamp="2026-04-07T10:30:00Z", git_commit="abc1234"),
)

That writes:

tests/test-results/evals/2026-04-07_103000/
├── run.json
└── my_eval/
    ├── summary.json
    └── outputs/
        ├── case_001.json
        └── case_002.json

Open the viewer and the run shows up.
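
The files are plain JSON, so you can sanity-check a run without the viewer. A minimal read-back, assuming the tree above and the default EvalSummary field names:

import json
from pathlib import Path

run_dir = Path("tests/test-results/evals/2026-04-07_103000")
summary_data = json.loads((run_dir / "my_eval" / "summary.json").read_text())
print(summary_data["aggregates"])  # {"Accuracy": {"mean": ..., "min": ..., "max": ...}, ...}
print(len(summary_data["cases"]))  # 3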

Token usage

TokenUsage is a normal Pydantic model with __add__ so you can sum across cases or across model calls:

from evals_viewer_io import TokenUsage

opus_call = TokenUsage(input_tokens=1200, output_tokens=300, cost_usd=0.018)
haiku_call = TokenUsage(input_tokens=800, output_tokens=200, cost_usd=0.0009)

# Per-model breakdown for one case
case_total = TokenUsage(
    input_tokens=opus_call.input_tokens + haiku_call.input_tokens,
    output_tokens=opus_call.output_tokens + haiku_call.output_tokens,
    cost_usd=(opus_call.cost_usd or 0) + (haiku_call.cost_usd or 0),
    usage_by_model={"opus": opus_call, "haiku": haiku_call},
)

# Or just use sum() across multiple cases:
total = sum([case1_usage, case2_usage, case3_usage])

The viewer reads input_tokens, output_tokens, cost_usd, and usage_by_model both from each case's full output JSON and from the per-case row in summary.json's output_summary.
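
If you want that per-model breakdown visible in the summary table too, one option (a sketch, not a requirement of the format) is to serialize it into output_summary next to the scalar fields, reusing case_total from above:

from evals_viewer_io import CaseSummary

case = CaseSummary(
    name="case_004",
    scores={"Accuracy": 0.85},
    output_summary={
        "input_tokens": case_total.input_tokens,
        "output_tokens": case_total.output_tokens,
        "cost_usd": case_total.cost_usd,
        # plain dicts keep summary.json JSON-serializable
        "usage_by_model": {name: u.model_dump() for name, u in case_total.usage_by_model.items()},
    },
)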

Pydantic-AI adapter

If you use pydantic-ai, there's a one-liner to convert its Usage / RunUsage objects (which use request_tokens / response_tokens rather than input / output):

from evals_viewer_io import TokenUsage

usage = TokenUsage.from_pydantic_ai(result.usage(), cost_usd=my_cost_calc(result))

The adapter uses getattr so this package never imports pydantic-ai itself. Other frameworks (OpenAI SDK, Anthropic SDK, …) can be mapped just as easily — TokenUsage(input_tokens=resp.usage.prompt_tokens, output_tokens=resp.usage.completion_tokens) etc.
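
Spelled out for the OpenAI SDK (field names as in the sentence above; the cost helper is your own, sketched just below):

from evals_viewer_io import TokenUsage

# resp is an OpenAI chat completions response
usage = TokenUsage(
    input_tokens=resp.usage.prompt_tokens,
    output_tokens=resp.usage.completion_tokens,
    cost_usd=estimate_cost(resp.usage.prompt_tokens, resp.usage.completion_tokens),  # your helper, see below
)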

Cost is the caller's responsibility. Pricing tables go stale fast and don't belong in this package.
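
As an illustration of what that caller-side calculation might look like (the rates below are placeholders, not real prices):

# Hypothetical per-million-token rates; look up current pricing yourself.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000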

Aggregating tokens across cases

from evals_viewer_io import compute_token_totals

totals = compute_token_totals(cases)
print(totals.input_tokens, totals.output_tokens, totals.cost_usd)
print(totals.usage_by_model)  # per-model breakdown summed across all cases

The function reads input_tokens / output_tokens / cost_usd / usage_by_model from each case's output_summary. Cases that don't have those fields contribute zero.

pytest fixture

# tests/conftest.py
from evals_viewer_io.pytest import eval_run_dir  # noqa: F401

# tests/test_my_eval.py
from evals_viewer_io import save_eval_results

def test_my_eval(eval_run_dir):
    # eval_run_dir is a pathlib.Path under EVALS_RESULTS_DIR (or a tmp dir),
    # and run.json has already been written.
    ...  # build `summary` and `outputs` as in the quickstart
    save_eval_results(
        results_dir=eval_run_dir.parent,
        run_id=eval_run_dir.name,
        eval_name="my_eval",
        summary=summary,
        outputs=outputs,
    )

Set EVALS_RESULTS_DIR=tests/test-results/evals (or wherever your project keeps results) so the run lands somewhere the viewer can find it.
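
For a one-off local run, that can be as simple as prefixing the pytest invocation:

EVALS_RESULTS_DIR=tests/test-results/evals pytest tests/test_my_eval.py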

What this package deliberately does not do

This is intentionally a small package — schemas plus the smallest set of helpers that every consumer would otherwise have to write themselves. It does not include:

  • Token field extraction from arbitrary model outputs. Different LLM SDKs name fields differently; the caller knows their own output schema.
  • A pricing table. Costs are pricing × tokens; pricing changes weekly. You compute it, you pass it in via cost_usd.
  • Pydantic→dict serialization. If your case output is a Pydantic model, call .model_dump() yourself before passing it to save_eval_results (see the sketch after this list). Hiding that behind a wrapper would just suppress errors.
  • Coupling to a specific eval framework like pydantic-evals or inspect_ai. The writer takes plain dicts. Frameworks can be added as adapters when there's demand.
  • Schema versioning. The on-disk format is forward-compatible by design (extra="allow" everywhere). If a breaking change ever lands, that's the time for a schema_version field, not now.
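
For the serialization point above, the caller-side step is a one-liner (MyCaseOutput is a hypothetical Pydantic model of your own):

from pydantic import BaseModel

class MyCaseOutput(BaseModel):
    answer: str
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float | None = None

result = MyCaseOutput(answer="...", input_tokens=1234, output_tokens=567, cost_usd=0.012)
outputs = {"case_001": result.model_dump()}  # plain dict, ready for save_eval_results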

On-disk contract

See docs/data-layout.md in the monorepo for the full directory tree and per-file schemas. The TL;DR:

{results_dir}/{run_id}/
├── run.json                       (RunMetadata)
└── {eval_name}/
    ├── summary.json               (EvalSummary: aggregates + per-case rows)
    ├── outputs/{case_name}.json   (full per-case output)
    ├── inputs/{case_name}.json    (optional; saved input fixture)
    └── case-scores/{case_name}.json (optional; per-question scores)

License

MIT
