Python SDK for authoring Mira eval studies (protocol over stdio, no Rust dependency).

mira-eval — Python SDK

Author a Mira eval study in Python and run it with the mira host CLI.

This is not a binding to the Rust core — it's a native, pure-stdlib library that speaks the Mira eval protocol (newline-delimited JSON over stdio). The host owns selection, the model matrix, concurrency, saved runs, and reporting; the study owns subjects and scoring. Any language that speaks the protocol is a first-class study — this SDK just makes the Python side ergonomic.
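
To make "newline-delimited JSON over stdio" concrete, here is a minimal sketch of such a loop using only the standard library. It is not the SDK's implementation: the message fields ("id", "method", "params", "result", "error") and the reply payloads are assumptions chosen for illustration, and the real shapes are defined by the artifacts under schema/v1/.

```python
import io
import json


def serve(stdin, stdout, handlers):
    """Read one JSON request per line, dispatch on its method, write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        request = json.loads(line)
        method = request.get("method")
        handler = handlers.get(method)
        if handler is None:
            reply = {"id": request.get("id"), "error": f"unknown method: {method}"}
        else:
            reply = {"id": request.get("id"), "result": handler(request.get("params", {}))}
        # One compact JSON object per line; flush so the peer sees it immediately.
        stdout.write(json.dumps(reply, separators=(",", ":")) + "\n")
        stdout.flush()


# Hypothetical handlers and payloads, for illustration only.
handlers = {
    "initialize": lambda params: {"name": "my-evals", "version": "0.1.0"},
    "list": lambda params: {"evals": ["greet"]},
}

# In-memory streams stand in for the process's real stdin/stdout.
fake_stdin = io.StringIO('{"id":1,"method":"initialize"}\n{"id":2,"method":"list"}\n')
fake_stdout = io.StringIO()
serve(fake_stdin, fake_stdout, handlers)
replies = [json.loads(text) for text in fake_stdout.getvalue().splitlines()]
```

Because each message is a single line, any language with a JSON library and line-buffered stdio can implement the study side, which is what lets the host treat every study the same way.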

The protocol layer is generated from the canonical artifacts under schema/v1/ — the same language-neutral contract the Rust host is generated from — so it never drifts from the wire format: mira/_wire.py (wire types, from schema.json) and mira/_meta.py (protocol version, methods, capability tokens, from meta.json).

Use

import mira

study = mira.Study("my-evals", version="0.1.0")

@study.eval(
    samples=[mira.Sample("hi", prompt="Say hi and the answer to life.", tags=["smoke"])],
    targets=[mira.target("sim")],
    scorers=[mira.succeeded(), mira.contains("42")],
)
def greet(sample, cx):
    # A real subject calls a model; route on cx.target / cx.provider.
    return mira.transcript(
        f"Hi! The answer is 42. ({sample.text})",
        usage=mira.Usage(input_tokens=40, output_tokens=8),
    )

if __name__ == "__main__":
    study.serve()

Drive it with the host:

mira --python3 study.py list
mira --python3 study.py run
# run-now, score-later (split execute/score path):
mira --python3 study.py run --execute-only --artifacts art/
mira --python3 study.py score --artifacts art/

--python3 is a convenience launcher; --python and --uv (uv run …) do the same for python and uv, and --cmd "python3 study.py" still works for an arbitrary command line.
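
The split execute/score path shown in the commands above can be pictured as two independent stages that share a directory. The sketch below is a simplified illustration, not the host's actual artifact format: the one-JSON-file-per-sample layout, the field names, and the execute/score helper functions are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def execute(samples, subject, artifacts: Path) -> None:
    """Run the subject for each sample and save its response; no scoring happens here."""
    artifacts.mkdir(parents=True, exist_ok=True)
    for sample_id, prompt in samples.items():
        record = {"sample": sample_id, "final_response": subject(prompt)}
        (artifacts / f"{sample_id}.json").write_text(json.dumps(record))


def score(artifacts: Path, scorers) -> dict:
    """Load saved responses and apply each scorer; the subject is not called again."""
    results = {}
    for path in sorted(artifacts.glob("*.json")):
        record = json.loads(path.read_text())
        results[record["sample"]] = {
            name: fn(record["final_response"]) for name, fn in scorers.items()
        }
    return results


with tempfile.TemporaryDirectory() as tmp:
    art = Path(tmp) / "art"
    # Stage 1: the (expensive) model call, stubbed here with a fixed reply.
    execute({"hi": "Say hi and the answer to life."}, lambda prompt: "Hi! The answer is 42.", art)
    # Stage 2: scoring reads only the saved artifacts.
    results = score(art, {"contains_42": lambda text: "42" in text})
```

The appeal of this design is that scorers can be changed and re-run against saved responses without paying for the subject calls again.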

A complete, runnable example lives in examples/greet-python.

API

  • Study(name, version=None, page_size=500) — the registry; @study.eval(...) registers a subject fn(sample, cx) -> Transcript; study.serve() runs the stdio loop (handling initialize/list/list_samples/run/execute/score). page_size paginates large datasets across list + list_samples (0 disables).
  • Sample(id, prompt=…|input=[…], tags=…, expected=…, files=…, metadata=…): sample.text joins the input turns for the subject.
  • target(label, provider="", available=True) — a matrix case (the model or harness under evaluation). An unavailable target is reported as N/A (infra), not a failure.
  • RunCx (the cx argument passed to the subject): cx.target, cx.provider, cx.max_turns, cx.param(name).
  • transcript(final_response, usage=…, timing=…, iterations=…, …) and the Usage/Timing types.
  • Scorers: succeeded(), contains(text), equals(text), regex(pattern), and scorer(name, fn) for an arbitrary predicate (return a bool or a fully-formed Score, including na=True).
  • axis(name, values) — an extra matrix axis (crossed with the model matrix).
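
As a rough illustration of how targets and extra axes combine, the sketch below crosses every target with every combination of axis values and reports an unavailable target as N/A rather than a failure. In the real system the host owns the matrix; the Target dataclass, expand_matrix, and run_case here are hypothetical stand-ins, not the SDK's types or the host's logic.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Target:
    # Stand-in mirroring the documented target(label, provider="", available=True).
    label: str
    provider: str = ""
    available: bool = True


def expand_matrix(targets, axes):
    """Cross every target with every combination of axis values."""
    names = list(axes)
    cases = []
    for tgt in targets:
        # With no axes, product() yields one empty combination: one case per target.
        for combo in product(*(axes[name] for name in names)):
            cases.append({"target": tgt, "params": dict(zip(names, combo))})
    return cases


def run_case(case, subject):
    """Return 'N/A' for an unavailable target instead of counting it as a failure."""
    if not case["target"].available:
        return "N/A"
    return "pass" if subject(case) else "fail"


targets = [Target("sim"), Target("big-model", provider="example", available=False)]
axes = {"temperature": [0.0, 0.7]}  # hypothetical axis name and values
cases = expand_matrix(targets, axes)
results = [run_case(case, lambda c: True) for case in cases]
```

Here two targets crossed with two axis values give four cases, and the two cases for the unavailable target come back as N/A so an infrastructure gap does not lower the pass rate.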

Develop

python3 codegen.py            # regenerate mira/_wire.py + mira/_meta.py from schema/v1/
python3 codegen.py --check    # fail if either is stale (CI drift guard)
pip install -e ".[dev]"       # dev extras (jsonschema, pytest); quoted so shells like zsh don't glob the brackets
python3 -m pytest             # conformance + metadata-coverage + serve-loop tests

The runtime has zero dependencies; jsonschema and pytest are dev-only.
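
The idea behind a --check drift guard is to regenerate the output in memory and compare it with what is on disk, without writing anything. The following is a minimal sketch of that pattern, not the project's codegen.py: the render_meta and generate functions, the metadata keys, and the generated module layout are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def render_meta(meta: dict) -> str:
    """Render a Python module from a metadata dict (hypothetical layout)."""
    lines = ["# Generated file; do not edit by hand."]
    for key in sorted(meta):  # sorted keys keep the output deterministic
        lines.append(f"{key.upper()} = {meta[key]!r}")
    return "\n".join(lines) + "\n"


def generate(source: Path, output: Path, check: bool = False) -> bool:
    """Write the generated module, or with check=True only report whether it is current."""
    expected = render_meta(json.loads(source.read_text()))
    if check:
        return output.exists() and output.read_text() == expected
    output.write_text(expected)
    return True


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "meta.json"
    output = Path(tmp) / "_meta.py"
    source.write_text(json.dumps({"protocol_version": 1, "methods": ["list", "run"]}))

    generate(source, output)                         # regenerate from the source
    fresh = generate(source, output, check=True)     # output matches the source

    source.write_text(json.dumps({"protocol_version": 2, "methods": ["list", "run"]}))
    stale_ok = generate(source, output, check=True)  # source changed, output is now stale
```

In CI, a false result from the check would translate into a non-zero exit code, so a schema change that is not accompanied by regenerated files fails the build.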
