Python RPC client for @tangle-network/agent-eval — judge content against rubrics over HTTP or stdio RPC. Eval logic runs in the Node runtime; this package is a thin wire client.

Maintainers

drewstone

agent-eval-rpc — Python client

Python client for @tangle-network/agent-eval — a content/code judging framework written in TypeScript. This package is a thin transport adapter: every judgement runs in the Node runtime, marshalled over HTTP or stdio RPC. Two languages, one implementation. No drift.

What you get

A function-call interface to score any string against a rubric:

from agent_eval_rpc import Client

client = Client()  # auto-detects HTTP server, falls back to subprocess
result = client.judge(
    content="We just launched zero-copy IO between agents and their workdir",
    rubric_name="anti-slop",
)

print(result.composite)         # 0.0..1.0 — single number to gate on
print(result.dimensions)        # {"buyer_quality": 0.7, "voice": 0.8, "signal": 0.9}
print(result.failure_modes)     # [] or ["ai-cadence", "marketing-tone", ...]
print(result.wins)              # ["specific-component", "earned-detail", ...]
print(result.rationale)         # "The post names a real architectural detail..."

That's the entire surface for content judging. A self-contained runnable example with pytest invariants lives at examples/judge_anti_slop.py.
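As a sketch of what "gate on" the composite could look like in a CI check: the `ScoredContent` class below is a local stand-in that only borrows the `composite` and `failure_modes` field names shown above, and `passes_gate`, its threshold, and the `blocking` set are hypothetical helpers, not part of the package.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredContent:
    """Local stand-in carrying two of the field names shown on the result."""
    composite: float
    failure_modes: list[str] = field(default_factory=list)


def passes_gate(result, threshold: float = 0.7,
                blocking: frozenset[str] = frozenset()) -> bool:
    """Pass when the composite clears the threshold and no blocking failure mode fired."""
    if result.composite < threshold:
        return False
    return not (set(result.failure_modes) & blocking)


draft = ScoredContent(composite=0.82, failure_modes=["marketing-tone"])
print(passes_gate(draft))                                          # True
print(passes_gate(draft, blocking=frozenset({"marketing-tone"})))  # False
```

In real use the object returned by `client.judge(...)` would presumably take the place of `draft`, since the function only reads those two attributes.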

Hosted-tier ingest

Ship eval-run events and OTel-shaped trace spans to any orchestrator that speaks the wire format frozen at HOSTED_WIRE_VERSION = '2026-05-26.v1'. The contract matches the TypeScript client at src/hosted/client.ts, and both clients target the same reference receiver at examples/hosted-ingest-server/.

from agent_eval_rpc import (
    HostedClient, EvalRunEvent, EvalRunGenerationSnapshot,
    EvalRunCellScore, make_trace_span,
)

with HostedClient(
    endpoint="http://localhost:8080",
    api_key="dev-token",
    tenant_id="acme",
) as client:
    res = client.ingest_eval_run(EvalRunEvent(
        runId="run-py-1",
        runDir="/runs/run-py-1",
        timestamp="2026-05-27T00:00:00Z",
        status="finished",
        labels={"env": "test"},
        baseline=EvalRunGenerationSnapshot(
            index=0, surfaceHash="h-base",
            cells=[EvalRunCellScore(scenarioId="s1", rep=0, compositeMean=0.5,
                                     dimensions={"llm": {"accuracy": 0.5}})],
            compositeMean=0.5, costUsd=0.1, durationMs=1000,
        ),
        generations=[],
        gateDecision="hold",
        totalCostUsd=0.1, totalDurationMs=1000,
    ))
    assert res.accepted == 1

    spans = [make_trace_span(
        trace_id="t", span_id="s-0", name="dispatch",
        start_time_unix_nano=1_700_000_000_000_000_000,
        end_time_unix_nano=1_700_000_001_000_000_000,
        attributes={"step": 0},
        tangle_run_id="run-py-1", tangle_scenario_id="s1", tangle_generation=0,
    )]
    client.ingest_traces(spans)

The client handles bearer auth and per-tenant idempotency, and retries with capped exponential backoff on 5xx, 408, and 429 responses. The tangle.* pivot keys are first-class parameters on make_trace_span so the dashboard can stitch traces back to their owning run.
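To make "capped exponential backoff" concrete, here is a minimal sketch of the general technique. The base delay, cap, and function names are assumptions chosen for illustration; the package's actual retry parameters are not documented here.

```python
def should_retry(status: int) -> bool:
    """Retry on 408, 429, and any 5xx, the statuses named above."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delays(attempts: int, base_s: float = 0.5, cap_s: float = 8.0) -> list[float]:
    """Delay before each retry: base_s * 2**n, never more than cap_s (assumed values)."""
    return [min(cap_s, base_s * (2 ** n)) for n in range(attempts)]


print(backoff_delays(6))                     # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
print(should_retry(503), should_retry(404))  # True False
```

The cap keeps a long outage from pushing waits arbitrarily high, while the doubling keeps early retries cheap.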

End-to-end tests at tests/test_hosted.py spawn the TypeScript reference receiver and prove binary compat across languages; if either side drifts, both test suites fail.

Install

cd clients/python
pip install -e .

To use it, you need one of:

npm install -g @tangle-network/agent-eval — gives you the agent-eval binary, used by the subprocess transport (works offline, slower per call due to Node startup ~500ms).
Run a server: agent-eval serve --port 5005 — gives you HTTP transport (~10ms per call once up).

The Python client picks whichever is available. Force one with Client(transport="http") or Client(transport="subprocess").
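How the client decides which transport is "available" is not spelled out here. The sketch below shows one plausible way to do it, assuming a TCP probe of the server address and a PATH lookup for the CLI; `http_reachable`, `pick_transport`, and `detect` are hypothetical names, not the package's internals.

```python
import shutil
import socket
from urllib.parse import urlparse


def http_reachable(base_url: str, timeout_s: float = 0.25) -> bool:
    """True if a TCP connection to the server's host and port succeeds."""
    parsed = urlparse(base_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def pick_transport(http_ok: bool, cli_found: bool) -> str:
    """Prefer HTTP (lower per-call latency), else fall back to the CLI subprocess."""
    if http_ok:
        return "http"
    if cli_found:
        return "subprocess"
    raise RuntimeError("no transport available: start a server or install the CLI")


def detect(base_url: str = "http://127.0.0.1:5005", cli_name: str = "agent-eval") -> str:
    """Probe the environment, then apply the preference order above."""
    return pick_transport(http_reachable(base_url), shutil.which(cli_name) is not None)


print(pick_transport(http_ok=True, cli_found=True))   # http
print(pick_transport(http_ok=False, cli_found=True))  # subprocess
```

Keeping the decision in a pure function (`pick_transport`) separate from the probing makes the preference order easy to test without a server or a Node install.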

Why the architecture works this way

The TypeScript package is the source of truth for evaluation logic. We don't reimplement rubrics, scoring, or judges in Python — we marshal JSON to the canonical runtime over a versioned wire protocol (defined as Zod schemas, exported as OpenAPI, mirrored in this package as pydantic models).

Adding a new method to the API means: define a Zod schema in src/wire/schemas.ts, write the handler in src/wire/handlers.ts, and the Python client picks it up on the next regeneration. There is no separate Python implementation to maintain.

This is the same pattern as the Anthropic SDK, Stripe SDK, and gRPC: one canonical implementation, language-specific transport clients.

API

`Client`

Client(
    base_url: str | None = None,        # AGENT_EVAL_URL or http://127.0.0.1:5005
    cli_path: str | None = None,        # AGENT_EVAL_CLI or 'agent-eval'
    transport: Literal["auto", "http", "subprocess"] = "auto",
    timeout_s: float = 120.0,
)

`client.judge(...)`

Score a piece of content against a rubric.

def judge(
    *,
    content: str,                                  # the text being judged
    rubric_name: str | None = None,                # OR
    rubric: Rubric | dict | None = None,           # an inline rubric definition
    context: dict | None = None,                   # free-form metadata for the judge
    model: str | None = None,                      # override the judge LLM
) -> JudgeResult

Pass either rubric_name (a built-in such as "anti-slop") or rubric (an inline definition with your own dimensions and prompt), but not both.
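A minimal sketch of the either/or rule, assuming the check happens client-side before a request is sent. The helper name `resolve_rubric_arg` and the returned dict shape are hypothetical and are not the package's wire format.

```python
def resolve_rubric_arg(rubric_name=None, rubric=None):
    """Return the single rubric selector to use, or raise if zero or two were given."""
    if (rubric_name is None) == (rubric is None):
        raise ValueError("pass exactly one of rubric_name or rubric")
    if rubric_name is not None:
        return {"rubric_name": rubric_name}
    return {"rubric": rubric}


print(resolve_rubric_arg(rubric_name="anti-slop"))  # {'rubric_name': 'anti-slop'}
```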

Returns JudgeResult:

composite: float — weighted score in 0..1. The single number to gate on.
dimensions: dict[str, float] — per-axis scores (e.g. {"buyer_quality": 0.7}).
failure_modes: list[str] — ids of negative patterns detected.
wins: list[str] — ids of positive patterns detected.
rationale: str — plain-English explanation.
rubric_version: str — stable hash of the rubric used. Compare scores only when this matches.
model: str — LLM that produced the judgement.
duration_ms: int — wall-clock latency.
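Because scores are only comparable when rubric_version matches, any aggregation over stored results should be bucketed by that field first. The sketch below assumes results were saved as plain dicts with `rubric_version` and `composite` keys; the helper name and the version strings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_composite_by_rubric_version(results):
    """Average composite scores separately for each rubric_version."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r["rubric_version"]].append(r["composite"])
    return {version: mean(scores) for version, scores in buckets.items()}


history = [
    {"rubric_version": "abc123", "composite": 0.5},
    {"rubric_version": "abc123", "composite": 0.75},
    {"rubric_version": "def456", "composite": 0.9},
]
print(mean_composite_by_rubric_version(history))
# {'abc123': 0.625, 'def456': 0.9}
```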

`client.list_rubrics()`

Return every rubric the server has registered, with their dimensions and stable rubric_version.

rubrics = client.list_rubrics()
for r in rubrics.rubrics:
    print(r.name, r.description, r.rubric_version)

`client.version()`

Return the server version and the wire-protocol version. Your installed pip package version should match version; check wire_version for compatibility.

v = client.version()
assert v.version.startswith("0.20")
assert v.wire_version == "1.0.0"

Defining a custom rubric

Built-in anti-slop is tuned for technical-buyer audiences. For different scoring, pass a Rubric inline:

from agent_eval_rpc import Client, Rubric, RubricDimension, FailureMode

rubric = Rubric(
    name="my-rubric",
    description="Does this commit message explain WHY, not just what?",
    systemPrompt="You score commit messages. Score 0..1 on whether the WHY is clear...",
    dimensions=[
        RubricDimension(id="explains_why", description="Does the message say *why*?", weight=1.0),
    ],
    failureModes=[
        FailureMode(id="what-not-why", description="States the change but not the reason"),
    ],
)

result = client.judge(content="bumped the version", rubric=rubric)
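Scoring happens in the Node runtime, and the exact formula behind composite is not given here. Purely to illustrate how dimension weights could combine into one number, the sketch below assumes a weight-normalized mean; `weighted_composite` and the second dimension `concise` are hypothetical.

```python
def weighted_composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weight-normalized mean of per-dimension scores (an assumed formula)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * w for dim, w in weights.items()) / total


print(weighted_composite(
    {"explains_why": 0.25, "concise": 0.75},
    {"explains_why": 3.0, "concise": 1.0},
))  # 0.375
```

With a single dimension of weight 1.0, as in the rubric above, this reduces to that dimension's score.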

Errors

Exception	When
`ValidationError`	Server (or pydantic) rejected the request as malformed. Fix your inputs.
`RubricNotFoundError`	Unknown `rubric_name`. Call `list_rubrics()` to see what's registered.
`TransportError`	HTTP unreachable or subprocess failed. Retry or check the server.
`AgentEvalError`	Base class — catches everything above.

All errors carry .code and .details (the structured payload from the server).
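One way to act on this hierarchy is to map each exception type to a next step. So that the sketch runs without the package installed, the classes below are local stand-ins that copy only the names and the `.code` / `.details` attributes from the table; their constructor signature and the `next_action` helper are assumptions.

```python
class AgentEvalError(Exception):
    """Stand-in base class carrying the documented .code and .details attributes."""

    def __init__(self, message, code="unknown", details=None):
        super().__init__(message)
        self.code = code
        self.details = details or {}


class ValidationError(AgentEvalError):
    pass


class RubricNotFoundError(AgentEvalError):
    pass


class TransportError(AgentEvalError):
    pass


def next_action(exc: AgentEvalError) -> str:
    """Translate the 'When' column of the table into a coarse next step."""
    if isinstance(exc, TransportError):
        return "retry"
    if isinstance(exc, RubricNotFoundError):
        return "list_rubrics"
    if isinstance(exc, ValidationError):
        return "fix_input"
    return "raise"


try:
    raise RubricNotFoundError("no such rubric", code="rubric_not_found",
                              details={"rubric_name": "anti-slopp"})
except AgentEvalError as exc:  # the base class catches every subclass
    print(next_action(exc), exc.code, exc.details)
```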

Versioning

This package is version-locked to the npm package. agent-eval-rpc==0.21.0 ↔ @tangle-network/agent-eval@0.21.0. CI verifies the npm package, Python package, runtime __version__, and release tag all agree before publish. If one registry publish fails after the other succeeds, retry the failed publish from the same tag or supersede with the next patch release.

wire_version is separate. It bumps only on breaking schema changes, so a client and server from different package releases remain compatible as long as their wire_version is the same.
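The two version checks can be folded into one small helper. This is a sketch under the reading that wire_version decides compatibility while the package version only signals skew; `check_versions` and its return labels are hypothetical.

```python
def check_versions(client_pkg: str, server_pkg: str,
                   client_wire: str, server_wire: str) -> str:
    """Classify a client/server pair by wire_version first, package version second."""
    if client_wire != server_wire:
        return "incompatible"
    if client_pkg != server_pkg:
        return "compatible-version-skew"
    return "matched"


print(check_versions("0.21.0", "0.21.0", "1.0.0", "1.0.0"))  # matched
print(check_versions("0.21.0", "0.22.0", "1.0.0", "1.0.0"))  # compatible-version-skew
print(check_versions("0.21.0", "0.22.0", "1.0.0", "2.0.0"))  # incompatible
```

In practice the server-side values would come from `client.version()` and the client-side package version from the installed distribution.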

Development

# install in editable mode
pip install -e ".[dev]"

# unit tests (no Node required)
pytest tests/test_models.py

# integration tests against the bundled CLI
cd ../.. && pnpm build         # build the agent-eval CLI in repo root
cd clients/python && pytest    # runs subprocess tests against dist/cli.js

Adding a new method

When the TS side adds a new endpoint (say evaluateScenario):

1. Update src/wire/schemas.ts with EvaluateScenarioRequestSchema and EvaluateScenarioResponseSchema.
2. Add a handler in src/wire/handlers.ts, a route in src/wire/server.ts, and a case in src/wire/rpc.ts.
3. In this client, add the matching pydantic model in models.py and a method on Client. The pattern is mechanical: copy the shape from judge.
4. Test in both languages. Bump versions together.

A future iteration moves step 3 to datamodel-code-generator -i openapi.json so it's mechanical-and-automatic instead of mechanical-by-hand. Until the surface grows past ~10 endpoints, hand-written models are more readable.

Release history

0.103.0

Jul 2, 2026

0.102.1

Jul 2, 2026

0.102.0

Jul 2, 2026

0.101.2

Jul 1, 2026

0.101.1

Jul 1, 2026

0.101.0

Jul 1, 2026

0.100.2

Jul 1, 2026

0.100.1

Jun 30, 2026

0.100.0

Jun 24, 2026

0.99.0

Jun 23, 2026

0.98.0

Jun 22, 2026

0.97.0

Jun 22, 2026

0.96.5

Jun 22, 2026

0.96.2

Jun 22, 2026

0.96.1

Jun 22, 2026

0.96.0

Jun 22, 2026

0.95.1

Jun 21, 2026

0.95.0

Jun 21, 2026

0.94.0

Jun 19, 2026

0.93.0

Jun 17, 2026

0.92.0

Jun 14, 2026

0.91.0

Jun 11, 2026

0.90.1

Jun 10, 2026

0.90.0

Jun 10, 2026

0.89.0

Jun 10, 2026

0.86.0

Jun 10, 2026

0.84.0

Jun 7, 2026

0.83.0

Jun 5, 2026

0.82.0

Jun 5, 2026

0.81.0

Jun 5, 2026

0.80.0

Jun 5, 2026

0.77.0

Jun 2, 2026

0.76.0

Jun 2, 2026

This version

0.75.0

Jun 2, 2026

0.74.0

Jun 2, 2026

0.73.0

Jun 2, 2026

0.72.4

Jun 2, 2026

0.72.3

Jun 2, 2026

0.72.2

Jun 1, 2026

0.72.1

Jun 1, 2026

0.59.1

May 29, 2026

0.59.0

May 29, 2026

0.58.2

May 29, 2026

0.58.1

May 29, 2026

0.58.0

May 28, 2026

0.57.0

May 28, 2026

0.56.0

May 28, 2026

0.55.0

May 28, 2026

0.54.0

May 28, 2026

0.53.0

May 27, 2026

0.52.0

May 27, 2026

0.51.0

May 27, 2026

0.50.2

May 27, 2026

0.50.1

May 27, 2026

0.50.0

May 27, 2026

0.49.0

May 27, 2026

0.48.0

May 27, 2026

0.42.0

May 25, 2026

0.41.0

May 25, 2026

0.40.5

May 25, 2026

0.40.4

May 25, 2026

0.40.3

May 25, 2026

0.40.2

May 25, 2026

0.40.1

May 25, 2026

0.34.1

May 23, 2026

0.34.0

May 22, 2026

0.32.0

May 21, 2026

0.31.1

May 20, 2026

0.31.0

May 20, 2026

0.29.1

May 19, 2026

0.29.0

May 19, 2026

0.28.0

May 19, 2026

0.27.2

May 18, 2026

0.27.0

May 17, 2026

0.25.0

May 14, 2026

0.24.0

May 14, 2026

0.23.0

May 8, 2026

0.22.0

May 8, 2026

0.21.0

May 8, 2026

Download files

Source Distribution

agent_eval_rpc-0.75.0.tar.gz (40.3 kB view details)

Uploaded Jun 2, 2026 Source

Built Distribution

agent_eval_rpc-0.75.0-py3-none-any.whl (15.9 kB view details)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file agent_eval_rpc-0.75.0.tar.gz.

File metadata

Download URL: agent_eval_rpc-0.75.0.tar.gz
Upload date: Jun 2, 2026
Size: 40.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_eval_rpc-0.75.0.tar.gz
Algorithm	Hash digest
SHA256	`92e6801303fe208b7b152fb7544f4c956112dfc04a71565ba0536b35584d7c86`
MD5	`1d0743a15421c1e8185328dc807eb5b5`
BLAKE2b-256	`c10a127f2588acbbb1833ba69b852398caca1a18903800a26f1dbf017a711468`

Provenance

The following attestation bundles were made for agent_eval_rpc-0.75.0.tar.gz:

Publisher: publish.yml on tangle-network/agent-eval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agent_eval_rpc-0.75.0.tar.gz
- Subject digest: 92e6801303fe208b7b152fb7544f4c956112dfc04a71565ba0536b35584d7c86
- Sigstore transparency entry: 1704829634
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: tangle-network/agent-eval@53a5bd65265345183476af6b5ed534be450a6e00
- Branch / Tag: refs/tags/v0.75.0
- Owner: https://github.com/tangle-network
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@53a5bd65265345183476af6b5ed534be450a6e00
- Trigger Event: push

File details

Details for the file agent_eval_rpc-0.75.0-py3-none-any.whl.

File metadata

Download URL: agent_eval_rpc-0.75.0-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 15.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_eval_rpc-0.75.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`19ae405f07e5aa7d21e7a85434b0b2b5228e3030ba61a977a1ee461ff61aeccd`
MD5	`84359dc7da98e5186f5f82ed328817f4`
BLAKE2b-256	`5acb4b03e371fd988bf5ccb6764e16cf56b391d88def1fda2a13f787145ff828`

Provenance

The following attestation bundles were made for agent_eval_rpc-0.75.0-py3-none-any.whl:

Publisher: publish.yml on tangle-network/agent-eval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agent_eval_rpc-0.75.0-py3-none-any.whl
- Subject digest: 19ae405f07e5aa7d21e7a85434b0b2b5228e3030ba61a977a1ee461ff61aeccd
- Sigstore transparency entry: 1704829657
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: tangle-network/agent-eval@53a5bd65265345183476af6b5ed534be450a6e00
- Branch / Tag: refs/tags/v0.75.0
- Owner: https://github.com/tangle-network
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@53a5bd65265345183476af6b5ed534be450a6e00
- Trigger Event: push

agent-eval-rpc 0.75.0
