Systematic evaluation of LangGraph nodes using Arize Phoenix experiments.

evalwire

Systematic, reproducible evaluation of LangGraph nodes and subgraphs against human-curated testsets, tracked in Arize Phoenix.


What it does

When iterating on a LangGraph agent, it is hard to know whether a change to a specific node improved or degraded its behaviour. Running the full graph end-to-end is expensive and makes it difficult to attribute a score change to a specific component.

evalwire solves this by:

  • Turning a human-curated CSV of queries and expected outputs into versioned Arize Phoenix datasets.
  • Letting you define a task that isolates and invokes individual LangGraph nodes independently of the rest of the graph.
  • Running those tasks against the stored datasets, scoring each output with one or more evaluators, and recording the results in Phoenix, so that every run produces a reproducible experiment you can compare with earlier ones.

Installation

pip install evalwire
# With LangGraph node-isolation helpers:
pip install 'evalwire[langgraph]'

Quick start

1. Upload your testset

evalwire upload --csv data/testset.csv

The CSV must contain a tags column. Each value names the Phoenix dataset the row belongs to; to assign a row to several datasets, separate the tags with a pipe, for example es_search|source_router.
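To make the pipe-delimited tags convention concrete, here is a minimal sketch of how rows could be grouped by tag. It is not evalwire's implementation; the function name group_rows_by_tag and the user_query and expected_output columns are assumptions made for illustration. Only the tags column is required by the text above.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_tag(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group CSV rows by each pipe-delimited value in their 'tags' column."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for tag in row["tags"].split("|"):
            tag = tag.strip()
            if tag:  # ignore empty fragments such as a trailing pipe
                groups[tag].append(row)
    return dict(groups)


# Hypothetical testset; column names other than 'tags' are assumptions.
SAMPLE = """user_query,expected_output,tags
how do I reset my password,Password reset guide,es_search|source_router
what is the refund policy,Refund policy,es_search
"""

groups = group_rows_by_tag(SAMPLE)
for name, rows in groups.items():
    print(name, len(rows))
```

A row tagged es_search|source_router ends up in both groups, which mirrors the idea of one testset row feeding more than one dataset.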

2. Structure your experiments

experiments/
├── es_search/
│   ├── task.py        # defines: async def task(example) -> Any
│   └── top_k.py       # defines: def top_k(output, expected) -> float
└── source_router/
    ├── task.py
    └── accuracy.py
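As an illustration of what an evaluator file such as top_k.py might contain, the sketch below follows the def top_k(output, expected) -> float signature from the tree above. The scoring rule (fraction of expected items found among the first k outputs), the list-of-strings types, and the k parameter are assumptions; the shape evalwire actually passes for output and expected may differ.

```python
def top_k(output: list[str], expected: list[str], k: int = 5) -> float:
    """Return the fraction of expected items present in the first k outputs.

    Assumption: both arguments are lists of strings (for example, titles).
    """
    if not expected:
        return 0.0  # nothing to match against; treat as a zero score
    top = set(output[:k])
    hits = sum(1 for item in expected if item in top)
    return hits / len(expected)


# Usage with hypothetical data:
score = top_k(["Password reset guide", "Refund policy"], ["Refund policy"])
print(score)
```

Returning a float in the range 0.0 to 1.0 keeps scores comparable across examples of different sizes.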

3. Run experiments

evalwire run --experiments experiments/

Node isolation

Use invoke_node to call a single LangGraph node without compiling a full graph:

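from evalwire.langgraph import invoke_node

# retrieve (the node function) and RAGState (its state schema) come from your
# own agent code; my_agent.graph is a hypothetical module path.
from my_agent.graph import RAGState, retrieve

async def task(example) -> list[str]:
    # Run only the retrieve node, seeded with the example's query.
    result = await invoke_node(retrieve, example.input["user_query"], RAGState)
    return result["retrieved_titles"]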
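The idea behind node isolation can be shown without any library: a LangGraph node is a callable that takes the current state and returns a partial state update, so it can be awaited directly on a hand-built state. The sketch below is not how invoke_node is implemented. The RAGState fields, the retrieve body, the CORPUS list, and the call_node_directly helper are all stand-ins invented for illustration.

```python
import asyncio
from typing import TypedDict


class RAGState(TypedDict, total=False):
    """Hypothetical state schema with the two keys used in the example."""
    user_query: str
    retrieved_titles: list[str]


# Stand-in corpus; a real node would query a search backend instead.
CORPUS = ["Password reset guide", "Refund policy", "Shipping times"]


async def retrieve(state: RAGState) -> dict:
    """Hypothetical node: return titles that share a word with the query."""
    words = state["user_query"].lower().split()
    titles = [t for t in CORPUS if any(w in t.lower() for w in words)]
    return {"retrieved_titles": titles}


async def call_node_directly(node, query: str) -> dict:
    """Build a minimal state and await the node outside of any graph."""
    state: RAGState = {"user_query": query}
    return await node(state)


result = asyncio.run(call_node_directly(retrieve, "reset password"))
print(result["retrieved_titles"])
```

Because the node never sees the rest of the graph, a score change on this task can be attributed to the node itself.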

CLI reference

Command                           Description
evalwire upload --csv PATH        Upload CSV testset to Phoenix
evalwire run --experiments DIR    Discover and run all experiments
evalwire run --name NAME          Run a single named experiment
evalwire run --dry-run N          Run N examples without recording results
evalwire run --concurrency N      Run N experiments in parallel

Configuration

Create evalwire.toml in your project root to avoid repeating flags:

[dataset]
csv_path = "data/testset.csv"
on_exist = "skip"

[experiments]
dir = "experiments"
prefix = "eval"
concurrency = 4
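To check how a file like this parses, it can be loaded with Python's standard TOML reader. This is only a sketch for inspecting the values and types; it says nothing about how evalwire itself reads the file. Note that tomllib ships with Python 3.11 and later, so on Python 3.10 the sketch assumes the third-party tomli backport is installed.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed backport for Python 3.10

# Same content as the evalwire.toml shown above, inlined for the example.
CONFIG_TEXT = """
[dataset]
csv_path = "data/testset.csv"
on_exist = "skip"

[experiments]
dir = "experiments"
prefix = "eval"
concurrency = 4
"""

config = tomllib.loads(CONFIG_TEXT)

# Each [table] becomes a nested dict; unquoted numbers parse as int.
print(config["dataset"]["csv_path"])
print(config["experiments"]["concurrency"])
```

Quoting matters here: concurrency = 4 parses as an integer, whereas concurrency = "4" would parse as a string.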

Requirements

  • Python >= 3.10
  • arize-phoenix >= 13.0, < 14
  • A running Phoenix instance (local or cloud)
