Systematic evaluation of LangGraph nodes using Arize Phoenix experiments.

Project description

evalwire

Systematic, reproducible evaluation of LangGraph nodes and subgraphs against human-curated testsets, tracked in Arize Phoenix.

What it does

When iterating on a LangGraph agent, it is hard to know whether a change to a specific node improved or degraded its behaviour. Running the full graph end-to-end is expensive and makes it difficult to attribute a score change to a specific component.

evalwire solves this by:

Turning a human-curated CSV of queries and expected outputs into versioned Arize Phoenix datasets.
Letting you define a task that isolates and invokes individual LangGraph nodes independently of the rest of the graph.
Running those tasks against the stored datasets, scoring each output with one or more evaluators, and recording the results in Phoenix, so that every run is a reproducible experiment you can compare against earlier ones.

Installation

pip install evalwire
# With LangGraph node-isolation helpers:
pip install 'evalwire[langgraph]'

Quick start

1. Upload your testset

evalwire upload --csv data/testset.csv

The CSV must contain a tags column. Each value names the Phoenix dataset the row belongs to; a row can target several datasets by pipe-delimiting its tags, for example es_search|source_router.
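To make the pipe-delimited tags convention concrete, the sketch below groups the rows of a small in-memory CSV by tag, which is roughly how one would expect rows to be routed to datasets. Only the tags column comes from the README; the column names user_query and expected_output, the sample rows, and the helper rows_by_tag are assumptions for illustration and are not evalwire's own code or schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical testset: only the `tags` column is documented,
# the other column names are assumed for illustration.
SAMPLE_CSV = """user_query,expected_output,tags
how do I reset my password,Password reset guide,es_search
latest release notes,Release notes,es_search|source_router
"""


def rows_by_tag(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group rows by tag; a row with several pipe-delimited tags lands in each group."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for tag in row["tags"].split("|"):
            tag = tag.strip()
            if tag:  # ignore empty fragments such as a trailing "|"
                groups[tag].append(row)
    return dict(groups)


groups = rows_by_tag(SAMPLE_CSV)
print({tag: len(rows) for tag, rows in groups.items()})
# → {'es_search': 2, 'source_router': 1}
```

The second row carries two tags, so it is counted once under each dataset name.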

2. Structure your experiments

experiments/
├── es_search/
│   ├── task.py        # defines: async def task(example) -> Any
│   └── top_k.py       # defines: def top_k(output, expected) -> float
└── source_router/
    ├── task.py
    └── accuracy.py
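An evaluator file in the layout above only has to expose a function with the signature shown in the comment. A minimal sketch of what top_k.py might contain follows. It assumes that output is a ranked list of titles, that expected is a list of relevant titles, and that the cutoff is 5; none of these details are specified by the README.

```python
K = 5  # assumed cutoff; the README does not specify one


def top_k(output: list[str], expected: list[str]) -> float:
    """Return the fraction of expected items found among the first K outputs."""
    if not expected:
        return 0.0  # nothing to find, so score the example as a miss
    top = set(output[:K])
    hits = sum(1 for item in expected if item in top)
    return hits / len(expected)


print(top_k(["a", "b", "c"], ["a", "z"]))
# → 0.5
```

Returning a float between 0 and 1 keeps scores comparable across examples with different numbers of expected items.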

3. Run experiments

evalwire run --experiments experiments/

Node isolation

Use invoke_node to call a single LangGraph node without compiling a full graph:

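from evalwire.langgraph import invoke_node

# `retrieve` (the node under test) and `RAGState` (its state schema) come from
# your own agent code; the module path below is a placeholder.
from my_agent.graph import RAGState, retrieve

async def task(example) -> list[str]:
    result = await invoke_node(retrieve, example.input["user_query"], RAGState)
    return result["retrieved_titles"]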
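For intuition about what isolating a node means, recall that a LangGraph node is a callable that receives the current state and returns a partial state update. The sketch below calls such a function directly with a minimal state, without evalwire or LangGraph. It is not how invoke_node is implemented; the stand-in corpus, the toy retrieve node, and the helper task_without_helper are all hypothetical.

```python
import asyncio
from typing import TypedDict


class RAGState(TypedDict, total=False):
    user_query: str
    retrieved_titles: list[str]


# Stand-in corpus; a real node would query a search backend instead.
CORPUS = ["Password reset guide", "Release notes", "Billing FAQ"]


async def retrieve(state: RAGState) -> RAGState:
    """Toy node: read `user_query` from the state and return a partial update."""
    words = state["user_query"].lower().split()
    hits = [title for title in CORPUS if any(w in title.lower() for w in words)]
    return {"retrieved_titles": hits}


async def task_without_helper(user_query: str) -> list[str]:
    # Build only the state the node needs and call the node directly.
    update = await retrieve({"user_query": user_query})
    return update["retrieved_titles"]


titles = asyncio.run(task_without_helper("reset password"))
print(titles)
# → ['Password reset guide']
```

Because the node only sees the state it is handed, a score change in this task can be attributed to the node itself rather than to upstream steps.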

CLI reference

Command	Description
`evalwire upload --csv PATH`	Upload CSV testset to Phoenix
`evalwire run --experiments DIR`	Discover and run all experiments
`evalwire run --name NAME`	Run a single named experiment
`evalwire run --dry-run N`	Run N examples without recording results
`evalwire run --concurrency N`	Run N experiments in parallel

Configuration

Create evalwire.toml in your project root to avoid repeating flags:

[dataset]
csv_path = "data/testset.csv"
on_exist = "skip"

[experiments]
dir = "experiments"
prefix = "eval"
concurrency = 4

Requirements

Python >= 3.10
arize-phoenix >= 13.0, < 14
A running Phoenix instance (local or cloud)

Project details

Release history

Version	Released
0.4.1	May 4, 2026
0.4.0	Apr 24, 2026
0.3.1	Mar 31, 2026
0.3.0	Mar 31, 2026
0.2.2	Mar 30, 2026 (this version)
0.2.1	Mar 30, 2026
0.2.0	Mar 30, 2026

Download files

Source Distribution

evalwire-0.2.2.tar.gz (265.8 kB)

Uploaded Mar 30, 2026 Source

Built Distribution

evalwire-0.2.2-py3-none-any.whl (14.9 kB)

Uploaded Mar 30, 2026 Python 3

File details

Details for the file evalwire-0.2.2.tar.gz.

File metadata

Download URL: evalwire-0.2.2.tar.gz
Upload date: Mar 30, 2026
Size: 265.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.2 (uv publish, Ubuntu 24.04, CI)

File hashes

Hashes for evalwire-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`0c7114e9b80a16c8f18323f98e740cbf0c8bb45b99b528e12df38e5df5d0eca6`
MD5	`ff1a689a3e9cb7e406f2158bc3079d9c`
BLAKE2b-256	`3b27b7419e6765428dc4bc627c265048011d6a781a8dfdb0b9914a693f17a143`

File details

Details for the file evalwire-0.2.2-py3-none-any.whl.

File metadata

Download URL: evalwire-0.2.2-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 14.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.2 (uv publish, Ubuntu 24.04, CI)

File hashes

Hashes for evalwire-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b998491e9790b861733f9bb6bfd609fc047c7b15f4422b55ba4738a20f379c19`
MD5	`c278e2ce2654b91ac360812c66a5df67`
BLAKE2b-256	`95800bd7944f8dbf8564a576d8b10197fa0d618c865e262a5bd3178fd6c36d70`

