Systematic evaluation of LangGraph nodes using Arize Phoenix experiments.
evalwire
Systematic, reproducible evaluation of LangGraph nodes and subgraphs against human-curated testsets, tracked in Arize Phoenix.
What it does
When iterating on a LangGraph agent, it is hard to know whether a change to a specific node improved or degraded its behaviour. Running the full graph end-to-end is expensive and makes it difficult to attribute a score change to a specific component.
evalwire solves this by:
- Turning a human-curated CSV of queries and expected outputs into versioned Arize Phoenix datasets.
- Letting you define a task that isolates and invokes individual LangGraph nodes independently of the rest of the graph.
- Running those tasks against the stored datasets, scoring each output with one or more evaluators, and recording the results in Phoenix, so that every run becomes a reproducible experiment you can compare against earlier ones.
Installation
pip install evalwire
# With LangGraph node-isolation helpers:
pip install 'evalwire[langgraph]'
Quick start
1. Upload your testset
evalwire upload --csv data/testset.csv
The CSV must contain a tags column whose values name the target Phoenix dataset (multiple tags can be pipe-delimited: es_search|source_router).
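To illustrate how pipe-delimited tags map one CSV row onto several datasets, here is a minimal sketch using only the standard library. The column names user_query and expected_output and the helper group_rows_by_tag are assumptions made for illustration; only the tags column is documented above, and this is not evalwire's own upload code.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_tag(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group CSV rows by each pipe-delimited value in the `tags` column."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for tag in row["tags"].split("|"):
            tag = tag.strip()
            if tag:  # ignore empty tags such as a trailing "|"
                groups[tag].append(row)
    return dict(groups)


# Hypothetical testset: only the `tags` column name comes from the README.
SAMPLE_CSV = """user_query,expected_output,tags
how do I reset my password,Password reset guide,es_search|source_router
latest release notes,Release notes,es_search
"""

datasets = group_rows_by_tag(SAMPLE_CSV)
for name, rows in datasets.items():
    print(name, len(rows))
```

A row tagged with two names appears in both groups, which is why a single testset file can feed several experiments.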
2. Structure your experiments
experiments/
├── es_search/
│   ├── task.py      # defines: async def task(example) -> Any
│   └── top_k.py     # defines: def top_k(output, expected) -> float
└── source_router/
    ├── task.py
    └── accuracy.py
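The tree above names an evaluator top_k(output, expected) -> float without showing its body. A hedged sketch of what such a function might look like follows: it scores the fraction of expected items that appear among the first K outputs. The cutoff K = 5, the list-of-strings shape of output and expected, and the scoring rule itself are all assumptions, not evalwire's definition.

```python
K = 5  # assumed cutoff; not specified by evalwire


def top_k(output: list[str], expected: list[str]) -> float:
    """Return the fraction of expected items found in the first K outputs."""
    if not expected:
        return 0.0  # nothing to match against; treat as a zero score
    candidates = set(output[:K])
    hits = sum(1 for item in expected if item in candidates)
    return hits / len(expected)
```

Returning a float in the range 0 to 1 keeps scores comparable across examples with different numbers of expected items.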
3. Run experiments
evalwire run --experiments experiments/
Node isolation
Use invoke_node to call a single LangGraph node without compiling a full graph:
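from evalwire.langgraph import invoke_node

async def task(example) -> list[str]:
    # `retrieve` is the node under test and `RAGState` its state schema;
    # import both from your own graph module.
    result = await invoke_node(retrieve, example.input["user_query"], RAGState)
    return result["retrieved_titles"]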
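The snippet above refers to retrieve and RAGState without defining them. As a hedged sketch, they could look like the following: a LangGraph node is typically a callable that takes the state and returns a partial state update, so it can be exercised on its own. The state fields, the in-memory corpus, and the matching logic are hypothetical, and the node is called directly here rather than through invoke_node, whose internals are not shown in this README.

```python
import asyncio
from typing import TypedDict


class RAGState(TypedDict, total=False):
    """Hypothetical graph state; field names mirror the snippet above."""
    user_query: str
    retrieved_titles: list[str]


# Hypothetical stand-in for a real search backend.
CORPUS = ["Password reset guide", "Release notes", "Billing FAQ"]


async def retrieve(state: RAGState) -> RAGState:
    """Node: return titles sharing at least one word with the query."""
    words = set(state["user_query"].lower().split())
    titles = [t for t in CORPUS if words & set(t.lower().split())]
    return {"retrieved_titles": titles}


async def main() -> list[str]:
    # Call the node directly with a minimal state, bypassing the rest of the graph.
    update = await retrieve({"user_query": "reset password"})
    return update["retrieved_titles"]


titles = asyncio.run(main())
print(titles)
```

Because the node only reads user_query and only writes retrieved_titles, a score computed on its output can be attributed to this node alone.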
CLI reference
| Command | Description |
|---|---|
| evalwire upload --csv PATH | Upload CSV testset to Phoenix |
| evalwire run --experiments DIR | Discover and run all experiments |
| evalwire run --name NAME | Run a single named experiment |
| evalwire run --dry-run N | Run N examples without recording results |
| evalwire run --concurrency N | Run N experiments in parallel |
Configuration
Create evalwire.toml in your project root to avoid repeating flags:
[dataset]
csv_path = "data/testset.csv"
on_exist = "skip"
[experiments]
dir = "experiments"
prefix = "eval"
concurrency = 4
Requirements
- Python >= 3.10
- arize-phoenix >= 13.0, < 14
- A running Phoenix instance (local or cloud)