ZeroEval SDK
The Python SDK and CLI for ZeroEval -- monitoring, prompt management, judges, and optimization for AI products.
pip install zeroeval
Quick Start
1. Setup
zeroeval setup
This opens the ZeroEval dashboard, prompts for your project API key, and saves it along with your project context. Every command after this just works.
2. Trace your AI calls
import zeroeval as ze
import openai
ze.init()
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is ZeroEval?"}],
)
OpenAI, Gemini, LangChain, and LangGraph calls are automatically traced. No extra code needed.
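Under the hood, auto-instrumentation libraries typically wrap a client's request method so every call is recorded as a span without changing user code. A minimal sketch of the pattern — not ZeroEval's actual implementation; `Client`, `traced`, and the `spans` list are illustrative:

```python
import functools
import time

def traced(record):
    """Decorator factory: records name and duration of each wrapped call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                record({"name": fn.__name__,
                        "duration_s": time.perf_counter() - start})
        return inner
    return wrap

# Hypothetical client whose .create method is instrumented in place,
# the way an auto-instrumentation hook patches a real SDK client.
class Client:
    def create(self, **kwargs):
        return {"echo": kwargs}

spans = []
Client.create = traced(spans.append)(Client.create)
result = Client().create(model="gpt-4o-mini")
```

The caller's code is unchanged; the wrapper observes every call and emits one span record per invocation.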
3. Manage prompts
import zeroeval as ze
ze.init()
prompt = ze.prompt(
    name="support-triage",
    content="Classify this support ticket: {{ticket_text}}",
    variables={"ticket_text": "I can't log in to my account"},
)
Prompts are versioned, tagged, and linked to traces automatically.
4. Inspect from the CLI
zeroeval traces list --start-date 2025-01-01
zeroeval judges list
zeroeval prompts list
Installation
pip install zeroeval # Core SDK
pip install zeroeval[openai] # OpenAI auto-instrumentation
pip install zeroeval[gemini] # Google Gemini
pip install zeroeval[langchain] # LangChain
pip install zeroeval[langgraph] # LangGraph
pip install zeroeval[all] # Everything
Integrations are detected and instrumented automatically.
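"Detected automatically" generally means checking at import time which optional packages are installed and only patching those. A sketch of that check using only the standard library (the candidate names here are placeholders, not ZeroEval's real integration list):

```python
import importlib.util

def available_integrations(candidates):
    """Return the subset of candidate module names that are importable."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]

# "json" stands in for an installed integration; "not_a_real_pkg" is absent.
found = available_integrations(["json", "not_a_real_pkg"])
```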
Authentication
Interactive (recommended):
zeroeval setup
Saves your API key and resolves your project automatically. You're ready to go.
Non-interactive (CI, agents):
zeroeval auth set --api-key sk_ze_...
In code:
import zeroeval as ze
ze.init(api_key="sk_ze_...")
Resolution order: explicit flag > environment variable (ZEROEVAL_API_KEY, ZEROEVAL_PROJECT_ID) > CLI config (~/.config/zeroeval/config.json).
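That precedence rule can be expressed as a short lookup: explicit argument first, then the environment, then the config file. A sketch under those assumptions (the config path mirrors the one documented above; the function name is ours, not the SDK's):

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "zeroeval" / "config.json"

def resolve_api_key(explicit=None, env=os.environ, config_path=CONFIG_PATH):
    """Resolve the API key: explicit flag > ZEROEVAL_API_KEY > CLI config."""
    if explicit:
        return explicit
    if env.get("ZEROEVAL_API_KEY"):
        return env["ZEROEVAL_API_KEY"]
    if config_path.exists():
        return json.loads(config_path.read_text()).get("api_key")
    return None

# An explicit key wins even when the environment variable is set:
key = resolve_api_key(explicit="sk_ze_explicit",
                      env={"ZEROEVAL_API_KEY": "sk_ze_env"})
```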
Tracing
The SDK traces AI calls with a Session > Trace > Span hierarchy.
import zeroeval as ze

ze.init()

# Decorator-based tracing
@ze.span(name="process-ticket")
def process_ticket(ticket_text: str):
    # your logic here
    return result

# Context manager
with ze.span(name="embedding-step"):
    embeddings = get_embeddings(text)

# Manual spans
span = ze.start_span(name="retrieval")
results = search(query)
span.end()

# Tags for filtering in the dashboard
ze.set_tag("trace", {"environment": "production", "model": "gpt-4o"})
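The Session > Trace > Span hierarchy amounts to nested timed records: a span has a name and a start/end time, and nesting spans produces the tree shown in the dashboard. A minimal stand-in for that data model (not the SDK's internals; `trace` and `span` here are illustrative):

```python
import time
from contextlib import contextmanager

trace = []  # spans collected for one trace

@contextmanager
def span(name):
    """Record a named span with start/end timestamps into the trace."""
    record = {"name": name, "start": time.time()}
    try:
        yield record
    finally:
        record["end"] = time.time()
        trace.append(record)

# Inner spans close first, so they appear before their parents:
with span("process-ticket"):
    with span("retrieval"):
        pass
```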
See INTEGRATIONS.md for automatic OpenAI, Gemini, LangChain, and LangGraph tracing.
Prompts
Prompts are versioned and tagged. Use ze.prompt() to fetch and render them:
prompt = ze.prompt(
    name="support-triage",
    content="Classify: {{ticket_text}}",
    variables={"ticket_text": ticket},
)
# Fetch a specific version or tag
prompt = ze.get_prompt("support-triage", version=3)
prompt = ze.get_prompt("support-triage", tag="production")
Prompts are automatically linked to traces and available for optimization.
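The {{ticket_text}} placeholders follow the common double-brace template convention: rendering is a straight substitution of each named variable. A sketch of that step, purely for illustration (the SDK does this for you when you pass `variables`):

```python
import re

def render(content, variables):
    """Replace each {{name}} placeholder with its value from `variables`."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(variables[m.group(1)]),
                  content)

text = render("Classify: {{ticket_text}}", {"ticket_text": "I can't log in"})
```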
Feedback
Submit feedback on prompt completions to build training data for optimization:
# Thumbs up/down
ze.send_feedback(
    prompt_slug="support-triage",
    completion_id="completion-uuid",
    thumbs_up=True,
    reason="Correct classification",
)

# Scored judge feedback
ze.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="completion-uuid",
    thumbs_up=False,
    judge_id="judge-uuid",
    expected_score=3.5,
    score_direction="too_high",
    criteria_feedback={
        "accuracy": {"expected_score": 4.0, "reason": "Mostly correct"},
        "tone": {"expected_score": 1.0, "reason": "Too formal"},
    },
)
Feedback with reason and expected_output creates stronger training examples for prompt optimization.
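One way feedback of this shape can become training data: positively rated completions serve as accepted examples, while negatively rated ones (with their reasons) become correction candidates. A hypothetical aggregation sketch — the split logic is ours, not ZeroEval's documented pipeline:

```python
def to_training_examples(feedback):
    """Split feedback records into positives and correction candidates."""
    positives, corrections = [], []
    for f in feedback:
        if f["thumbs_up"]:
            positives.append(f["completion_id"])
        else:
            corrections.append({"completion_id": f["completion_id"],
                                "reason": f.get("reason")})
    return positives, corrections

pos, neg = to_training_examples([
    {"completion_id": "a", "thumbs_up": True, "reason": "Correct"},
    {"completion_id": "b", "thumbs_up": False, "reason": "Too formal"},
])
```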
Datasets & Evals
import zeroeval as ze
ze.init()
# Pull a dataset
dataset = ze.Dataset.pull("my-dataset")
for row in dataset:
    print(row.question, row.answer)

# Run an evaluation
@ze.task(outputs=["prediction"])
def solve(row):
    return {"prediction": llm_call(row["question"])}

@ze.evaluation(mode="row", outputs=["correct"])
def check(answer, prediction):
    return {"correct": int(answer == prediction)}

run = dataset.eval(solve, workers=8)
run = run.score([check], column_map={"answer": "answer", "prediction": "prediction"})
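Conceptually, the eval applies the task to each row, then each row-mode scorer receives the columns named in the column map. A pure-Python sketch of that loop, simplified (no workers or uploading; the function names are ours):

```python
def run_eval(rows, task, scorers, column_map):
    """Apply `task` to each row, then each scorer to the mapped columns."""
    results = []
    for row in rows:
        out = {**row, **task(row)}  # task outputs merge into the row
        for scorer in scorers:
            kwargs = {param: out[col] for param, col in column_map.items()}
            out.update(scorer(**kwargs))
        results.append(out)
    return results

rows = [{"question": "2+2?", "answer": "4"}]
task = lambda row: {"prediction": "4"}
check = lambda answer, prediction: {"correct": int(answer == prediction)}
results = run_eval(rows, task, [check],
                   {"answer": "answer", "prediction": "prediction"})
```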
CLI
The zeroeval CLI covers monitoring, judges, prompts, optimization, datasets, and evals. Designed for both humans and automation (--output json).
Setup
zeroeval setup # Interactive setup
zeroeval auth set --api-key sk_ze_... # Non-interactive
zeroeval auth show --redact # Show config
Global flags
| Flag | Default | Description |
|---|---|---|
| --output {text,json} | text | Output format. json emits stable JSON to stdout. |
| --project-id | auto from setup | Override the project context. |
| --api-base-url | https://api.zeroeval.com | Override the API URL. |
| --timeout | 60.0 | Request timeout in seconds. |
| --quiet | off | Suppress non-essential logs. |
Monitoring
zeroeval sessions list --start-date 2025-01-01
zeroeval sessions get <session_id>
zeroeval traces list --start-date 2025-01-01
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id>
zeroeval spans list --start-date 2025-01-01
zeroeval spans get <span_id>
Judges
zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges criteria <judge_id>
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>
# Create a judge
zeroeval judges create \
  --name "answer-quality" \
  --prompt-file judge.txt \
  --evaluation-type binary \
  --sample-rate 1.0

# Submit feedback on a judge evaluation
zeroeval judges feedback create \
  --span-id <span_id> \
  --thumbs-up \
  --reason "Correct evaluation"
Prompts
zeroeval prompts list
zeroeval prompts get <slug> --version 3
zeroeval prompts versions <slug>
zeroeval prompts tags <slug>
# Submit feedback on a completion
zeroeval prompts feedback create \
  --prompt-slug support-triage \
  --completion-id <id> \
  --thumbs-down \
  --reason "Wrong classification"
Optimization
# Prompt optimization
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes
# Judge optimization
zeroeval optimize judge list <judge_id>
zeroeval optimize judge start <judge_id>
zeroeval optimize judge promote <judge_id> <run_id> --yes
Datasets & Evals
zeroeval datasets list
zeroeval datasets get <name>
zeroeval datasets versions <name>
zeroeval datasets rows <name> --version 3 --limit 200
zeroeval evals list --status completed
zeroeval evals get <eval_id>
zeroeval evals summary <eval_id>
zeroeval evals results <eval_id>
zeroeval evals scores <eval_id> <scorer_id>
Querying
List commands support --where, --select, and --order:
zeroeval judges list --where "name~quality" --select "id,name" --order "name:asc"
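The `~` in "name~quality" reads as a contains match. The CLI's exact operator set isn't specified here, so the following is only an illustration of how such a filter might evaluate, assuming `~` is "contains" and `=` is "equals" (parsing logic is ours):

```python
def matches(record, where):
    """Evaluate a single field~value (contains) or field=value (equals) filter."""
    if "~" in where:
        field, value = where.split("~", 1)
        return value in str(record.get(field, ""))
    field, value = where.split("=", 1)
    return str(record.get(field)) == value

judges = [{"id": 1, "name": "answer-quality"}, {"id": 2, "name": "latency"}]
hits = [j for j in judges if matches(j, "name~quality")]
```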
Machine-readable spec
zeroeval spec cli --format json
zeroeval spec command "judges create"
Development
uv run --group dev pytest tests/cli/ -v # CLI tests
uv run --group dev pytest # Full suite