ZeroEval SDK
The Python SDK and CLI for ZeroEval -- monitoring, prompt management, judges, and optimization for AI products.

pip install zeroeval

Quick Start

1. Setup

zeroeval setup

This opens the ZeroEval dashboard, prompts for your project API key, and saves it along with your project context. Every command after this just works.

2. Trace your AI calls

import zeroeval as ze
import openai

ze.init()
client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is ZeroEval?"}],
)

OpenAI, Gemini, LangChain, and LangGraph calls are automatically traced. No extra code needed.

3. Manage prompts

import zeroeval as ze

ze.init()

prompt = ze.prompt(
    name="support-triage",
    content="Classify this support ticket: {{ticket_text}}",
    variables={"ticket_text": "I can't log in to my account"},
)

Prompts are versioned, tagged, and linked to traces automatically.
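The `{{variable}}` placeholders use standard double-brace templating. As a rough sketch of the substitution semantics (illustrative only, not the SDK's actual implementation):

```python
import re

def render(template: str, variables: dict) -> str:
    # Replace each {{name}} placeholder with its value from `variables`.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )

print(render(
    "Classify this support ticket: {{ticket_text}}",
    {"ticket_text": "I can't log in to my account"},
))
```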

4. Inspect from the CLI

zeroeval traces list --start-date 2025-01-01
zeroeval judges list
zeroeval prompts list

Installation

pip install zeroeval            # Core SDK
pip install zeroeval[openai]    # OpenAI auto-instrumentation
pip install zeroeval[gemini]    # Google Gemini
pip install zeroeval[langchain] # LangChain
pip install zeroeval[langgraph] # LangGraph
pip install zeroeval[all]       # Everything

Integrations are detected and instrumented automatically.
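Auto-detection of optional integrations typically boils down to checking whether a package is importable before patching it. A minimal sketch of that pattern (illustrative, not ZeroEval's internals):

```python
import importlib.util

def detect_integrations(candidates=("openai", "langchain", "langgraph")):
    # Return the optional integrations that are installed in this environment.
    # find_spec returns None for packages that are not importable.
    return [name for name in candidates if importlib.util.find_spec(name) is not None]
```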


Authentication

Interactive (recommended):

zeroeval setup

Saves your API key and resolves your project automatically. You're ready to go.

Non-interactive (CI, agents):

zeroeval auth set --api-key sk_ze_...

In code:

import zeroeval as ze
ze.init(api_key="sk_ze_...")

Resolution order: explicit flag > environment variable (ZEROEVAL_API_KEY, ZEROEVAL_PROJECT_ID) > CLI config (~/.config/zeroeval/config.json).


Tracing

The SDK traces AI calls with a Session > Trace > Span hierarchy.

import zeroeval as ze

ze.init()

# Decorator-based tracing
@ze.span(name="process-ticket")
def process_ticket(ticket_text: str):
    result = ...  # your logic here
    return result

# Context manager
with ze.span(name="embedding-step"):
    embeddings = get_embeddings(text)

# Manual spans
span = ze.start_span(name="retrieval")
results = search(query)
span.end()

# Tags for filtering in the dashboard
ze.set_tag("trace", {"environment": "production", "model": "gpt-4o"})

See INTEGRATIONS.md for automatic OpenAI, Gemini, LangChain, and LangGraph tracing.


Prompts

Prompts are versioned and tagged. Use ze.prompt() to fetch and render them:

prompt = ze.prompt(
    name="support-triage",
    content="Classify: {{ticket_text}}",
    variables={"ticket_text": ticket},
)

# Fetch a specific version or tag
prompt = ze.get_prompt("support-triage", version=3)
prompt = ze.get_prompt("support-triage", tag="production")

Prompts are automatically linked to traces and available for optimization.


Feedback

Submit feedback on prompt completions to build training data for optimization:

# Thumbs up/down
ze.send_feedback(
    prompt_slug="support-triage",
    completion_id="completion-uuid",
    thumbs_up=True,
    reason="Correct classification",
)

# Scored judge feedback
ze.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="completion-uuid",
    thumbs_up=False,
    judge_id="judge-uuid",
    expected_score=3.5,
    score_direction="too_high",
    criteria_feedback={
        "accuracy": {"expected_score": 4.0, "reason": "Mostly correct"},
        "tone": {"expected_score": 1.0, "reason": "Too formal"},
    },
)

Feedback with reason and expected_output creates stronger training examples for prompt optimization.
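The score_direction field describes how the judge's actual score compares to the reviewer's expected score. A sketch of that relationship (only "too_high" appears in the example above; "too_low" and "correct" are assumed counterparts for illustration):

```python
def score_direction(actual: float, expected: float, tolerance: float = 1e-9) -> str:
    # The judge scored higher than the reviewer expected -> "too_high", etc.
    if actual > expected + tolerance:
        return "too_high"
    if actual < expected - tolerance:
        return "too_low"
    return "correct"

# e.g. the judge gave 4.5 but a reviewer expected 3.5:
print(score_direction(4.5, 3.5))  # too_high
```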


Datasets & Evals

import zeroeval as ze

ze.init()

# Pull a dataset
dataset = ze.Dataset.pull("my-dataset")

for row in dataset:
    print(row.question, row.answer)

# Run an evaluation
@ze.task(outputs=["prediction"])
def solve(row):
    return {"prediction": llm_call(row["question"])}

@ze.evaluation(mode="row", outputs=["correct"])
def check(answer, prediction):
    return {"correct": int(answer == prediction)}

run = dataset.eval(solve, workers=8)
run = run.score([check], column_map={"answer": "answer", "prediction": "prediction"})
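Conceptually, dataset.eval plus run.score amount to a map-then-score loop over rows. A plain-Python sketch of those semantics (run_eval is a hypothetical stand-in, not an SDK function):

```python
def run_eval(rows, task, evaluations):
    # Apply the task to each row, then score the task outputs against row fields.
    results = []
    for row in rows:
        outputs = task(row)
        scores = {}
        for evaluate in evaluations:
            scores.update(evaluate(row, outputs))
        results.append({**row, **outputs, **scores})
    return results

rows = [{"question": "2+2", "answer": "4"}]
task = lambda row: {"prediction": "4"}
check = lambda row, out: {"correct": int(row["answer"] == out["prediction"])}
print(run_eval(rows, task, [check]))
```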

CLI

The zeroeval CLI covers monitoring, judges, prompts, optimization, datasets, and evals. Designed for both humans and automation (--output json).

Setup

zeroeval setup                              # Interactive setup
zeroeval auth set --api-key sk_ze_...       # Non-interactive
zeroeval auth show --redact                 # Show config

Global flags

Flag             Default                    Description
--output         text                       Output format (text or json). json emits stable JSON to stdout.
--project-id     auto from setup            Override the project context.
--api-base-url   https://api.zeroeval.com   Override the API base URL.
--timeout        60.0                       Request timeout in seconds.
--quiet          off                        Suppress non-essential logs.

Monitoring

zeroeval sessions list --start-date 2025-01-01
zeroeval sessions get <session_id>
zeroeval traces list --start-date 2025-01-01
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id>
zeroeval spans list --start-date 2025-01-01
zeroeval spans get <span_id>

Judges

zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges criteria <judge_id>
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>

# Create a judge
zeroeval judges create \
  --name "answer-quality" \
  --prompt-file judge.txt \
  --evaluation-type binary \
  --sample-rate 1.0

# Submit feedback on a judge evaluation
zeroeval judges feedback create \
  --span-id <span_id> \
  --thumbs-up \
  --reason "Correct evaluation"

Prompts

zeroeval prompts list
zeroeval prompts get <slug> --version 3
zeroeval prompts versions <slug>
zeroeval prompts tags <slug>

# Submit feedback on a completion
zeroeval prompts feedback create \
  --prompt-slug support-triage \
  --completion-id <id> \
  --thumbs-down \
  --reason "Wrong classification"

Optimization

# Prompt optimization
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes

# Judge optimization
zeroeval optimize judge list <judge_id>
zeroeval optimize judge start <judge_id>
zeroeval optimize judge promote <judge_id> <run_id> --yes

Datasets & Evals

zeroeval datasets list
zeroeval datasets get <name>
zeroeval datasets versions <name>
zeroeval datasets rows <name> --version 3 --limit 200

zeroeval evals list --status completed
zeroeval evals get <eval_id>
zeroeval evals summary <eval_id>
zeroeval evals results <eval_id>
zeroeval evals scores <eval_id> <scorer_id>

Querying

List commands support --where, --select, and --order:

zeroeval judges list --where "name~quality" --select "id,name" --order "name:asc"
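The ~ operator reads as "contains". Assuming that reading, the filter, sort, and projection steps behave roughly like this sketch (assumed semantics, not the CLI's implementation):

```python
def query(rows, where=None, select=None, order=None):
    # where: "field~substr" (contains); order: "field:asc|desc"; select: "a,b".
    if where:
        field, _, needle = where.partition("~")
        rows = [r for r in rows if needle in str(r.get(field, ""))]
    if order:
        field, _, direction = order.partition(":")
        rows = sorted(rows, key=lambda r: r.get(field), reverse=(direction == "desc"))
    if select:
        keys = select.split(",")
        rows = [{k: r.get(k) for k in keys} for r in rows]
    return rows

judges = [{"id": 1, "name": "answer-quality"}, {"id": 2, "name": "tone"}]
print(query(judges, where="name~quality", select="id,name", order="name:asc"))
# [{'id': 1, 'name': 'answer-quality'}]
```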

Machine-readable spec

zeroeval spec cli --format json
zeroeval spec command "judges create"

Development

uv run --group dev pytest tests/cli/ -v    # CLI tests
uv run --group dev pytest                  # Full suite
