Skip to main content

ZeroEval SDK

Project description

ZeroEval SDK

The Python SDK and CLI for ZeroEval -- monitoring, prompt management, judges, and optimization for AI products.

pip install zeroeval

Quick Start

1. Setup

zeroeval setup

This opens the ZeroEval dashboard, prompts for your project API key, and saves it along with your project context. Every command after this just works.

2. Trace your AI calls

import zeroeval as ze
import openai

ze.init()
client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is ZeroEval?"}],
)

OpenAI, Gemini, LangChain, and LangGraph calls are automatically traced. No extra code needed.

3. Manage prompts

import zeroeval as ze

ze.init()

prompt = ze.prompt(
    name="support-triage",
    content="Classify this support ticket: {{ticket_text}}",
    variables={"ticket_text": "I can't log in to my account"},
)

Prompts are versioned, tagged, and linked to traces automatically.

4. Inspect from the CLI

zeroeval traces list --start-date 2025-01-01
zeroeval judges list
zeroeval prompts list

Installation

pip install zeroeval            # Core SDK
pip install zeroeval[openai]    # OpenAI auto-instrumentation
pip install zeroeval[gemini]    # Google Gemini
pip install zeroeval[langchain] # LangChain
pip install zeroeval[langgraph] # LangGraph
pip install zeroeval[all]       # Everything

Integrations are detected and instrumented automatically.


Authentication

Interactive (recommended):

zeroeval setup

Saves your API key and resolves your project automatically. You're ready to go.

Non-interactive (CI, agents):

zeroeval auth set --api-key sk_ze_...

In code:

import zeroeval as ze
ze.init(api_key="sk_ze_...")

Resolution order: explicit flag > environment variable (ZEROEVAL_API_KEY, ZEROEVAL_PROJECT_ID) > CLI config (~/.config/zeroeval/config.json).


Tracing

The SDK traces AI calls with a Session > Trace > Span hierarchy.

import zeroeval as ze

ze.init()

# Decorator-based tracing
@ze.span(name="process-ticket")
def process_ticket(ticket_text: str):
    # your logic here
    return result

# Context manager
with ze.span(name="embedding-step"):
    embeddings = get_embeddings(text)

# Manual spans
span = ze.start_span(name="retrieval")
results = search(query)
span.end()

# Tags for filtering in the dashboard
ze.set_tag("trace", {"environment": "production", "model": "gpt-4o"})

See INTEGRATIONS.md for automatic OpenAI, Gemini, LangChain, and LangGraph tracing.


Prompts

Prompts are versioned and tagged. Use ze.prompt() to fetch and render them:

prompt = ze.prompt(
    name="support-triage",
    content="Classify: {{ticket_text}}",
    variables={"ticket_text": ticket},
)

# Fetch a specific version or tag
prompt = ze.get_prompt("support-triage", version=3)
prompt = ze.get_prompt("support-triage", tag="production")

Prompts are automatically linked to traces and available for optimization.


Feedback

Submit feedback on prompt completions to build training data for optimization:

# Thumbs up/down
ze.send_feedback(
    prompt_slug="support-triage",
    completion_id="completion-uuid",
    thumbs_up=True,
    reason="Correct classification",
)

# Scored judge feedback
ze.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="completion-uuid",
    thumbs_up=False,
    judge_id="judge-uuid",
    expected_score=3.5,
    score_direction="too_high",
    criteria_feedback={
        "accuracy": {"expected_score": 4.0, "reason": "Mostly correct"},
        "tone": {"expected_score": 1.0, "reason": "Too formal"},
    },
)

Feedback with reason and expected_output creates stronger training examples for prompt optimization.


Datasets & Evals

import zeroeval as ze

ze.init()

# Pull a dataset
dataset = ze.Dataset.pull("my-dataset")

for row in dataset:
    print(row.question, row.answer)

# Run an evaluation
@ze.task(outputs=["prediction"])
def solve(row):
    return {"prediction": llm_call(row["question"])}

@ze.evaluation(mode="row", outputs=["correct"])
def check(answer, prediction):
    return {"correct": int(answer == prediction)}

run = dataset.eval(solve, workers=8)
run = run.score([check], column_map={"answer": "answer", "prediction": "prediction"})

CLI

The zeroeval CLI covers monitoring, judges, prompts, optimization, datasets, and evals. Designed for both humans and automation (--output json).

Setup

zeroeval setup                              # Interactive setup
zeroeval auth set --api-key sk_ze_...       # Non-interactive
zeroeval auth show --redact                 # Show config

Global flags

Flag Default Description
--output text|json text Output format. json emits stable JSON to stdout.
--project-id auto from setup Override project context.
--api-base-url https://api.zeroeval.com Override API URL.
--timeout 60.0 Request timeout in seconds.
--quiet off Suppress non-essential logs.

Monitoring

zeroeval sessions list --start-date 2025-01-01
zeroeval sessions get <session_id>
zeroeval traces list --start-date 2025-01-01
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id>
zeroeval spans list --start-date 2025-01-01
zeroeval spans get <span_id>

Judges

zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges criteria <judge_id>
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>

# Create a judge
zeroeval judges create \
  --name "answer-quality" \
  --prompt-file judge.txt \
  --evaluation-type binary \
  --sample-rate 1.0

# Submit feedback on a judge evaluation
zeroeval judges feedback create \
  --span-id <span_id> \
  --thumbs-up \
  --reason "Correct evaluation"

Prompts

zeroeval prompts list
zeroeval prompts get <slug> --version 3
zeroeval prompts versions <slug>
zeroeval prompts tags <slug>

# Submit feedback on a completion
zeroeval prompts feedback create \
  --prompt-slug support-triage \
  --completion-id <id> \
  --thumbs-down \
  --reason "Wrong classification"

Optimization

# Prompt optimization
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes

# Judge optimization
zeroeval optimize judge list <judge_id>
zeroeval optimize judge start <judge_id>
zeroeval optimize judge promote <judge_id> <run_id> --yes

Datasets & Evals

zeroeval datasets list
zeroeval datasets get <name>
zeroeval datasets versions <name>
zeroeval datasets rows <name> --version 3 --limit 200

zeroeval evals list --status completed
zeroeval evals get <eval_id>
zeroeval evals summary <eval_id>
zeroeval evals results <eval_id>
zeroeval evals scores <eval_id> <scorer_id>

Querying

List commands support --where, --select, and --order:

zeroeval judges list --where "name~quality" --select "id,name" --order "name:asc"

Machine-readable spec

zeroeval spec cli --format json
zeroeval spec command "judges create"

Development

uv run --group dev pytest tests/cli/ -v   # CLI tests
uv run --group dev pytest                  # Full suite

Project details


Release history Release notifications | RSS feed

This version

0.7.3

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zeroeval-0.7.3.tar.gz (346.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

zeroeval-0.7.3-py3-none-any.whl (187.4 kB view details)

Uploaded Python 3

File details

Details for the file zeroeval-0.7.3.tar.gz.

File metadata

  • Download URL: zeroeval-0.7.3.tar.gz
  • Upload date:
  • Size: 346.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.12

File hashes

Hashes for zeroeval-0.7.3.tar.gz
Algorithm Hash digest
SHA256 91a8ed41b229399b1b657acfc33b5336d32d342687fd48be9e6b0a13b6318be8
MD5 99341aed92c34b5f40401813e08bb898
BLAKE2b-256 e8caffad573bd84df3e8cfb4c74c81f591ae6bcf770b693f2cda61f6504681b8

See more details on using hashes here.

File details

Details for the file zeroeval-0.7.3-py3-none-any.whl.

File metadata

  • Download URL: zeroeval-0.7.3-py3-none-any.whl
  • Upload date:
  • Size: 187.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.12

File hashes

Hashes for zeroeval-0.7.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6052371a85e880c5231d0ce64967b776af19290f28323b30313e3d9dcbb2a2bb
MD5 774e867e56c16fc5dd8936e02607b89e
BLAKE2b-256 e1e603a76e591589434ed4cce86d2ee968b4f685b0f1d4acfd5dc4f7d8774c69

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page