ZeroEval SDK

The Python SDK and CLI for ZeroEval: monitoring, prompt management, judges, and optimization for AI products.

pip install zeroeval

Quick Start

1. Setup

zeroeval setup

This opens the ZeroEval dashboard, prompts for your project API key, and saves it along with your project context. Later commands and ze.init() calls pick up the saved credentials, so you do not need to pass the key again.

2. Trace your AI calls

import zeroeval as ze
import openai

ze.init()
client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is ZeroEval?"}],
)

OpenAI, Gemini, LangChain, and LangGraph calls are automatically traced. No extra code needed.

3. Manage prompts

import zeroeval as ze

ze.init()

prompt = ze.prompt(
    name="support-triage",
    content="Classify this support ticket: {{ticket_text}}",
    variables={"ticket_text": "I can't log in to my account"},
)

Prompts are versioned, tagged, and linked to traces automatically.

4. Inspect from the CLI

zeroeval traces list --start-date 2025-01-01
zeroeval judges list
zeroeval prompts list

Installation

pip install zeroeval            # Core SDK
pip install zeroeval[openai]    # OpenAI auto-instrumentation
pip install zeroeval[gemini]    # Google Gemini
pip install zeroeval[langchain] # LangChain
pip install zeroeval[langgraph] # LangGraph
pip install zeroeval[all]       # Everything

Integrations are detected and instrumented automatically.
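The README does not describe how detection works. As a rough illustration of the general idea only, a library can check which optional packages are importable without importing them. The function name `detect_installed` and the module list are hypothetical and not part of the ZeroEval API.

```python
import importlib.util
from typing import Dict, Iterable


def detect_installed(modules: Iterable[str]) -> Dict[str, bool]:
    """Report which modules could be imported, without importing them."""
    found = {}
    for name in modules:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is missing.
            found[name] = False
    return found


# Hypothetical candidate list; "json" is included because it always exists.
available = detect_installed(["openai", "langchain", "langgraph", "json"])
print(available)
```

A real SDK would presumably patch the detected clients after this check; that step is not shown here.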


Authentication

Interactive (recommended):

zeroeval setup

This saves your API key and resolves your project automatically, so no further configuration is needed.

Non-interactive (CI, agents):

zeroeval auth set --api-key sk_ze_...

In code:

import zeroeval as ze
ze.init(api_key="sk_ze_...")

Resolution order: explicit flag > environment variable (ZEROEVAL_API_KEY, ZEROEVAL_PROJECT_ID) > CLI config (~/.config/zeroeval/config.json).
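The resolution order above can be sketched in plain Python. This is a minimal illustration, not the SDK's actual code: the function name `resolve_api_key` and the `api_key` field inside config.json are assumptions, while the environment variable name and config path come from the line above.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Path taken from the README; the "api_key" field name below is an assumption.
CONFIG_PATH = Path.home() / ".config" / "zeroeval" / "config.json"


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_path: Path = CONFIG_PATH,
) -> Optional[str]:
    """Return the first key found: explicit > environment > CLI config file."""
    env = os.environ if env is None else env

    # 1. An explicit value (a flag or an api_key argument) always wins.
    if explicit:
        return explicit

    # 2. Next, the ZEROEVAL_API_KEY environment variable.
    if env.get("ZEROEVAL_API_KEY"):
        return env["ZEROEVAL_API_KEY"]

    # 3. Finally, the saved CLI config, if it exists and parses as a JSON object.
    try:
        config = json.loads(config_path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(config, dict):
        return None
    return config.get("api_key")  # hypothetical field name
```

The same precedence would apply to the project ID via ZEROEVAL_PROJECT_ID.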


Tracing

The SDK traces AI calls with a Session > Trace > Span hierarchy.

import zeroeval as ze

ze.init()

# Decorator-based tracing
@ze.span(name="process-ticket")
def process_ticket(ticket_text: str):
    # Placeholder logic: replace with your own processing
    result = ticket_text.strip()
    return result

# Context manager
with ze.span(name="embedding-step"):
    embeddings = get_embeddings(text)

# Manual spans
span = ze.start_span(name="retrieval")
results = search(query)
span.end()

# Tags for filtering in the dashboard
ze.set_tag("trace", {"environment": "production", "model": "gpt-4o"})

See INTEGRATIONS.md for automatic OpenAI, Gemini, LangChain, and LangGraph tracing.
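To make the Trace > Span part of the hierarchy concrete, here is a small stand-alone sketch of how nested spans can share a trace and record their parent. It is a toy model built on the standard library, not ZeroEval's implementation. The `Span` dataclass, the `finished` list, and this local `span` function are all hypothetical, and sessions are left out for brevity.

```python
import contextvars
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)


# The span currently open in this execution context, if any.
_current_span = contextvars.ContextVar("current_span", default=None)
finished: List[Span] = []  # stand-in for an exporter


@contextmanager
def span(name: str) -> Iterator[Span]:
    parent = _current_span.get()
    # A root span starts a new trace; a child inherits its parent's trace.
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    current = Span(name, trace_id, parent.span_id if parent else None)
    token = _current_span.set(current)
    try:
        yield current
    finally:
        # Restore the previous span and record this one as finished.
        _current_span.reset(token)
        finished.append(current)


with span("process-ticket") as root:
    with span("embedding-step") as child:
        pass
```

Here `child` shares `root`'s trace ID and points to it as parent. Inner spans finish first, so `finished` holds the child before the root.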


Prompts

Prompts are versioned and tagged. Use ze.prompt() to fetch and render them:

prompt = ze.prompt(
    name="support-triage",
    content="Classify: {{ticket_text}}",
    variables={"ticket_text": ticket},
)

# Fetch a specific version or tag
prompt = ze.get_prompt("support-triage", version=3)
prompt = ze.get_prompt("support-triage", tag="production")

Prompts are automatically linked to traces and available for optimization.
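The {{ticket_text}} placeholder syntax suggests simple variable substitution. How the SDK renders templates is not documented here, so the sketch below only illustrates the idea: `render_template` is a hypothetical helper, and raising an error on a missing variable is an assumed behavior.

```python
import re
from typing import Mapping

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(content: str, variables: Mapping[str, object]) -> str:
    """Replace each {{name}} with variables[name]; raise KeyError if missing."""
    # A function replacement avoids backslash-escape surprises in the values.
    return _PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), content)


rendered = render_template(
    "Classify this support ticket: {{ticket_text}}",
    {"ticket_text": "I can't log in to my account"},
)
print(rendered)
```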


Feedback

Submit feedback on prompt completions to build training data for optimization:

# Thumbs up/down
ze.send_feedback(
    prompt_slug="support-triage",
    completion_id="completion-uuid",
    thumbs_up=True,
    reason="Correct classification",
)

# Scored judge feedback
ze.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="completion-uuid",
    thumbs_up=False,
    judge_id="judge-uuid",
    expected_score=3.5,
    score_direction="too_high",
    criteria_feedback={
        "accuracy": {"expected_score": 4.0, "reason": "Mostly correct"},
        "tone": {"expected_score": 1.0, "reason": "Too formal"},
    },
)

Feedback with reason and expected_output creates stronger training examples for prompt optimization.


Datasets & Evals

import zeroeval as ze

ze.init()

# Pull a dataset
dataset = ze.Dataset.pull("my-dataset")

for row in dataset:
    print(row.question, row.answer)

# Run an evaluation
@ze.task(outputs=["prediction"])
def solve(row):
    return {"prediction": llm_call(row["question"])}

@ze.evaluation(mode="row", outputs=["correct"])
def check(answer, prediction):
    return {"correct": int(answer == prediction)}

run = dataset.eval(solve, workers=8)
run = run.score([check], column_map={"answer": "answer", "prediction": "prediction"})
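For readers who want to see what a task, a row-mode evaluation, and column_map amount to conceptually, the following sketch reproduces the flow on an in-memory list. It does not use the SDK. The helpers `run_task` and `score_rows` are hypothetical, the sample rows are made up, and the reading of column_map as "evaluator parameter name to row column name" is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


def run_task(rows: List[Row], task: Callable[[Row], Row], workers: int = 8) -> List[Row]:
    """Apply task to every row in parallel and merge its outputs into the row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(task, rows))  # map preserves input order
    return [{**row, **out} for row, out in zip(rows, outputs)]


def score_rows(rows: List[Row], evaluator: Callable[..., Row], column_map: Dict[str, str]) -> List[Row]:
    """Call evaluator once per row, passing mapped columns as keyword arguments."""
    scored = []
    for row in rows:
        kwargs = {param: row[column] for param, column in column_map.items()}
        scored.append({**row, **evaluator(**kwargs)})
    return scored


def solve(row: Row) -> Row:
    # Stand-in for an LLM call: add the two integers in the question.
    left, right = str(row["question"]).split("+")
    return {"prediction": str(int(left) + int(right))}


def check(answer, prediction) -> Row:
    return {"correct": int(answer == prediction)}


rows = [{"question": "2+2", "answer": "4"}, {"question": "3+3", "answer": "7"}]
results = score_rows(
    run_task(rows, solve),
    check,
    {"answer": "answer", "prediction": "prediction"},
)
accuracy = sum(r["correct"] for r in results) / len(results)
print(accuracy)  # 0.5: the second stored answer is deliberately wrong
```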

CLI

The zeroeval CLI covers monitoring, judges, prompts, optimization, datasets, and evals. It is designed for both humans and automation: pass --output json to get machine-readable output.

Setup

zeroeval setup                              # Interactive setup
zeroeval auth set --api-key sk_ze_...       # Non-interactive
zeroeval auth show --redact                 # Show config

Global flags

Flag                 Default                    Description
--output text|json   text                       Output format. json emits stable JSON to stdout.
--project-id         auto from setup            Override project context.
--api-base-url       https://api.zeroeval.com   Override API URL.
--timeout            60.0                       Request timeout in seconds.
--quiet              off                        Suppress non-essential logs.
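For automation, a script can call the CLI with --output json and parse stdout. The wrapper below is a sketch under two assumptions that the README does not confirm: that the global flag is accepted after the subcommand, and that failures produce a non-zero exit code. `run_cli_json` and its `base_cmd` parameter are hypothetical names, and `process_timeout` limits the subprocess rather than setting the CLI's own --timeout.

```python
import json
import subprocess
from typing import List, Optional, Sequence


def run_cli_json(
    args: Sequence[str],
    base_cmd: Optional[List[str]] = None,
    process_timeout: float = 60.0,
):
    """Run a CLI command with --output json and return the parsed stdout."""
    # Assumed flag placement: appended after the subcommand and its arguments.
    cmd = [*(base_cmd or ["zeroeval"]), *args, "--output", "json"]
    completed = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=process_timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Example (requires the zeroeval CLI to be installed and configured):
# judges = run_cli_json(["judges", "list"])
```

The shape of the returned JSON depends on the command, so callers should inspect it before relying on specific fields.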

Monitoring

zeroeval sessions list --start-date 2025-01-01
zeroeval sessions get <session_id>
zeroeval traces list --start-date 2025-01-01
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id>
zeroeval spans list --start-date 2025-01-01
zeroeval spans get <span_id>

Judges

zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges criteria <judge_id>
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>

# Create a judge
zeroeval judges create \
  --name "answer-quality" \
  --prompt-file judge.txt \
  --evaluation-type binary \
  --sample-rate 1.0

# Submit feedback on a judge evaluation
zeroeval judges feedback create \
  --span-id <span_id> \
  --thumbs-up \
  --reason "Correct evaluation"

Prompts

zeroeval prompts list
zeroeval prompts get <slug> --version 3
zeroeval prompts versions <slug>
zeroeval prompts tags <slug>

# Submit feedback on a completion
zeroeval prompts feedback create \
  --prompt-slug support-triage \
  --completion-id <id> \
  --thumbs-down \
  --reason "Wrong classification"

Optimization

# Prompt optimization
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes

# Judge optimization
zeroeval optimize judge list <judge_id>
zeroeval optimize judge start <judge_id>
zeroeval optimize judge promote <judge_id> <run_id> --yes

Datasets & Evals

zeroeval datasets list
zeroeval datasets get <name>
zeroeval datasets versions <name>
zeroeval datasets rows <name> --version 3 --limit 200

zeroeval evals list --status completed
zeroeval evals get <eval_id>
zeroeval evals summary <eval_id>
zeroeval evals results <eval_id>
zeroeval evals scores <eval_id> <scorer_id>

Querying

List commands support --where, --select, and --order:

zeroeval judges list --where "name~quality" --select "id,name" --order "name:asc"
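The README does not define the query syntax beyond this example. Assuming that `~` means "contains", that --select takes a comma-separated column list, and that --order takes `field:asc` or `field:desc`, the command above would behave roughly like this in-memory equivalent. The `query` helper and the sample records are made up for illustration.

```python
from typing import Dict, List


def query(records: List[Dict[str, str]], where: str, select: str, order: str) -> List[Dict[str, str]]:
    """Filter, sort, and project records using the assumed flag semantics."""
    key, needle = where.split("~", 1)  # assumed: "field~text" is a substring match
    columns = [c.strip() for c in select.split(",")]
    sort_key, _, direction = order.partition(":")  # "name:asc" or "name:desc"

    matched = [r for r in records if needle in str(r.get(key, ""))]
    matched.sort(key=lambda r: r[sort_key], reverse=(direction == "desc"))
    return [{c: r[c] for c in columns} for r in matched]


# Made-up sample data, not real API output.
judges = [
    {"id": "j2", "name": "tone-quality", "type": "binary"},
    {"id": "j1", "name": "answer-quality", "type": "binary"},
    {"id": "j3", "name": "safety", "type": "binary"},
]
rows = query(judges, where="name~quality", select="id,name", order="name:asc")
print(rows)
```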

Machine-readable spec

zeroeval spec cli --format json
zeroeval spec command "judges create"

Development

uv run --group dev pytest tests/cli/ -v   # CLI tests
uv run --group dev pytest                  # Full suite
