AI Evaluation Platform SDK — traces, evaluations, assertions, and workflow tracing for LLM apps

These details have not been verified by PyPI

Project links

Project description

pauly4010-evalai-sdk (Python)

Python SDK for the AI Evaluation Platform — traces, evaluations, assertions, and workflow tracing for LLM applications.

Feature-compatible with the TypeScript SDK (@pauly4010/evalai-sdk).

Install

pip install pauly4010-evalai-sdk

With optional integrations:

pip install "pauly4010-evalai-sdk[openai]"       # OpenAI tracing
pip install "pauly4010-evalai-sdk[anthropic]"    # Anthropic tracing
pip install "pauly4010-evalai-sdk[all]"          # Everything

Quick start

import asyncio
from evalai_sdk import AIEvalClient, CreateTraceParams

async def main():
    # Zero-config (reads EVALAI_API_KEY env var)
    client = AIEvalClient.init()

    # Or explicit
    client = AIEvalClient(api_key="sk-...", organization_id=1)

    # Create a trace
    trace = await client.traces.create(CreateTraceParams(name="user-query"))
    print(trace.id, trace.trace_id)

    # List evaluations
    evals = await client.evaluations.list()
    for ev in evals:
        print(ev.name, ev.status)

    await client.close()

asyncio.run(main())

Context manager

async with AIEvalClient(api_key="sk-...") as client:
    trace = await client.traces.create(CreateTraceParams(name="test"))

Assertions

20+ assertion functions for evaluating LLM output:

from evalai_sdk import expect

result = expect("The capital of France is Paris.").to_contain("Paris")
assert result.passed

result = expect("Hello World").to_not_contain_pii()
assert result.passed

result = expect(0.95).to_be_between(0.0, 1.0)
assert result.passed

Standalone functions:

from evalai_sdk import contains_keywords, has_no_toxicity, matches_pattern

assert contains_keywords("quick brown fox", ["quick", "fox"])
assert has_no_toxicity("Thank you for your help.")
assert matches_pattern("abc-123", r"\w+-\d+")

Test suites

from evalai_sdk import create_test_suite
from evalai_sdk.types import TestSuiteCase, TestSuiteConfig

suite = create_test_suite("my-suite", TestSuiteConfig(
    evaluator=my_llm_function,
    test_cases=[
        TestSuiteCase(name="greeting", input="Hello", expected_output="Hi there!"),
        TestSuiteCase(name="pii-check", input="Describe yourself",
                      assertions=[{"type": "not_contains_pii"}]),
    ],
))

result = await suite.run()
print(f"{result.passed_count}/{result.total} passed")

Workflow tracing

Track multi-agent workflows with handoffs, decisions, and cost:

from evalai_sdk import AIEvalClient, WorkflowTracer
from evalai_sdk.types import CostCategory, HandoffType, RecordCostParams

client = AIEvalClient.init()
tracer = WorkflowTracer(client)

ctx = await tracer.start_workflow("research-pipeline")
span = await tracer.start_agent_span("researcher", {"query": "AI trends"})
await tracer.end_agent_span(span, {"findings": "..."})

await tracer.record_handoff("researcher", "writer", handoff_type=HandoffType.DELEGATION)
await tracer.record_cost(RecordCostParams(
    agent_name="researcher", category=CostCategory.LLM_INPUT, amount=0.05, tokens=1500
))

await tracer.end_workflow()
print(f"Total cost: ${tracer.get_total_cost():.2f}")

OpenAI integration

from openai import AsyncOpenAI
from evalai_sdk import AIEvalClient
from evalai_sdk.integrations.openai import trace_openai

openai_client = AsyncOpenAI()
eval_client = AIEvalClient.init()

traced = trace_openai(openai_client, eval_client)
response = await traced.chat.completions.create(model="gpt-4", messages=[...])

Anthropic integration

from anthropic import AsyncAnthropic
from evalai_sdk import AIEvalClient
from evalai_sdk.integrations.anthropic import trace_anthropic

anthropic_client = AsyncAnthropic()
eval_client = AIEvalClient.init()

traced = trace_anthropic(anthropic_client, eval_client)
response = await traced.messages.create(model="claude-3-opus-20240229", messages=[...])

API modules

Module	Methods
`client.traces`	`create`, `list`, `get`, `update`, `delete`, `create_span`, `list_spans`
`client.evaluations`	`create`, `get`, `list`, `update`, `delete`, `create_test_case`, `list_test_cases`, `create_run`, `list_runs`, `get_run`
`client.llm_judge`	`evaluate`, `create_config`, `list_configs`, `list_results`, `get_alignment`
`client.annotations`	`create`, `list`, `tasks.create`, `tasks.list`, `tasks.get`, `tasks.items.create`, `tasks.items.list`
`client.developer`	`get_usage`, `get_usage_summary`, `api_keys.`, `webhooks.`
`client.organizations`	`get_current`

Development

cd src/packages/sdk-python
pip install -e ".[dev]"
pytest

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.9.1

Mar 1, 2026

1.9.0

Mar 1, 2026

1.0.1

Mar 1, 2026

This version

1.0.0

Feb 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pauly4010_evalai_sdk-1.0.0.tar.gz (54.1 kB view details)

Uploaded Feb 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pauly4010_evalai_sdk-1.0.0-py3-none-any.whl (57.2 kB view details)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file pauly4010_evalai_sdk-1.0.0.tar.gz.

File metadata

Download URL: pauly4010_evalai_sdk-1.0.0.tar.gz
Upload date: Feb 28, 2026
Size: 54.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for pauly4010_evalai_sdk-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`bf99126997e8e1e3f6111c19a6601d6ab5e4796fb23ac77f7d80e695ebf30fd5`
MD5	`d1d1fcbf6631cd7abb45b09aaff7f3ee`
BLAKE2b-256	`23e1b08171d5c402c2a83f6de63d7eee813436276603e963f273ab719d6baec5`

See more details on using hashes here.

File details

Details for the file pauly4010_evalai_sdk-1.0.0-py3-none-any.whl.

File metadata

Download URL: pauly4010_evalai_sdk-1.0.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 57.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for pauly4010_evalai_sdk-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b6a31f9df33728439298e700da0d14e18375fc9132c3679324bb6eb1d70aa060`
MD5	`135fbc9fb6bacd7da3075e2b0daf05e2`
BLAKE2b-256	`5401ef3980234f8cb9d58a0fe159f97ce15b19c662129a7d5bcc4d89af6d16ee`

See more details on using hashes here.

pauly4010-evalai-sdk 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

pauly4010-evalai-sdk (Python)

Install

Quick start

Context manager

Assertions

Test suites

Workflow tracing

OpenAI integration

Anthropic integration

API modules

Development

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes