AI Evaluation Platform SDK — traces, evaluations, assertions, and workflow tracing for LLM apps
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
Project description
pauly4010-evalai-sdk (Python)
Python SDK for the AI Evaluation Platform — traces, evaluations, assertions, and workflow tracing for LLM applications.
Feature-compatible with the TypeScript SDK (
@pauly4010/evalai-sdk).
Install
pip install pauly4010-evalai-sdk
With optional integrations:
pip install "pauly4010-evalai-sdk[openai]" # OpenAI tracing
pip install "pauly4010-evalai-sdk[anthropic]" # Anthropic tracing
pip install "pauly4010-evalai-sdk[all]" # Everything
Quick start
import asyncio
from evalai_sdk import AIEvalClient, CreateTraceParams
async def main():
# Zero-config (reads EVALAI_API_KEY env var)
client = AIEvalClient.init()
# Or explicit
client = AIEvalClient(api_key="sk-...", organization_id=1)
# Create a trace
trace = await client.traces.create(CreateTraceParams(name="user-query"))
print(trace.id, trace.trace_id)
# List evaluations
evals = await client.evaluations.list()
for ev in evals:
print(ev.name, ev.status)
await client.close()
asyncio.run(main())
Context manager
async with AIEvalClient(api_key="sk-...") as client:
trace = await client.traces.create(CreateTraceParams(name="test"))
Assertions
20+ assertion functions for evaluating LLM output:
from evalai_sdk import expect
result = expect("The capital of France is Paris.").to_contain("Paris")
assert result.passed
result = expect("Hello World").to_not_contain_pii()
assert result.passed
result = expect(0.95).to_be_between(0.0, 1.0)
assert result.passed
Standalone functions:
from evalai_sdk import contains_keywords, has_no_toxicity, matches_pattern
assert contains_keywords("quick brown fox", ["quick", "fox"])
assert has_no_toxicity("Thank you for your help.")
assert matches_pattern("abc-123", r"\w+-\d+")
Test suites
from evalai_sdk import create_test_suite
from evalai_sdk.types import TestSuiteCase, TestSuiteConfig
suite = create_test_suite("my-suite", TestSuiteConfig(
evaluator=my_llm_function,
test_cases=[
TestSuiteCase(name="greeting", input="Hello", expected_output="Hi there!"),
TestSuiteCase(name="pii-check", input="Describe yourself",
assertions=[{"type": "not_contains_pii"}]),
],
))
result = await suite.run()
print(f"{result.passed_count}/{result.total} passed")
Workflow tracing
Track multi-agent workflows with handoffs, decisions, and cost:
from evalai_sdk import AIEvalClient, WorkflowTracer
from evalai_sdk.types import CostCategory, HandoffType, RecordCostParams
client = AIEvalClient.init()
tracer = WorkflowTracer(client)
ctx = await tracer.start_workflow("research-pipeline")
span = await tracer.start_agent_span("researcher", {"query": "AI trends"})
await tracer.end_agent_span(span, {"findings": "..."})
await tracer.record_handoff("researcher", "writer", handoff_type=HandoffType.DELEGATION)
await tracer.record_cost(RecordCostParams(
agent_name="researcher", category=CostCategory.LLM_INPUT, amount=0.05, tokens=1500
))
await tracer.end_workflow()
print(f"Total cost: ${tracer.get_total_cost():.2f}")
OpenAI integration
from openai import AsyncOpenAI
from evalai_sdk import AIEvalClient
from evalai_sdk.integrations.openai import trace_openai
openai_client = AsyncOpenAI()
eval_client = AIEvalClient.init()
traced = trace_openai(openai_client, eval_client)
response = await traced.chat.completions.create(model="gpt-4", messages=[...])
Anthropic integration
from anthropic import AsyncAnthropic
from evalai_sdk import AIEvalClient
from evalai_sdk.integrations.anthropic import trace_anthropic
anthropic_client = AsyncAnthropic()
eval_client = AIEvalClient.init()
traced = trace_anthropic(anthropic_client, eval_client)
response = await traced.messages.create(model="claude-3-opus-20240229", messages=[...])
API modules
| Module | Methods |
|---|---|
client.traces |
create, list, get, update, delete, create_span, list_spans |
client.evaluations |
create, get, list, update, delete, create_test_case, list_test_cases, create_run, list_runs, get_run |
client.llm_judge |
evaluate, create_config, list_configs, list_results, get_alignment |
client.annotations |
create, list, tasks.create, tasks.list, tasks.get, tasks.items.create, tasks.items.list |
client.developer |
get_usage, get_usage_summary, api_keys.*, webhooks.* |
client.organizations |
get_current |
Development
cd src/packages/sdk-python
pip install -e ".[dev]"
pytest
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pauly4010_evalai_sdk-1.0.0.tar.gz.
File metadata
- Download URL: pauly4010_evalai_sdk-1.0.0.tar.gz
- Upload date:
- Size: 54.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bf99126997e8e1e3f6111c19a6601d6ab5e4796fb23ac77f7d80e695ebf30fd5
|
|
| MD5 |
d1d1fcbf6631cd7abb45b09aaff7f3ee
|
|
| BLAKE2b-256 |
23e1b08171d5c402c2a83f6de63d7eee813436276603e963f273ab719d6baec5
|
File details
Details for the file pauly4010_evalai_sdk-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pauly4010_evalai_sdk-1.0.0-py3-none-any.whl
- Upload date:
- Size: 57.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b6a31f9df33728439298e700da0d14e18375fc9132c3679324bb6eb1d70aa060
|
|
| MD5 |
135fbc9fb6bacd7da3075e2b0daf05e2
|
|
| BLAKE2b-256 |
5401ef3980234f8cb9d58a0fe159f97ce15b19c662129a7d5bcc4d89af6d16ee
|