AI Evaluation Platform SDK — traces, evaluations, assertions, and workflow tracing for LLM apps

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

pauly4010-evalai-sdk (Python)

Python SDK for the AI Evaluation Platform — traces, evaluations, assertions, and workflow tracing for LLM applications.

Feature-compatible with the TypeScript SDK (@pauly4010/evalai-sdk).

Install

pip install pauly4010-evalai-sdk

With optional integrations:

pip install "pauly4010-evalai-sdk[openai]"       # OpenAI tracing
pip install "pauly4010-evalai-sdk[anthropic]"    # Anthropic tracing
pip install "pauly4010-evalai-sdk[all]"          # Everything

Quick start

import asyncio
from evalai_sdk import AIEvalClient, CreateTraceParams

async def main():
    # Zero-config (reads the EVALAI_API_KEY environment variable)
    client = AIEvalClient.init()

    # Or configure explicitly instead of init():
    # client = AIEvalClient(api_key="sk-...", organization_id=1)

    # Create a trace
    trace = await client.traces.create(CreateTraceParams(name="user-query"))
    print(trace.id, trace.trace_id)

    # List evaluations
    evals = await client.evaluations.list()
    for ev in evals:
        print(ev.name, ev.status)

    await client.close()

asyncio.run(main())
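The zero-config path reads the API key from EVALAI_API_KEY. The sketch below shows the usual precedence for this kind of lookup: an explicit argument wins, otherwise the environment variable is used, otherwise an error is raised. This precedence is an assumption, not the SDK's documented behaviour, and resolve_api_key is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "EVALAI_API_KEY"  # variable name taken from the quick start


def resolve_api_key(explicit: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to the env var."""
    if explicit:
        return explicit
    key = environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"No API key given and {ENV_VAR} is not set")
    return key


if __name__ == "__main__":
    print(resolve_api_key("sk-explicit"))                     # explicit value wins
    print(resolve_api_key(environ={ENV_VAR: "sk-from-env"}))  # env var fallback
```

Passing environ as a parameter keeps the lookup testable without touching the real process environment.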

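Context manager

This and the following examples use await, so run them inside an async function such as main() in the quick start (top-level await only works in async-capable environments such as Jupyter). There, the client can also be used as an async context manager in place of the explicit await client.close() call: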

async with AIEvalClient(api_key="sk-...") as client:
    trace = await client.traces.create(CreateTraceParams(name="test"))
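An async with block behaves like a try/finally that awaits close() at the end. The toy class below is a hypothetical stand-in (not AIEvalClient) that shows the protocol: __aenter__ returns the client, and __aexit__ awaits close() on normal exit and when the body raises.

```python
import asyncio


class ToyClient:
    """Hypothetical stand-in for an async API client."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # A real client would typically release its HTTP connections here.
        self.closed = True

    async def __aenter__(self) -> "ToyClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()  # runs on normal exit and on exceptions; returning None re-raises


async def demo_context() -> ToyClient:
    async with ToyClient() as client:
        assert not client.closed  # still open inside the block
    return client


if __name__ == "__main__":
    print(asyncio.run(demo_context()).closed)  # True
```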

Assertions

20+ assertion functions for evaluating LLM output:

from evalai_sdk import expect

result = expect("The capital of France is Paris.").to_contain("Paris")
assert result.passed

result = expect("Hello World").to_not_contain_pii()
assert result.passed

result = expect(0.95).to_be_between(0.0, 1.0)
assert result.passed
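To illustrate the pattern, here is a minimal sketch of a fluent expect() API in which each matcher returns a result object with a passed flag instead of raising. The names Expectation, AssertionResult and message are hypothetical and the matchers are simplified; this is not the SDK's implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AssertionResult:
    passed: bool
    message: str


class Expectation:
    """Hypothetical fluent wrapper around a value under test."""

    def __init__(self, actual: Any) -> None:
        self.actual = actual

    def to_contain(self, needle: str) -> AssertionResult:
        ok = needle in str(self.actual)
        return AssertionResult(ok, f"expected output to contain {needle!r}")

    def to_be_between(self, low: float, high: float) -> AssertionResult:
        ok = low <= float(self.actual) <= high  # inclusive bounds (assumption)
        return AssertionResult(ok, f"expected {self.actual} to be in [{low}, {high}]")


def expect(actual: Any) -> Expectation:
    return Expectation(actual)


if __name__ == "__main__":
    print(expect("The capital of France is Paris.").to_contain("Paris").passed)  # True
    print(expect(1.5).to_be_between(0.0, 1.0).passed)                           # False
```

Returning a result rather than raising lets a caller collect every failed check for one output before deciding what to do.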

Standalone functions:

from evalai_sdk import contains_keywords, has_no_toxicity, matches_pattern

assert contains_keywords("quick brown fox", ["quick", "fox"])
assert has_no_toxicity("Thank you for your help.")
assert matches_pattern("abc-123", r"\w+-\d+")
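The standalone helpers return plain booleans, so they compose directly with assert. A rough sketch of two such checks follows, assuming case-insensitive keyword matching and regex search semantics; the SDK may behave differently (for example, full-string matching), and has_all_keywords and has_pattern are hypothetical names.

```python
import re
from typing import Iterable


def has_all_keywords(text: str, keywords: Iterable[str]) -> bool:
    """Hypothetical: True if every keyword occurs in text (case-insensitive)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


def has_pattern(text: str, pattern: str) -> bool:
    """Hypothetical: True if the regex matches anywhere in text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(has_all_keywords("quick brown fox", ["quick", "fox"]))  # True
    print(has_all_keywords("quick brown fox", ["quick", "dog"]))  # False
    print(has_pattern("abc-123", r"\w+-\d+"))                     # True
```

The search-versus-full-match choice matters: with re.search, "id: abc-123" also passes, while re.fullmatch would reject it.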

Test suites

from evalai_sdk import create_test_suite
from evalai_sdk.types import TestSuiteCase, TestSuiteConfig

# my_llm_function is your own function that produces the model output for each test case
suite = create_test_suite("my-suite", TestSuiteConfig(
    evaluator=my_llm_function,
    test_cases=[
        TestSuiteCase(name="greeting", input="Hello", expected_output="Hi there!"),
        TestSuiteCase(name="pii-check", input="Describe yourself",
                      assertions=[{"type": "not_contains_pii"}]),
    ],
))

result = await suite.run()
print(f"{result.passed_count}/{result.total} passed")
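The suite result reports how many cases passed out of the total. Below is a simplified, synchronous sketch of that loop, assuming a case passes when the evaluator's output equals expected_output exactly; the SDK's matching and assertion handling are not documented here, and Case, SuiteResult, run_suite and fake_llm are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Case:
    name: str
    input: str
    expected_output: Optional[str] = None


@dataclass
class SuiteResult:
    passed_count: int = 0
    total: int = 0
    failures: List[str] = field(default_factory=list)


def run_suite(evaluator: Callable[[str], str], cases: List[Case]) -> SuiteResult:
    result = SuiteResult(total=len(cases))
    for case in cases:
        output = evaluator(case.input)
        # Cases without an expected output count as passed in this sketch.
        if case.expected_output is None or output == case.expected_output:
            result.passed_count += 1
        else:
            result.failures.append(case.name)
    return result


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for my_llm_function."""
    return "Hi there!" if prompt == "Hello" else "I am a bot."


if __name__ == "__main__":
    res = run_suite(fake_llm, [
        Case("greeting", "Hello", "Hi there!"),
        Case("farewell", "Bye", "Goodbye!"),
    ])
    print(f"{res.passed_count}/{res.total} passed")  # 1/2 passed
```

Exact string equality is brittle for LLM output, which is why per-case assertions such as not_contains_pii are useful alongside or instead of expected_output.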

Workflow tracing

Track multi-agent workflows with handoffs, decisions, and cost:

from evalai_sdk import AIEvalClient, WorkflowTracer
from evalai_sdk.types import CostCategory, HandoffType, RecordCostParams

client = AIEvalClient.init()
tracer = WorkflowTracer(client)

ctx = await tracer.start_workflow("research-pipeline")
span = await tracer.start_agent_span("researcher", {"query": "AI trends"})
await tracer.end_agent_span(span, {"findings": "..."})

await tracer.record_handoff("researcher", "writer", handoff_type=HandoffType.DELEGATION)
await tracer.record_cost(RecordCostParams(
    agent_name="researcher", category=CostCategory.LLM_INPUT, amount=0.05, tokens=1500
))

await tracer.end_workflow()
print(f"Total cost: ${tracer.get_total_cost():.2f}")
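get_total_cost() reports the summed spend of the workflow. A small sketch of how recorded cost entries might be aggregated, both in total and per agent, follows. CostRecord, CostLedger and example_ledger are hypothetical names, the category strings are illustrative, and amounts are assumed to be US dollars as the $ formatting above suggests.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CostRecord:
    agent_name: str
    category: str   # e.g. "llm_input", "llm_output" (illustrative labels)
    amount: float   # assumed to be USD
    tokens: int = 0


class CostLedger:
    def __init__(self) -> None:
        self.records: List[CostRecord] = []

    def record(self, rec: CostRecord) -> None:
        self.records.append(rec)

    def total(self) -> float:
        return sum(r.amount for r in self.records)

    def by_agent(self) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for r in self.records:
            totals[r.agent_name] += r.amount
        return dict(totals)


def example_ledger() -> CostLedger:
    ledger = CostLedger()
    ledger.record(CostRecord("researcher", "llm_input", 0.05, tokens=1500))
    ledger.record(CostRecord("writer", "llm_output", 0.10, tokens=800))
    return ledger


if __name__ == "__main__":
    ledger = example_ledger()
    print(f"Total cost: ${ledger.total():.2f}")  # Total cost: $0.15
    print(ledger.by_agent())  # {'researcher': 0.05, 'writer': 0.1}
```

Floats are fine for display at two decimals; for billing-grade sums, decimal.Decimal avoids accumulated rounding error.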

OpenAI integration

from openai import AsyncOpenAI
from evalai_sdk import AIEvalClient
from evalai_sdk.integrations.openai import trace_openai

openai_client = AsyncOpenAI()
eval_client = AIEvalClient.init()

traced = trace_openai(openai_client, eval_client)
response = await traced.chat.completions.create(model="gpt-4", messages=[...])

Anthropic integration

from anthropic import AsyncAnthropic
from evalai_sdk import AIEvalClient
from evalai_sdk.integrations.anthropic import trace_anthropic

anthropic_client = AsyncAnthropic()
eval_client = AIEvalClient.init()

traced = trace_anthropic(anthropic_client, eval_client)
# max_tokens is a required parameter of the Anthropic Messages API
response = await traced.messages.create(model="claude-3-opus-20240229", max_tokens=1024, messages=[...])
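Both integrations wrap an existing client and keep its call surface (traced.chat.completions.create, traced.messages.create). One common way to build such a wrapper is a proxy that forwards attribute access and times each awaited call. The sketch below shows that pattern with a fake client and an in-memory list of spans; it is an assumption about how tracing wrappers generally work, not the SDK's code, and TracingProxy, FakeMessages, FakeClient and demo_tracing are hypothetical.

```python
import asyncio
import inspect
import time
from typing import Any, Dict, List


class TracingProxy:
    """Hypothetical proxy: forwards attribute access and times awaited calls."""

    def __init__(self, target: Any, spans: List[Dict[str, Any]], path: str = "") -> None:
        self._target = target
        self._spans = spans
        self._path = path

    def __getattr__(self, name: str) -> Any:
        attr = getattr(self._target, name)
        path = f"{self._path}.{name}" if self._path else name
        if inspect.iscoroutinefunction(attr):
            async def traced(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    return await attr(*args, **kwargs)
                finally:
                    # A real integration would send this span to the evaluation backend.
                    self._spans.append({
                        "name": path,
                        "model": kwargs.get("model"),
                        "duration_s": time.perf_counter() - start,
                    })
            return traced
        if callable(attr) or not hasattr(attr, "__dict__"):
            return attr  # sync callables and simple values pass through untraced
        return TracingProxy(attr, self._spans, path)  # nested namespace, e.g. chat.completions


class FakeMessages:
    async def create(self, **kwargs: Any) -> str:
        await asyncio.sleep(0)  # stands in for the network round trip
        return "hello"


class FakeClient:
    def __init__(self) -> None:
        self.messages = FakeMessages()


async def demo_tracing() -> List[Dict[str, Any]]:
    spans: List[Dict[str, Any]] = []
    traced = TracingProxy(FakeClient(), spans)
    reply = await traced.messages.create(model="demo-model", max_tokens=16)
    assert reply == "hello"
    return spans


if __name__ == "__main__":
    for span in asyncio.run(demo_tracing()):
        print(span["name"], span["model"])  # messages.create demo-model
```

Recording the span in finally means failed calls are traced too, which is usually what you want when debugging an LLM pipeline.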

API modules

| Module | Methods |
|---|---|
| client.traces | create, list, get, update, delete, create_span, list_spans |
| client.evaluations | create, get, list, update, delete, create_test_case, list_test_cases, create_run, list_runs, get_run |
| client.llm_judge | evaluate, create_config, list_configs, list_results, get_alignment |
| client.annotations | create, list, tasks.create, tasks.list, tasks.get, tasks.items.create, tasks.items.list |
| client.developer | get_usage, get_usage_summary, api_keys.*, webhooks.* |
| client.organizations | get_current |

Development

cd src/packages/sdk-python
pip install -e ".[dev]"
pytest

License

MIT

Download files

Source Distribution

pauly4010_evalai_sdk-1.0.0.tar.gz (54.1 kB)

Uploaded Source

Built Distribution

pauly4010_evalai_sdk-1.0.0-py3-none-any.whl (57.2 kB)

Uploaded Python 3

File details

Details for the file pauly4010_evalai_sdk-1.0.0.tar.gz.

File metadata

  • Download URL: pauly4010_evalai_sdk-1.0.0.tar.gz
  • Size: 54.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for pauly4010_evalai_sdk-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf99126997e8e1e3f6111c19a6601d6ab5e4796fb23ac77f7d80e695ebf30fd5 |
| MD5 | d1d1fcbf6631cd7abb45b09aaff7f3ee |
| BLAKE2b-256 | 23e1b08171d5c402c2a83f6de63d7eee813436276603e963f273ab719d6baec5 |

File details

Details for the file pauly4010_evalai_sdk-1.0.0-py3-none-any.whl.

File hashes

Hashes for pauly4010_evalai_sdk-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6a31f9df33728439298e700da0d14e18375fc9132c3679324bb6eb1d70aa060 |
| MD5 | 135fbc9fb6bacd7da3075e2b0daf05e2 |
| BLAKE2b-256 | 5401ef3980234f8cb9d58a0fe159f97ce15b19c662129a7d5bcc4d89af6d16ee |
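To check a downloaded file against the SHA256 digests listed above, a short hashlib sketch is enough. It assumes the files were saved under their published names in the current directory; sha256_of and verify are illustrative helper names.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests from this page.
EXPECTED_SHA256 = {
    "pauly4010_evalai_sdk-1.0.0.tar.gz":
        "bf99126997e8e1e3f6111c19a6601d6ab5e4796fb23ac77f7d80e695ebf30fd5",
    "pauly4010_evalai_sdk-1.0.0-py3-none-any.whl":
        "b6a31f9df33728439298e700da0d14e18375fc9132c3679324bb6eb1d70aa060",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large downloads need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is listed and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        p = Path(name)  # assumes the file was downloaded to the current directory
        if p.exists():
            print(name, "OK" if verify(p) else "MISMATCH")
```

pip's own `pip hash <file>` command prints the same SHA256 digest if you prefer not to script the check.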
