EvalKit Python SDK — LLM observability and tracing

EvalKit Python SDK

LLM observability and tracing for Python apps. One init() call auto-instruments your LLM clients, HTTP calls, database queries, and logging — then streams traces to Syntropy Labs.

Installation

pip install syntropylabs-evalkit

Optional provider extras:

pip install "syntropylabs-evalkit[openai]"      # OpenAI
pip install "syntropylabs-evalkit[anthropic]"   # Anthropic
pip install "syntropylabs-evalkit[all]"         # everything

The PyPI package is syntropylabs-evalkit, but you import it as evalkit.

Quickstart

import evalkit

evalkit.init(
    subscription_key="sk_...",       # your Syntropy Labs key
    service_name="my-service",
)

# That's it — your OpenAI / Anthropic / HTTP / DB calls are now traced automatically.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
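The automatic tracing promised in the comment above usually works by wrapping client methods once at startup, so each call is timed and recorded without changes to your code. Below is a minimal sketch of that general monkey-patching technique using a stand-in client. `FakeChatClient`, `patch_method`, and the span fields are hypothetical and do not describe EvalKit's internals.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Collected spans; a real SDK would export these instead of keeping them in memory.
SPANS: List[Dict[str, Any]] = []


class FakeChatClient:
    """Hypothetical stand-in for an LLM client such as OpenAI()."""

    def create(self, model: str, prompt: str) -> str:
        return f"[{model}] echo: {prompt}"


def patch_method(cls: type, name: str) -> None:
    """Replace cls.<name> with a wrapper that records one span per call."""
    original: Callable[..., Any] = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        status = "ok"
        try:
            return original(self, *args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on success and failure, so every call produces a span.
            SPANS.append({
                "name": f"{cls.__name__}.{name}",
                "status": status,
                "duration_s": time.perf_counter() - start,
                "attributes": dict(kwargs),
            })

    setattr(cls, name, wrapper)


# "init" time: patch once; afterwards ordinary calls are traced with no code changes.
patch_method(FakeChatClient, "create")
reply = FakeChatClient().create(model="demo-model", prompt="Hello!")
```

The caller's code stays unchanged, which is the property that makes a single `init()` call sufficient in the quickstart.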

init() sets up auto-instrumentation for you. Context (including trace IDs) propagates automatically across threads — no manual wiring required.
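Cross-thread propagation in Python is commonly built on the standard `contextvars` module: the active context is copied and the worker thread runs inside that copy. The standard-library sketch below shows the mechanism only. The `current_trace_id` variable and the helper names are hypothetical, and this is not a description of how EvalKit implements propagation.

```python
import contextvars
import threading
import uuid
from typing import List, Optional, Tuple

# Hypothetical context variable holding the active trace ID.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def record_trace_id(sink: List[Optional[str]]) -> None:
    """Runs in the worker thread and reads whatever trace ID its context carries."""
    sink.append(current_trace_id.get())


def handle_request() -> Tuple[str, Optional[str]]:
    trace_id = uuid.uuid4().hex
    current_trace_id.set(trace_id)        # set in the caller's context
    seen: List[Optional[str]] = []
    ctx = contextvars.copy_context()      # snapshot that includes current_trace_id
    worker = threading.Thread(target=ctx.run, args=(record_trace_id, seen))
    worker.start()
    worker.join()
    return trace_id, seen[0]


# Run inside a fresh, empty context so the demo does not leak state into the caller.
trace_id, seen_in_thread = contextvars.Context().run(handle_request)
```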

Web frameworks

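# FastAPI / Starlette
from fastapi import FastAPI
from evalkit import EvalKitMiddleware

app = FastAPI()
app.add_middleware(EvalKitMiddleware)

# Flask
from flask import Flask
import evalkit

app = Flask(__name__)
evalkit.instrument_flask(app)

# Django: add to MIDDLEWARE in settings.py
MIDDLEWARE = [
    # ... your existing middleware ...
    "evalkit.EvalKitDjangoMiddleware",
]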
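Framework middleware of this kind typically wraps each incoming request in a span. The sketch below is a generic WSGI middleware (WSGI is the interface Flask apps expose) that records method, path, status, and duration per request. `SpanMiddleware` and `hello_app` are hypothetical names, and this is not the code behind `instrument_flask` or `EvalKitMiddleware`.

```python
import time
from typing import Any, Callable, Dict, Iterable, List

StartResponse = Callable[..., Any]
WSGIApp = Callable[[Dict[str, Any], StartResponse], Iterable[bytes]]


class SpanMiddleware:
    """Wraps a WSGI app and records one span per request."""

    def __init__(self, app: WSGIApp) -> None:
        self.app = app
        self.spans: List[Dict[str, Any]] = []

    def __call__(self, environ: Dict[str, Any], start_response: StartResponse) -> Iterable[bytes]:
        start = time.perf_counter()
        captured: Dict[str, str] = {}

        def recording_start_response(status: str, headers: list, exc_info: Any = None) -> Any:
            captured["status"] = status  # e.g. "200 OK"
            return start_response(status, headers, exc_info)

        iterable = self.app(environ, recording_start_response)
        try:
            body = list(iterable)  # buffered for simplicity; real middleware would stream
        finally:
            close = getattr(iterable, "close", None)
            if close is not None:
                close()  # WSGI requires calling close() when the app provides it
        self.spans.append({
            "name": f'{environ.get("REQUEST_METHOD", "GET")} {environ.get("PATH_INFO", "/")}',
            "status": captured.get("status", ""),
            "duration_s": time.perf_counter() - start,
        })
        return body


def hello_app(environ, start_response):
    """Tiny hypothetical WSGI app used for the demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = SpanMiddleware(hello_app)
statuses: List[str] = []
body = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/health"},
           lambda s, h, e=None: statuses.append(s))
```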

Manual spans

import evalkit

end, ctx = evalkit.start_span("my-operation", {"key": "value"})
try:
    ...  # your work
finally:
    end("ok")

# Or as a decorator
@evalkit.trace_function()
def do_work(x):
    return x * 2
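Because `start_span` hands back a function that closes the span, the decorator form can be layered on top of it. The sketch below reproduces that shape with hypothetical local helpers (`start_span`, `trace_function`, `FINISHED`). The `"error"` status and the contents of the returned context are assumptions, not EvalKit's documented behavior.

```python
import functools
import time
import uuid
from typing import Any, Callable, Dict, List, Optional, Tuple

FINISHED: List[Dict[str, Any]] = []  # finished spans, in completion order


def start_span(name: str, attributes: Optional[Dict[str, Any]] = None
               ) -> Tuple[Callable[[str], None], Dict[str, Any]]:
    """Hypothetical analogue: returns (end, ctx), where end(status) closes the span."""
    ctx = {"span_id": uuid.uuid4().hex[:16], "name": name,
           "attributes": dict(attributes or {})}
    start = time.perf_counter()

    def end(status: str) -> None:
        FINISHED.append({**ctx, "status": status,
                         "duration_s": time.perf_counter() - start})

    return end, ctx


def trace_function(name: Optional[str] = None):
    """Decorator factory built on start_span; records "error" if the call raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            end, _ctx = start_span(name or fn.__qualname__)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                end("error")
                raise
            end("ok")
            return result
        return wrapper
    return decorator


@trace_function()
def do_work(x: int) -> int:
    return x * 2


@trace_function()
def fail() -> None:
    raise ValueError("boom")


result = do_work(21)
try:
    fail()
except ValueError:
    pass  # the span is still recorded, with status "error"
```

Ending the span with a failure status on the exception path keeps failed calls from being reported as `"ok"`.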

SQLAlchemy

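import evalkit
from sqlalchemy import create_engine
engine = create_engine("sqlite:///app.db")  # or your existing engine
evalkit.patch_sqlalchemy_engine(engine)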

Evaluation

Score agent outputs locally, with no judge-model cost. Results appear as eval_result spans:

import evalkit

scores = evalkit.evaluate(
    output="Your return window is 30 days.",
    input="What is the return policy?",
    expected_tools=["search_knowledge_base"],
    tool_calls=[{"name": "search_knowledge_base"}],
    constraints={"required_terms": ["return", "30"]},
)
# → {"tool_trajectory_f1": 1.0, "required_terms": 1.0, ...}
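The exact formulas behind `tool_trajectory_f1` and `required_terms` aren't spelled out here. One natural reading, which is an assumption, is an F1 score over expected versus actual tool names and the fraction of required terms found in the output. The sketch below computes those two metrics locally. The function names and the case-insensitive substring matching are illustrative choices, not EvalKit's definitions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def tool_trajectory_f1(expected_tools: Sequence[str],
                       tool_calls: Sequence[Dict[str, str]]) -> float:
    """F1 over tool names, treating each list as a multiset (order ignored)."""
    expected = Counter(expected_tools)
    actual = Counter(call["name"] for call in tool_calls)
    if not expected and not actual:
        return 1.0  # nothing expected and nothing called
    overlap = sum((expected & actual).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(actual.values())
    recall = overlap / sum(expected.values())
    return 2 * precision * recall / (precision + recall)


def required_terms_score(output: str, required_terms: List[str]) -> float:
    """Fraction of required terms present in the output (case-insensitive substring)."""
    if not required_terms:
        return 1.0
    text = output.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


scores = {
    "tool_trajectory_f1": tool_trajectory_f1(
        ["search_knowledge_base"], [{"name": "search_knowledge_base"}]),
    "required_terms": required_terms_score(
        "Your return window is 30 days.", ["return", "30"]),
}
# scores == {"tool_trajectory_f1": 1.0, "required_terms": 1.0}
```

A partial match degrades smoothly: expecting two tools but calling only one gives precision 1.0, recall 0.5, and an F1 of about 0.67.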

Scenario simulation

Generate realistic synthetic-user scenarios from your agent's system prompt and tool list, then run each scenario against your real agent and score the results automatically:

import evalkit

evalkit.init(subscription_key="tk_live_...", service_name="my-agent")

# Step 1 — generate scenarios server-side (BYOK: your own key for the generation call)
scenarios = evalkit.generate_scenarios(
    agent_instructions=SYSTEM_PROMPT,  # your agent's system prompt (a string you define)
    tools=["search_kb", "lookup_order", "create_ticket"],
    count=5,
    provider="anthropic",           # "openai" or "google" also supported
    api_key="sk-ant-...",           # BYOK key for generation model
    model="claude-haiku-4-5-20251001",
)

# Step 2 — simulate each scenario against your real agent and score it
def entrypoint(ctx: evalkit.SimContext) -> evalkit.AgentTurnResult:
    # ctx.message    — the synthetic user's turn message
    # ctx.session_id — stable per-scenario, use it to keep multi-turn context
    # run_my_agent is a placeholder for your own agent call
    reply, tools_used = run_my_agent(ctx.session_id, ctx.message)
    return evalkit.AgentTurnResult(
        text=reply,
        tool_calls=[{"name": t} for t in tools_used],
    )

report = evalkit.simulate_user(entrypoint, scenarios, tags=["ci"])
# Results appear in Dashboard → Simulations
print("Simulation ID:", report["simulation_id"])
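Conceptually, a simulation runs each scenario as a short conversation: a synthetic user sends a message, your entrypoint answers, and the session ID stays fixed for the whole scenario so your agent can keep per-session memory. The sketch below is a local, hypothetical model of that loop. It uses stand-in `SimContext` and `AgentTurnResult` dataclasses and, as a simplifying assumption, represents each scenario as a scripted list of user messages. It does not call EvalKit, and the real scenario schema and scoring are not shown.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimContext:
    """Stand-in for evalkit.SimContext with the two fields used above."""
    message: str
    session_id: str


@dataclass
class AgentTurnResult:
    """Stand-in for evalkit.AgentTurnResult."""
    text: str
    tool_calls: List[Dict[str, str]] = field(default_factory=list)


def simulate(entrypoint: Callable[[SimContext], AgentTurnResult],
             scenarios: List[List[str]]) -> List[Dict[str, object]]:
    """Hypothetical local loop: each scenario is a scripted list of user messages."""
    transcripts = []
    for user_turns in scenarios:
        session_id = uuid.uuid4().hex  # stable for every turn of this scenario
        turns = [entrypoint(SimContext(message=m, session_id=session_id))
                 for m in user_turns]
        transcripts.append({"session_id": session_id, "turns": turns})
    return transcripts


# A toy agent that uses session_id to count the turns it has seen per session.
memory: Dict[str, int] = {}


def entrypoint(ctx: SimContext) -> AgentTurnResult:
    memory[ctx.session_id] = memory.get(ctx.session_id, 0) + 1
    tools = [{"name": "lookup_order"}] if "order" in ctx.message else []
    return AgentTurnResult(text=f"turn {memory[ctx.session_id]}: {ctx.message}",
                           tool_calls=tools)


results = simulate(entrypoint, [["Where is my order?", "Thanks"], ["Hi"]])
```

Keying memory on `session_id` is what lets a multi-turn scenario see its own earlier turns while staying isolated from other scenarios.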

Out-of-process agents (Claude Agent SDK)

The Claude Agent SDK runs the Anthropic call in a subprocess, so the normal in-process patch can't observe it. EvalKit wraps claude_agent_sdk.query() and ClaudeSDKClient.receive_response() instead, reading token/cost/latency from the ResultMessage the SDK already returns. This happens automatically via init() when claude_agent_sdk is installed. To call it explicitly:

evalkit.patch_claude_agent_sdk()
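The wrapping approach described above can be pictured as an async generator that passes every message through unchanged and records metrics when the final result message arrives. In this sketch, `ResultMessage` is a hypothetical stand-in with `duration_ms`, `total_cost_usd`, and `usage` fields, and `fake_query` replaces the real `claude_agent_sdk.query()`. The real SDK's message types and field names may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Dict, List, Optional


@dataclass
class AssistantText:
    """Hypothetical stand-in for an intermediate SDK message."""
    text: str


@dataclass
class ResultMessage:
    """Hypothetical stand-in for the SDK's final result message."""
    duration_ms: int
    total_cost_usd: Optional[float]
    usage: Dict[str, int]


RECORDED: List[Dict[str, Any]] = []


async def fake_query(prompt: str) -> AsyncIterator[Any]:
    """Stand-in for claude_agent_sdk.query(); the real one talks to a subprocess."""
    yield AssistantText(text=f"Answering: {prompt}")
    yield ResultMessage(duration_ms=1200, total_cost_usd=0.0031,
                        usage={"input_tokens": 40, "output_tokens": 12})


async def traced(stream: AsyncIterator[Any]) -> AsyncIterator[Any]:
    """Yield every message unchanged; record metrics from the final ResultMessage."""
    async for message in stream:
        if isinstance(message, ResultMessage):
            RECORDED.append({
                "latency_ms": message.duration_ms,
                "cost_usd": message.total_cost_usd,
                **message.usage,
            })
        yield message


async def main() -> List[Any]:
    return [m async for m in traced(fake_query("What is 2 + 2?"))]


messages = asyncio.run(main())
```

The wrapper never alters the stream, so callers consume it exactly as they would the unwrapped generator.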

Flushing

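Traces are batched and exported in the background, so spans still buffered when the process exits may be lost. Call flush() before exit, for example at the end of a script, test run, or short-lived job: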

evalkit.flush()
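Background batching generally means spans go into an in-memory queue that a worker thread drains in batches, so anything still queued when the process exits can be dropped unless it is flushed. The sketch below shows that pattern with the standard library. `BatchExporter`, its `export` callback, and the batch size are hypothetical and only illustrate why flushing before exit matters.

```python
import queue
import threading
from typing import Any, Callable, List


class BatchExporter:
    """Queues spans and exports them in batches from a daemon thread."""

    def __init__(self, export: Callable[[List[Any]], None], max_batch: int = 50) -> None:
        self._export = export
        self._max_batch = max_batch
        self._queue: "queue.Queue[Any]" = queue.Queue()
        # Daemon threads are killed at interpreter exit, which is why flush() matters.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, span: Any) -> None:
        self._queue.put(span)

    def _run(self) -> None:
        while True:
            batch = [self._queue.get()]  # block until at least one span is queued
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            try:
                self._export(batch)
            except Exception:
                pass  # sketch only: a real exporter would log and possibly retry
            finally:
                for _ in batch:
                    self._queue.task_done()  # lets flush() know these spans are done

    def flush(self) -> None:
        """Block until every queued span has been handed to export."""
        self._queue.join()


exported: List[List[Any]] = []
exporter = BatchExporter(exported.append, max_batch=10)
for i in range(25):
    exporter.add({"span": i})
exporter.flush()  # without this, queued spans could be lost when the process exits
```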

License

Proprietary — © 2026 Syntropy Labs. All rights reserved. See LICENSE.

Download files

Source Distribution

syntropylabs_evalkit-0.1.19.tar.gz (49.6 kB)

Uploaded Source

Built Distribution

syntropylabs_evalkit-0.1.19-py3-none-any.whl (81.7 kB)

Uploaded Python 3

File details

Details for the file syntropylabs_evalkit-0.1.19.tar.gz.

File metadata

  • Download URL: syntropylabs_evalkit-0.1.19.tar.gz
  • Size: 49.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for syntropylabs_evalkit-0.1.19.tar.gz
Algorithm Hash digest
SHA256 e28d804db2512342eb37a106f1556b6edb8d7b4fe71193a95e969771f30b4262
MD5 bc7785ca79da256f94d17a14428acb69
BLAKE2b-256 ebcb4b873a40919a64ce7ff0ec3c7635bef4cb594a82fab318b4f2b45eaf0f72

Provenance

The following attestation bundles were made for syntropylabs_evalkit-0.1.19.tar.gz:

Publisher: publish.yml on Syntropylabs-ai/evalkit_sdk_py

File details

Details for the file syntropylabs_evalkit-0.1.19-py3-none-any.whl.

File hashes

Hashes for syntropylabs_evalkit-0.1.19-py3-none-any.whl
Algorithm Hash digest
SHA256 2cedc512a8cbee537b5df6ccaed88b4171089b68b590a92918537eb2b5911055
MD5 4d17e20117b3b09e6d0b9101d2ea66d2
BLAKE2b-256 e42ee2270e2bb59a488d7b87b58e86a6776e09ca7dc88c01b365b92fd7931131

Provenance

The following attestation bundles were made for syntropylabs_evalkit-0.1.19-py3-none-any.whl:

Publisher: publish.yml on Syntropylabs-ai/evalkit_sdk_py
