EvalKit Python SDK — LLM observability and tracing

LLM observability and tracing for Python apps. One init() call auto-instruments your LLM clients, HTTP calls, database queries, and logging — then streams traces to Syntropy Labs.

Installation

pip install syntropylabs-evalkit

Optional provider extras:

pip install "syntropylabs-evalkit[openai]"      # OpenAI
pip install "syntropylabs-evalkit[anthropic]"   # Anthropic
pip install "syntropylabs-evalkit[all]"         # everything

The PyPI package is syntropylabs-evalkit, but you import it as evalkit.

Quickstart

import evalkit

evalkit.init(
    subscription_key="tk_live_...",  # your Syntropy Labs key
    service_name="my-service",
)

# That's it — your OpenAI / Anthropic / HTTP / DB calls are now traced automatically.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

init() sets up auto-instrumentation for you. Context (including trace IDs) propagates automatically across threads — no manual wiring required.
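
To make "context propagates across threads" concrete, here is a minimal standard-library sketch of the general idea. It is not EvalKit's implementation; the `current_trace_id` variable and the `run_in_thread_with_context` helper are hypothetical names used only for illustration. The sketch copies the caller's context explicitly rather than relying on a new thread inheriting it, and the SDK is described above as doing the equivalent for you.

```python
import contextvars
import threading

# Hypothetical context variable standing in for the SDK's current trace ID.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def run_in_thread_with_context(fn, *args):
    """Run fn in a new thread that sees a copy of the caller's context."""
    ctx = contextvars.copy_context()  # snapshot taken in the calling thread
    result = {}

    def target():
        # ctx.run executes fn with the copied context active in this thread.
        result["value"] = ctx.run(fn, *args)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["value"]


def read_trace_id():
    return current_trace_id.get()


current_trace_id.set("trace-123")
seen_in_worker = run_in_thread_with_context(read_trace_id)
print(seen_in_worker)  # trace-123
```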

Web frameworks

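# FastAPI / Starlette
from fastapi import FastAPI
from evalkit import EvalKitMiddleware

app = FastAPI()
app.add_middleware(EvalKitMiddleware)

# Flask
from flask import Flask
import evalkit

app = Flask(__name__)
evalkit.instrument_flask(app)

# Django: add to MIDDLEWARE in settings.py
MIDDLEWARE = [
    # ... your existing middleware ...
    "evalkit.EvalKitDjangoMiddleware",
]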

Manual spans

import evalkit

end, ctx = evalkit.start_span("my-operation", {"key": "value"})
try:
    ...  # your work
finally:
    end("ok")
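
The snippet above always ends the span with "ok", even when the work raises. One possible refinement is a small context manager that reports a different status on failure. The sketch below runs without the SDK by using a stand-in `start_span` with the same `(end, ctx)` return shape; the `span` helper, the `recorded` list, and the "error" status string are all assumptions, so check the SDK for the status values it actually accepts.

```python
from contextlib import contextmanager

recorded = []  # stand-in sink so the sketch runs without the SDK


def start_span(name, attributes=None):
    """Stand-in with the same shape as the call above: returns (end, ctx)."""
    ctx = {"name": name, "attributes": dict(attributes or {})}

    def end(status):
        recorded.append((name, status))

    return end, ctx


@contextmanager
def span(name, attributes=None):
    """Hypothetical helper: end the span with a status that reflects the outcome."""
    end, ctx = start_span(name, attributes)
    status = "error"  # assumed failure status; not confirmed by the SDK docs
    try:
        yield ctx
        status = "ok"
    finally:
        end(status)


with span("my-operation", {"key": "value"}):
    pass  # your work

try:
    with span("failing-operation"):
        raise ValueError("boom")
except ValueError:
    pass

print(recorded)  # [('my-operation', 'ok'), ('failing-operation', 'error')]
```

With the real SDK, the same wrapper would presumably call evalkit.start_span in place of the stand-in.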

Tracing your own functions & tools (APM)

Function tracing is automatic. As each module imports, init() wraps every function in your app's own source tree (the directory of the file that called init()). Each call produces one function_call span recording input, output, and latency, with no per-module wiring. Third-party libraries are never wrapped. Only module-level functions are wrapped automatically (class methods are left alone), and signatures are preserved so framework introspection (FastAPI Depends, etc.) keeps working.

# Nothing to wire — just init():
evalkit.init(subscription_key="tk_live_…", service_name="my-api")
# every function your app defines is now traced as it imports.

# Disable it:
#   EVALKIT_FUNCTION_TRACE=false        (env)
#   evalkit.init(..., function_tracing=False)
# Trace extra sibling packages outside the caller's dir:
#   evalkit.init(..., trace_packages=["support_bot", "workers"])
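
The rules described above (wrap module-level functions only, skip imported third-party callables and classes, preserve signatures) can be illustrated with a short standard-library sketch. This is not how EvalKit is implemented; `traced`, `wrap_module_functions`, the `spans` list, and the throwaway `demo_app` module are hypothetical names for illustration.

```python
import functools
import inspect
import time
import types

spans = []  # stand-in sink for recorded calls


def traced(fn):
    """Record input, output, and latency; functools.wraps keeps the signature visible."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        spans.append({
            "name": fn.__qualname__,
            "input": (args, kwargs),
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper


def wrap_module_functions(module):
    """Wrap functions defined in `module`; skip imports and classes."""
    for name, obj in list(vars(module).items()):
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            setattr(module, name, traced(obj))


# Build a throwaway module so the sketch is self-contained.
demo = types.ModuleType("demo_app")
exec(
    "from json import dumps\n"
    "def add(a: int, b: int = 1) -> int:\n"
    "    return a + b\n"
    "class Service:\n"
    "    def run(self):\n"
    "        return 'ran'\n",
    demo.__dict__,
)

wrap_module_functions(demo)
total = demo.add(2, b=3)
demo.Service().run()             # class method: not wrapped, no span recorded
signature_text = str(inspect.signature(demo.add))
print(total, signature_text, len(spans))
```

Because the wrapper carries `__wrapped__`, `inspect.signature` reports the original parameters, which is the property that keeps framework introspection working.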

For finer control you can still opt in explicitly — a function, a tool, a whole class, or a module/package:

import evalkit

# One function -> function_call span (input / output / latency)
@evalkit.trace_function()
def do_work(x):
    return x * 2

# One tool -> tool_call span (renders in the Input/Output panels + tool metrics)
@evalkit.trace_tool()
def search_web(query: str):
    return run_search(query)

# Every method of a class, APM-style
@evalkit.traced
class OrderService:
    def place(self, order): ...
    def cancel(self, order_id): ...  # renamed from `id` to avoid shadowing the builtin

# Every function defined in a module — one call
import myapp.services as svc
evalkit.trace_module(svc)

# Or your WHOLE app at once (recurses every submodule) — any framework:
import myapp
evalkit.trace_package(myapp)

Client-side tools that you execute yourself show their output only if you wrap them with trace_tool: without it, the SDK sees the model's tool request but never your function's return value. Server-side tools (OpenAI web_search, …) and LangChain tools are captured automatically. Call init() before the decorated class/module is imported.

SQLAlchemy

import evalkit
evalkit.patch_sqlalchemy_engine(engine)

Evaluation

Score agent outputs locally — no judge-model cost, results appear as eval_result spans:

import evalkit

scores = evalkit.evaluate(
    output="Your return window is 30 days.",
    input="What is the return policy?",
    expected_tools=["search_knowledge_base"],
    tool_calls=[{"name": "search_knowledge_base"}],
    constraints={"required_terms": ["return", "30"]},
)
# → {"tool_trajectory_f1": 1.0, "required_terms": 1.0, ...}
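
For intuition about what scores like these measure, here is a minimal sketch of one plausible way to compute them locally. The formulas are assumptions: a set-based F1 over tool names (ignoring order and repeats) and a case-insensitive substring check for required terms. EvalKit's actual definitions may differ, and both function names below are hypothetical.

```python
def tool_trajectory_f1(expected_tools, tool_calls):
    """Set-based F1 between expected tool names and the names actually called."""
    expected = set(expected_tools)
    called = {call["name"] for call in tool_calls}
    if not expected and not called:
        return 1.0  # nothing expected, nothing called: treat as a perfect match
    overlap = len(expected & called)
    if overlap == 0:
        return 0.0
    precision = overlap / len(called)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)


def required_terms_score(output, required_terms):
    """Fraction of required terms found in the output (case-insensitive)."""
    if not required_terms:
        return 1.0
    text = output.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


f1 = tool_trajectory_f1(
    ["search_knowledge_base"],
    [{"name": "search_knowledge_base"}],
)
terms = required_terms_score("Your return window is 30 days.", ["return", "30"])
partial = tool_trajectory_f1(["search_kb", "lookup_order"], [{"name": "search_kb"}])
print(f1, terms)  # 1.0 1.0
```

In the partial case, one of two expected tools is called, so precision is 1.0 and recall is 0.5, giving an F1 of about 0.67.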

Scenario simulation

Generate realistic synthetic-user scenarios from your agent's system prompt and tool list, then run each scenario against your real agent and score the results automatically:

import evalkit

evalkit.init(subscription_key="tk_live_...", service_name="my-agent")

# Step 1 — generate scenarios server-side (BYOK: your own key for the generation call)
scenarios = evalkit.generate_scenarios(
    agent_instructions=SYSTEM_PROMPT,
    tools=["search_kb", "lookup_order", "create_ticket"],
    count=5,
    provider="anthropic",           # "openai" or "google" also supported
    api_key="sk-ant-...",           # BYOK key for generation model
    model="claude-haiku-4-5-20251001",
)

# Step 2 — simulate each scenario against your real agent and score it
def entrypoint(ctx: evalkit.SimContext) -> evalkit.AgentTurnResult:
    # ctx.message    — the synthetic user's turn message
    # ctx.session_id — stable per-scenario, use it to keep multi-turn context
    reply, tools_used = run_my_agent(ctx.session_id, ctx.message)
    return evalkit.AgentTurnResult(
        text=reply,
        tool_calls=[{"name": t} for t in tools_used],
    )

report = evalkit.simulate_user(entrypoint, scenarios, tags=["ci"])
# Results appear in Dashboard → Simulations
print("Simulation ID:", report["simulation_id"])
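
The entrypoint above relies on session_id to keep multi-turn context. The sketch below shows one way that could work, using stand-in dataclasses so it runs without the SDK. The stand-ins carry only the fields shown above (message, session_id, text, tool_calls); the toy `run_my_agent` and the `histories` store are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SimContext:  # stand-in for evalkit.SimContext
    message: str
    session_id: str


@dataclass
class AgentTurnResult:  # stand-in for evalkit.AgentTurnResult
    text: str
    tool_calls: list = field(default_factory=list)


# One message history per scenario, keyed by the stable session_id.
histories = defaultdict(list)


def run_my_agent(session_id, message):
    """Toy agent: remembers prior turns and 'calls' a tool on order questions."""
    history = histories[session_id]
    history.append(message)
    tools_used = ["lookup_order"] if "order" in message.lower() else []
    reply = f"Turn {len(history)}: you said {message!r}"
    return reply, tools_used


def entrypoint(ctx):
    reply, tools_used = run_my_agent(ctx.session_id, ctx.message)
    return AgentTurnResult(text=reply, tool_calls=[{"name": t} for t in tools_used])


first = entrypoint(SimContext("Where is my order?", "s1"))
second = entrypoint(SimContext("Thanks", "s1"))
other = entrypoint(SimContext("Hello", "s2"))
print(first.text, first.tool_calls)
print(second.text)
print(other.text)
```

Turns that share a session_id see the same history, while a different scenario starts from a clean slate.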

Out-of-process agents (Claude Agent SDK)

The Claude Agent SDK runs the Anthropic call in a subprocess, so the normal in-process patch can't observe it. EvalKit wraps claude_agent_sdk.query() and ClaudeSDKClient.receive_response() instead, reading token/cost/latency from the ResultMessage the SDK already returns. This happens automatically via init() when claude_agent_sdk is installed. To call it explicitly:

evalkit.patch_claude_agent_sdk()

Flushing

Traces are batched and exported in the background. Flush before exit if needed:

evalkit.flush()
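
To show why an explicit flush matters with batched export, here is a simplified sketch of a batching exporter. It is synchronous for brevity (the SDK is described as exporting in the background), and `BatchExporter`, `sent`, and the batch size are hypothetical. The point is that items below the batch threshold stay pending until something calls flush().

```python
import atexit
import threading


class BatchExporter:
    """Collect items and export them in batches; flush() sends whatever is pending."""

    def __init__(self, export, batch_size=100):
        self._export = export
        self._batch_size = batch_size
        self._pending = []
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            self._pending.append(item)
            ready = len(self._pending) >= self._batch_size
        if ready:
            self.flush()

    def flush(self):
        # Swap the pending list out under the lock, export outside it.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._export(batch)


sent = []  # stand-in for the network export
exporter = BatchExporter(export=sent.append, batch_size=3)
atexit.register(exporter.flush)  # last-chance flush at interpreter exit

for i in range(4):
    exporter.add(f"span-{i}")

batches_before_flush = len(sent)  # one full batch; "span-3" is still pending
exporter.flush()
print(sent)  # [['span-0', 'span-1', 'span-2'], ['span-3']]
```

With the SDK, a similar exit hook would presumably be atexit.register(evalkit.flush), assuming flush() takes no arguments as shown above.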

License

Release history

0.1.28 (this version): Jun 10, 2026
0.1.27: Jun 7, 2026
0.1.26: Jun 7, 2026
0.1.25: Jun 7, 2026
0.1.24: Jun 7, 2026
0.1.23: Jun 7, 2026
0.1.22: Jun 7, 2026
0.1.21: Jun 7, 2026
0.1.20: Jun 5, 2026
0.1.19: Jun 5, 2026
0.1.18: Jun 3, 2026
0.1.17: Jun 3, 2026
0.1.0: May 31, 2026

Download files

Source Distribution

syntropylabs_evalkit-0.1.28.tar.gz (59.8 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

syntropylabs_evalkit-0.1.28-py3-none-any.whl (94.4 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file syntropylabs_evalkit-0.1.28.tar.gz.

File metadata

Download URL: syntropylabs_evalkit-0.1.28.tar.gz
Upload date: Jun 10, 2026
Size: 59.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for syntropylabs_evalkit-0.1.28.tar.gz
Algorithm	Hash digest
SHA256	`d9c7a9ed93bad48b037603bad7216a65c223c618cfa7a067dbe0ccca7166b5ec`
MD5	`61b1ce440f686c8a81e73a0b4c6781e9`
BLAKE2b-256	`bba911819d2669532c23d335859456329eb6b58b782cec4ffeace19fcdcc1cbf`

Provenance

The following attestation bundles were made for syntropylabs_evalkit-0.1.28.tar.gz:

Publisher: publish.yml on Syntropylabs-ai/evalkit_sdk_py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: syntropylabs_evalkit-0.1.28.tar.gz
- Subject digest: d9c7a9ed93bad48b037603bad7216a65c223c618cfa7a067dbe0ccca7166b5ec
- Sigstore transparency entry: 1780802843
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: Syntropylabs-ai/evalkit_sdk_py@b3f8de1b8dc40db56a6d7c12265b6625a0c9ee37
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Syntropylabs-ai
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b3f8de1b8dc40db56a6d7c12265b6625a0c9ee37
- Trigger Event: workflow_dispatch

File details

Details for the file syntropylabs_evalkit-0.1.28-py3-none-any.whl.

File metadata

Download URL: syntropylabs_evalkit-0.1.28-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 94.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for syntropylabs_evalkit-0.1.28-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cd540911032c34c38791652963e0f860a04f743fbacb19bf7e1195f31d3bb88c`
MD5	`bdcb7f501a39f070ef0e51d1de685771`
BLAKE2b-256	`dfaee4599b06bd55bfb9bece7bba44590428a35e779afe8a989c86282372bf51`

Provenance

The following attestation bundles were made for syntropylabs_evalkit-0.1.28-py3-none-any.whl:

Publisher: publish.yml on Syntropylabs-ai/evalkit_sdk_py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: syntropylabs_evalkit-0.1.28-py3-none-any.whl
- Subject digest: cd540911032c34c38791652963e0f860a04f743fbacb19bf7e1195f31d3bb88c
- Sigstore transparency entry: 1780803032
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: Syntropylabs-ai/evalkit_sdk_py@b3f8de1b8dc40db56a6d7c12265b6625a0c9ee37
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Syntropylabs-ai
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b3f8de1b8dc40db56a6d7c12265b6625a0c9ee37
- Trigger Event: workflow_dispatch
