
llm-evaltrack

Drop-in observability for LLM applications: automatic quality scoring, hallucination detection, cost tracking, and agent-run debugging, all with two lines of code.

Install

pip install llm-evaltrack

Quick Start

import llm_observe

llm_observe.init(api_url="https://your-server.com/ingest")
llm_observe.patch_openai()     # auto-track all OpenAI calls
llm_observe.patch_anthropic()  # auto-track all Anthropic calls

That's it. Your existing code is unchanged; every chat.completions.create() and messages.create() call is now tracked automatically.
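Conceptually, patching works by wrapping the client's completion method so every call is recorded before the response is returned. The sketch below is a hypothetical illustration of that idea using a stand-in function, not the package's actual internals:

```python
import functools
import time

CALLS = []  # a real tracker would ship these records to the ingest endpoint

def patched(create_fn):
    """Wrap a chat-completions-style function so each call is recorded."""
    @functools.wraps(create_fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        response = create_fn(*args, **kwargs)
        CALLS.append({
            "model": kwargs.get("model"),
            "messages": kwargs.get("messages"),
            "latency_s": time.perf_counter() - start,
        })
        return response
    return wrapper

# Stand-in for client.chat.completions.create (hypothetical, for illustration):
def fake_create(**kwargs):
    return {"choices": [{"message": {"content": "Paris."}}]}

fake_create = patched(fake_create)
response = fake_create(model="gpt-4o",
                       messages=[{"role": "user", "content": "Capital of France?"}])
```

Because the wrapper returns the original response unchanged, calling code never notices the instrumentation, which is what makes the patching "drop-in".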

Agent Debugging (v0.2.0)

Trace multi-step agent runs to see every step, find where things break, and measure cost per span:

from llm_observe import trace_agent

with trace_agent("research_agent", input="Research renewable energy trends") as trace:

    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)

    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)

    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)

    trace.set_output("Report complete")

Span types: llm · tool · retrieval · decision · custom

Each trace captures total duration, tokens, cost, per-step timing, inputs/outputs, and errors. An exception raised inside a with block is captured automatically and the span is marked as failed.
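For intuition, a span context manager with exactly that error behavior can be sketched in a few lines of plain Python. This is a hypothetical illustration of the pattern, not the package's actual implementation:

```python
import time
from contextlib import contextmanager

class Span:
    """Minimal record for one step: name, type, output, status, duration."""
    def __init__(self, name, span_type):
        self.name = name
        self.span_type = span_type
        self.output = None
        self.status = "ok"
        self.duration_s = None

    def set_output(self, output):
        self.output = output

@contextmanager
def span(name, span_type="custom"):
    s = Span(name, span_type)
    start = time.perf_counter()
    try:
        yield s
    except Exception:
        s.status = "failed"  # mark the step failed, then let the error propagate
        raise
    finally:
        s.duration_s = time.perf_counter() - start  # timing is recorded either way

with span("fact_check", span_type="tool") as s:
    s.set_output("verified")
```

The try/except/finally split is the key design point: the exception is re-raised so the caller still sees the failure, but the span has already recorded its status and duration.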

What Gets Tracked (auto)

Field                 Source
Input / Output        Message content
Model                 response.model
Tokens                response.usage
Cost (USD)            Calculated from token counts
Quality Score         Heuristic evaluation or LLM judge
Hallucination flags   Automatic detection
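The cost row reflects a simple derivation: multiply the token counts from response.usage by per-model prices. A sketch with made-up prices (real prices vary by provider and change over time, so the numbers here are placeholders):

```python
# Hypothetical per-million-token prices; check your provider for current numbers.
PRICES_PER_1M = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def cost_usd(model, input_tokens, output_tokens):
    """Derive cost in USD from the token counts reported in response.usage."""
    p = PRICES_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

c = cost_usd("gpt-4o", input_tokens=1000, output_tokens=200)  # 0.0045 with these prices
```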

Manual Tracking

llm_observe.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)
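Whether tracked automatically or manually, each event presumably ends up as a JSON body POSTed to api_url. The wire format is not documented, so the field names below are assumptions that simply mirror track_llm_call's arguments:

```python
import json

def build_payload(input, output, prompt=None, model=None, metadata=None):
    """Assemble a JSON body like the one a tracker might POST to its ingest endpoint.
    Field names are hypothetical, mirroring track_llm_call's keyword arguments."""
    return json.dumps({
        "input": input,
        "output": output,
        "prompt": prompt,
        "model": model,
        "metadata": metadata or {},
    })

body = build_payload(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)
```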

Configuration

llm_observe.init(
    api_url="https://your-server.com/ingest",
    api_key="your-secret",   # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,            # set False in tests
)
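The max_retries and timeout options suggest that delivery to the ingest endpoint is retried on failure. A hypothetical sketch of what retry-with-backoff looks like (the package's actual retry logic is not documented here, and the helper names are invented for illustration):

```python
import time

def send_with_retries(send, payload, max_retries=3, base_delay=0.5):
    """Call a flaky send() up to max_retries extra times, backing off exponentially."""
    for attempt in range(max_retries + 1):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)

# A stand-in sender that fails twice, then succeeds:
attempts = {"count": 0}

def flaky_send(payload):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("ingest endpoint unreachable")
    return "ok"

result = send_with_retries(flaky_send, {"event": "llm_call"}, base_delay=0.0)
```

Setting enabled=False in tests would presumably short-circuit this path entirely, so test suites never touch the network.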

Dashboard

Pair with the self-hosted server for:

  • Real-time quality trend charts
  • Bad response categorization
  • Root-cause analysis by prompt
  • Cost vs quality per model
  • Regression alerts
  • Agent Debugger — waterfall timeline of every span in a trace

Live demo: llm-evaltrack-production.up.railway.app

Self-host: github.com/Soufianeazz/llm-evaltrack

Changelog

v0.2.0

  • Added trace_agent() context manager for agent run tracing
  • Added span() and trace.span() for individual steps
  • Spans support: set_output(), set_tokens(), set_cost(), set_error()
  • Automatic error capture on exceptions inside with blocks
  • Nested span support via parent_span_id

v0.1.0

  • Initial release: patch_openai(), patch_anthropic(), track_llm_call()

