Skip to main content

LLM observability, quality scoring, hallucination detection, cost tracking, and agent run debugging — in 2 lines of code.

Project description

llm-evaltrack

Drop-in observability for LLM applications. Automatic quality scoring, hallucination detection, cost tracking, and agent run debugging — with 2 lines of code.

Install

pip install llm-evaltrack

Quick Start

import llm_observe

llm_observe.init(api_url="https://your-server.com/ingest")
llm_observe.patch_openai()     # auto-track all OpenAI calls
llm_observe.patch_anthropic()  # auto-track all Anthropic calls

That's it. Your existing code is unchanged. Every chat.completions.create() and messages.create() is now automatically tracked.

Agent Debugging (v0.2.0)

Trace multi-step agent runs to see every step, find where things break, and measure cost per span:

from llm_observe import trace_agent

with trace_agent("research_agent", input="Research renewable energy trends") as trace:

    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)

    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)

    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)

    trace.set_output("Report complete")

Span types: llm · tool · retrieval · decision · custom

Each trace captures: total duration, tokens, cost, per-step timing, inputs/outputs, and errors. Errors inside a with block are automatically caught and marked as failed.

What Gets Tracked (auto)

Field Source
Input / Output Message content
Model response.model
Tokens response.usage
Cost (USD) Calculated from token counts
Quality Score Heuristic evaluation or LLM judge
Hallucination flags Automatic detection

Manual Tracking

llm_observe.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)

Configuration

llm_observe.init(
    api_url="https://your-server.com/ingest",
    api_key="your-secret",   # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,            # set False in tests
)

Dashboard

Pair with the self-hosted server for:

  • Real-time quality trend charts
  • Bad response categorization
  • Root-cause analysis by prompt
  • Cost vs quality per model
  • Regression alerts
  • Agent Debugger — waterfall timeline of every span in a trace

Live demo: llm-evaltrack-production.up.railway.app

Self-host: github.com/Soufianeazz/llm-evaltrack

Changelog

v0.2.0

  • Added trace_agent() context manager for agent run tracing
  • Added span() and trace.span() for individual steps
  • Spans support: set_output(), set_tokens(), set_cost(), set_error()
  • Automatic error capture on exceptions inside with blocks
  • Nested span support via parent_span_id

v0.1.0

  • Initial release: patch_openai(), patch_anthropic(), track_llm_call()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentlens_monitor-0.1.0.tar.gz (55.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentlens_monitor-0.1.0-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file agentlens_monitor-0.1.0.tar.gz.

File metadata

  • Download URL: agentlens_monitor-0.1.0.tar.gz
  • Upload date:
  • Size: 55.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for agentlens_monitor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 42a34aafde3368c33361572ea23cd7e946878870a67bf61a10657f3bec936cbb
MD5 ad18eeae69480d0534869baf06eebaad
BLAKE2b-256 87b6d3ac0fb821e23a8fc4a03a0cb19e256c953b20fcda4945b0d427816696e4

See more details on using hashes here.

File details

Details for the file agentlens_monitor-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for agentlens_monitor-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 caf1675fffe12ac25dc3eef2db41b0de1916b70533fb3b54e5b0845f216d55aa
MD5 128850a4134054fd812729d138568554
BLAKE2b-256 4e5fa3dca6f932da79d14a92171fc2936bb073bd1ff7668eb2ff950ba82f2453

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page