Skip to main content

LLM observability, quality scoring, hallucination detection, cost tracking, and agent run debugging — in 2 lines of code.

Project description

llm-evaltrack

Drop-in observability for LLM applications. Automatic quality scoring, hallucination detection, cost tracking, and agent run debugging — with 2 lines of code.

Install

pip install agentlens-monitor

Quick Start

import llm_observe

llm_observe.init(api_url="https://your-server.com/ingest")
llm_observe.patch_openai()     # auto-track all OpenAI calls
llm_observe.patch_anthropic()  # auto-track all Anthropic calls

That's it. Your existing code is unchanged. Every chat.completions.create() and messages.create() is now automatically tracked.

Agent Debugging (v0.2.0)

Trace multi-step agent runs to see every step, find where things break, and measure cost per span:

from llm_observe import trace_agent

with trace_agent("research_agent", input="Research renewable energy trends") as trace:

    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)

    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)

    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)

    trace.set_output("Report complete")

Span types: llm · tool · retrieval · decision · custom

Each trace captures: total duration, tokens, cost, per-step timing, inputs/outputs, and errors. Errors inside a with block are automatically caught and marked as failed.

What Gets Tracked (auto)

Field Source
Input / Output Message content
Model response.model
Tokens response.usage
Cost (USD) Calculated from token counts
Quality Score Heuristic evaluation or LLM judge
Hallucination flags Automatic detection

Manual Tracking

llm_observe.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)

Configuration

llm_observe.init(
    api_url="https://your-server.com/ingest",
    api_key="your-secret",   # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,            # set False in tests
)

Dashboard

Pair with the self-hosted server for:

  • Real-time quality trend charts
  • Bad response categorization
  • Root-cause analysis by prompt
  • Cost vs quality per model
  • Regression alerts
  • Agent Debugger — waterfall timeline of every span in a trace

Live demo: llm-evaltrack-production.up.railway.app

Self-host: github.com/Soufianeazz/llm-evaltrack

Changelog

v0.2.0

  • Added trace_agent() context manager for agent run tracing
  • Added span() and trace.span() for individual steps
  • Spans support: set_output(), set_tokens(), set_cost(), set_error()
  • Automatic error capture on exceptions inside with blocks
  • Nested span support via parent_span_id

v0.1.0

  • Initial release: patch_openai(), patch_anthropic(), track_llm_call()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentlens_monitor-0.2.0.tar.gz (77.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentlens_monitor-0.2.0-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file agentlens_monitor-0.2.0.tar.gz.

File metadata

  • Download URL: agentlens_monitor-0.2.0.tar.gz
  • Upload date:
  • Size: 77.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for agentlens_monitor-0.2.0.tar.gz
Algorithm Hash digest
SHA256 449abb5ea1e20a769bdcc7b4885d97e9285224281de42f32d300e96f17e0f92a
MD5 414d43de0d2af2d7b3c5e71d6286857d
BLAKE2b-256 6f06fedd5d458ae8d2adc09f5940fdb2d06915f871cb0dd7e1ca912febe346e3

See more details on using hashes here.

File details

Details for the file agentlens_monitor-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for agentlens_monitor-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d22252b2cc1474ed7186962d00ba3387425af8eef85273ef90ec5c10c41e5187
MD5 de21309d3b1c324291427429b9ea18de
BLAKE2b-256 cbb318beddf4199f76ee78bde9c32fbbc315372eb80f6661b3f8f2e8c5a07439

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page