llm-evaltrack
Drop-in observability for LLM applications. Automatic quality scoring, hallucination detection, cost tracking, and agent run debugging — with 2 lines of code.
Install
pip install llm-evaltrack
Quick Start
import llm_observe
llm_observe.init(api_url="https://your-server.com/ingest")
llm_observe.patch_openai() # auto-track all OpenAI calls
llm_observe.patch_anthropic() # auto-track all Anthropic calls
That's it. Your existing code is unchanged. Every chat.completions.create() and messages.create() is now automatically tracked.
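Under the hood, the `patch_*()` functions presumably wrap the client's call methods with a recording layer. A minimal sketch of that monkey-patching idea, using a fake client (all names here are illustrative, not the library's internals):

```python
import functools

CALLS = []  # records captured by the patch

def patch_method(obj, name, recorder):
    """Replace obj.<name> with a wrapper that records inputs and outputs."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        recorder({"method": name, "kwargs": kwargs, "result": result})
        return result

    setattr(obj, name, wrapper)

class FakeClient:
    """Stand-in for an LLM client; returns a response-like dict."""
    def create(self, model=None, messages=None):
        return {"model": model, "usage": {"total_tokens": 42}}

client = FakeClient()
patch_method(client, "create", CALLS.append)
client.create(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
print(len(CALLS))  # → 1
```

The caller's code is untouched: it still calls `client.create(...)`, and the wrapper records each call as a side effect.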
Agent Debugging (v0.2.0)
Trace multi-step agent runs to see every step, find where things break, and measure cost per span:
from llm_observe import trace_agent
with trace_agent("research_agent", input="Research renewable energy trends") as trace:
    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)

    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)

    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)

    trace.set_output("Report complete")
Span types: llm · tool · retrieval · decision · custom
Each trace captures: total duration, tokens, cost, per-step timing, inputs/outputs, and errors.
Errors inside a with block are automatically caught and marked as failed.
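The trace/span mechanics above can be sketched with plain context managers. This is a simplified illustration of the pattern (per-span timing, output capture, automatic error marking), not the package's actual implementation:

```python
import time
from contextlib import contextmanager

class Span:
    def __init__(self, name, span_type):
        self.name, self.span_type = name, span_type
        self.output = None
        self.error = None
        self.duration = 0.0

    def set_output(self, value):
        self.output = value

class Trace:
    def __init__(self, name):
        self.name = name
        self.spans = []

    @contextmanager
    def span(self, name, span_type="custom"):
        s = Span(name, span_type)
        start = time.perf_counter()
        try:
            yield s
        except Exception as exc:
            s.error = str(exc)  # automatic error capture: span is marked failed
            raise
        finally:
            s.duration = time.perf_counter() - start
            self.spans.append(s)

trace = Trace("research_agent")
with trace.span("web_search", span_type="retrieval") as s:
    s.set_output(["result A", "result B"])
print(trace.spans[0].span_type)  # → retrieval
```

The `finally` block is what guarantees every span records its duration and is attached to the trace even when the body raises.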
What Gets Tracked (auto)
| Field | Source |
|---|---|
| Input / Output | Message content |
| Model | response.model |
| Tokens | response.usage |
| Cost (USD) | Calculated from token counts |
| Quality Score | Heuristic evaluation or LLM judge |
| Hallucination flags | Automatic detection |
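Cost is calculated from token counts, which implies a per-model price table. A sketch of that arithmetic; the prices below are placeholders for illustration, not the library's actual table (real prices change and differ per provider):

```python
# Illustrative USD prices per 1M tokens (assumed values, not authoritative).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def cost_usd(model, prompt_tokens, completion_tokens):
    """Convert token counts into a dollar cost using the price table."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

print(round(cost_usd("gpt-4o", 1000, 500), 6))  # → 0.0075
```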
Manual Tracking
llm_observe.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)
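Presumably `track_llm_call()` serializes these arguments into an event and ships it to the ingest endpoint. A sketch with a pluggable `send` callable standing in for the HTTP POST (the field names are taken from the call above; the structure is an assumption):

```python
import json

def track_llm_call(send, *, input, output, model, prompt=None, metadata=None):
    """Build an event record and hand it to `send` (e.g. an HTTP POST)."""
    event = {
        "input": input,
        "output": output,
        "model": model,
        "prompt": prompt,
        "metadata": metadata or {},
    }
    send(json.dumps(event))
    return event

sent = []  # capture outgoing payloads instead of POSTing
track_llm_call(
    sent.append,
    input="What is the capital of France?",
    output="Paris.",
    model="gpt-4o",
    metadata={"feature": "qa"},
)
```

Injecting `send` makes the tracker trivially testable: in tests you pass a list's `append`, in production a real HTTP client.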
Configuration
llm_observe.init(
    api_url="https://your-server.com/ingest",
    api_key="your-secret",  # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,  # set False in tests
)
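The `max_retries` and `enabled` options suggest a delivery loop like the following sketch (an assumption about behavior, not the package's code): disabled clients no-op, and transient failures are retried up to the limit.

```python
def send_with_retries(post, payload, *, max_retries=3, enabled=True):
    """Attempt delivery up to max_retries times; no-op when disabled (e.g. in tests)."""
    if not enabled:
        return False
    last_exc = None
    for _ in range(max_retries):
        try:
            post(payload)
            return True
        except Exception as exc:
            last_exc = exc
    raise last_exc

attempts = []
def flaky(payload):
    """Fails on the first call, succeeds on the second."""
    attempts.append(payload)
    if len(attempts) < 2:
        raise ConnectionError("transient")

print(send_with_retries(flaky, {"event": 1}))  # → True
```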
Dashboard
Pair with the self-hosted server for:
- Real-time quality trend charts
- Bad response categorization
- Root-cause analysis by prompt
- Cost vs quality per model
- Regression alerts
- Agent Debugger — waterfall timeline of every span in a trace
Live demo: llm-evaltrack-production.up.railway.app
Self-host: github.com/Soufianeazz/llm-evaltrack
Changelog
v0.2.0
- Added `trace_agent()` context manager for agent run tracing
- Added `span()` and `trace.span()` for individual steps
- Spans support: `set_output()`, `set_tokens()`, `set_cost()`, `set_error()`
- Automatic error capture on exceptions inside `with` blocks
- Nested span support via `parent_span_id`

v0.1.0
- Initial release: `patch_openai()`, `patch_anthropic()`, `track_llm_call()`
File details
Details for the file agentlens_monitor-0.1.0.tar.gz.

File metadata
- Download URL: agentlens_monitor-0.1.0.tar.gz
- Upload date:
- Size: 55.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42a34aafde3368c33361572ea23cd7e946878870a67bf61a10657f3bec936cbb |
| MD5 | ad18eeae69480d0534869baf06eebaad |
| BLAKE2b-256 | 87b6d3ac0fb821e23a8fc4a03a0cb19e256c953b20fcda4945b0d427816696e4 |
File details
Details for the file agentlens_monitor-0.1.0-py3-none-any.whl.

File metadata
- Download URL: agentlens_monitor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | caf1675fffe12ac25dc3eef2db41b0de1916b70533fb3b54e5b0845f216d55aa |
| MD5 | 128850a4134054fd812729d138568554 |
| BLAKE2b-256 | 4e5fa3dca6f932da79d14a92171fc2936bb073bd1ff7668eb2ff950ba82f2453 |