
Lightweight observability and diagnostics for local and self-hosted LLMs (Ollama, FastAPI, OpenWebUI). I know I messed it up for both of us and I am sorry; she will understand. You can go ahead and use the library.

Project description

llm-scope-observer

Lightweight observability and diagnostics for local and self-hosted LLMs.

llm-scope-observer is a small Python package that wraps your local LLM calls (Ollama, FastAPI backends, OpenWebUI integrations, custom Python code) and records:

  • Latency per call
  • Token usage (input, output, total, tokens/sec)
  • CPU / RAM / (optional) GPU utilization
  • Simple hallucination-risk heuristics
  • Error information

All metrics are stored locally (SQLite by default) and visualized in a small FastAPI-based dashboard.


Features

  • Request interceptor: a decorator that wraps any Python function that calls an LLM.

    from llm_scope import monitor
    import ollama
    
    @monitor(model="llama3")
    def generate(prompt: str) -> str:
        result = ollama.generate(model="llama3", prompt=prompt)
        return result["response"]
    
  • Token estimation:

    • Approximate input and output tokens
    • Track total tokens and tokens/sec per call
  • System metrics snapshot (per request):

    • CPU %
    • RAM %
    • GPU % (optional, via pynvml if installed)
  • Hallucination risk heuristic (simple, signal-based):

    • Very long answer vs. short prompt
    • Strong claims without references
    • Repetition patterns
    • Basic self-contradiction patterns
  • Local dashboard:

    • FastAPI backend + simple HTML UI
    • SQLite storage by default
    • Shows latency, token trends, errors, and resource correlation per model
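The README does not say how tokens are approximated. A common rule of thumb for English text is roughly four characters per token, so the sketch below assumes that ratio. The constant `CHARS_PER_TOKEN`, the helper names, and the decision to compute tokens/sec over output tokens only are illustrative assumptions, not the package's confirmed estimator.

```python
import math
from dataclasses import dataclass

# Assumed heuristic: about 4 characters per token for English text.
# This is an illustration, not llm-scope-observer's documented estimator.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / CHARS_PER_TOKEN))


@dataclass
class TokenStats:
    input_tokens: int
    output_tokens: int
    total_tokens: int
    tokens_per_sec: float


def token_stats(prompt: str, response: str, latency_s: float) -> TokenStats:
    """Estimate input/output/total tokens and output throughput for one call."""
    inp = estimate_tokens(prompt)
    out = estimate_tokens(response)
    # Throughput over generated (output) tokens; guard against zero latency.
    tps = out / latency_s if latency_s > 0 else 0.0
    return TokenStats(inp, out, inp + out, tps)


stats = token_stats("Why is the sky blue?", "Rayleigh scattering of sunlight. " * 10, 2.0)
print(stats)
```

Character-based estimates can be far off for code, non-English text, or models with unusual tokenizers; when the backend reports exact token counts, those are preferable.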

Installation

pip install llm-scope-observer

Optional GPU metrics:

pip install "llm-scope-observer[gpu]"

Requires Python 3.9+.
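Because GPU metrics come from an optional extra, code that collects them has to cope with pynvml being absent, or installed on a machine without a usable NVIDIA GPU. Below is a minimal sketch of a per-request snapshot using psutil for CPU/RAM and pynvml when available, as the README describes. The function names, the returned keys, and the choice of GPU index 0 are assumptions for illustration, not the package's API.

```python
from typing import Dict, Optional


def gpu_utilization_percent() -> Optional[float]:
    """Return utilization of GPU 0 via pynvml, or None if unavailable (hypothetical helper)."""
    try:
        import pynvml  # present only when the optional GPU dependency is installed
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumption: first GPU only
            return float(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        # Library installed, but no NVIDIA driver or GPU reachable.
        return None


def system_snapshot() -> Dict[str, Optional[float]]:
    """Collect CPU %, RAM % and (optionally) GPU % for a single request."""
    try:
        import psutil
    except ImportError:
        cpu = ram = None
    else:
        # interval=None is non-blocking: it compares against the previous call.
        cpu = psutil.cpu_percent(interval=None)
        ram = psutil.virtual_memory().percent
    return {"cpu_percent": cpu, "ram_percent": ram, "gpu_percent": gpu_utilization_percent()}


print(system_snapshot())
```

Returning `None` instead of raising keeps a missing GPU from turning into a failed request; the dashboard can simply leave that column empty.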


Quickstart

1. Instrument your LLM call

from llm_scope import monitor
import time

@monitor(model="test-model")
def generate(prompt: str) -> str:
    time.sleep(0.1)  # simulate model latency
    return "hello from llm-scope-observer"  # stand-in for a real model response

Every time generate(...) runs, a record is written to a local SQLite database (llm_scope.db by default).
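To make the mechanism concrete, here is a minimal sketch of what a decorator like `@monitor` might do: time the wrapped call, capture any exception, and insert one row into SQLite. The README names an `llm_calls` table, but the column layout here is assumed, and `monitor_sketch`, `_store`, and the "prompt is the first positional argument" convention are hypothetical. This is not the package's actual implementation.

```python
import functools
import os
import sqlite3
import tempfile
import time
from typing import Any, Callable, Optional

# Hypothetical schema: the table name follows the README, the columns are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_calls (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    ts        REAL NOT NULL,
    model     TEXT NOT NULL,
    latency_s REAL NOT NULL,
    prompt    TEXT,
    response  TEXT,
    error     TEXT
)
"""


def _store(db_path: str, row: tuple) -> None:
    """Write one record; storage failures are swallowed so monitoring never breaks the app."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(SCHEMA)
                conn.execute(
                    "INSERT INTO llm_calls (ts, model, latency_s, prompt, response, error)"
                    " VALUES (?, ?, ?, ?, ?, ?)",
                    row,
                )
        finally:
            conn.close()
    except sqlite3.Error:
        pass


def monitor_sketch(model: str, db_path: str) -> Callable:
    """Hypothetical stand-in for @monitor: times each call and records one row."""

    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)
        def wrapper(prompt: str, *args: Any, **kwargs: Any) -> str:
            start = time.perf_counter()
            response: Optional[str] = None
            error: Optional[str] = None
            try:
                response = func(prompt, *args, **kwargs)
                return response
            except Exception as exc:
                error = f"{type(exc).__name__}: {exc}"
                raise  # re-raise so callers still see the failure
            finally:
                latency_s = time.perf_counter() - start
                _store(db_path, (time.time(), model, latency_s, prompt, response, error))

        return wrapper

    return decorator


# Usage: write to a throwaway database file.
demo_db = os.path.join(tempfile.mkdtemp(), "llm_scope_demo.db")


@monitor_sketch(model="test-model", db_path=demo_db)
def generate(prompt: str) -> str:
    time.sleep(0.1)  # stand-in for a real model call
    return "hello from llm-scope-observer"


print(generate("hi"))
```

Recording in a `finally` block means failed calls are stored too, with the exception text in `error`, which is what lets error counts appear alongside latency.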

2. Run the dashboard

After the instrumented function has handled some requests, start the dashboard:

llm-scope ui --host 127.0.0.1 --port 8000
# or
python -m llm_scope.cli ui --host 127.0.0.1 --port 8000

Open the dashboard in your browser at the host and port you passed to the ui command, and you’ll see:

  • Average latency per model
  • Slowest calls (tail latency)
  • Token usage and tokens/sec
  • Error counts
  • CPU / RAM / GPU vs. latency
  • Hallucination score per call
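Summary numbers like these are ordinary aggregations over the stored rows. As a sketch, assuming a hypothetical `llm_calls` layout with `model`, `latency_s` and `error` columns (the real schema may differ), per-model call count, average latency, slowest call, and error count can come from one SQL query. The demo rows are made-up inputs for illustration only.

```python
import sqlite3
from typing import List, Tuple


def latency_summary(conn: sqlite3.Connection) -> List[Tuple[str, int, float, float, int]]:
    """Per-model call count, average and max latency, and error count (hypothetical schema)."""
    return conn.execute(
        """
        SELECT model,
               COUNT(*)                                           AS calls,
               AVG(latency_s)                                     AS avg_latency_s,
               MAX(latency_s)                                     AS max_latency_s,
               SUM(CASE WHEN error IS NOT NULL THEN 1 ELSE 0 END) AS errors
        FROM llm_calls
        GROUP BY model
        ORDER BY avg_latency_s DESC
        """
    ).fetchall()


# Demo data in an in-memory database (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_calls (model TEXT, latency_s REAL, error TEXT)")
conn.executemany(
    "INSERT INTO llm_calls VALUES (?, ?, ?)",
    [
        ("llama3", 1.2, None),
        ("llama3", 2.0, None),
        ("llama3", 4.0, "TimeoutError"),
        ("mistral", 0.8, None),
    ],
)
for row in latency_summary(conn):
    print(row)
```

SQLite has no built-in percentile function, so this sketch uses `MAX` for the slowest call; a p95 would need ordering the latencies per model in Python or a window-function query.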

How it works (high level)

  • Middleware / decorator:

    • @monitor(model="llama3") wraps any function.
    • Captures start/end times, prompt, response, and errors.
    • Sends a metrics record to the storage backend.
  • Metrics:

    • Token estimation from prompt and response text.
    • System stats from psutil (and optionally pynvml).
    • Simple heuristics for hallucination risk.
  • Storage:

    • SQLite via sqlite3 by default.
    • One table: llm_calls with timestamps, model, metrics, error, tags.
  • Dashboard:

    • FastAPI app.
    • Reads from the same SQLite file.
    • Renders an HTML summary page (no external JS required).
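The README lists the hallucination signals but not how they are scored. The sketch below shows one hypothetical way to turn three of them (answer length vs. prompt length, strong claims without references, repetition) into a score between 0 and 1. The word list, regex, thresholds, and equal weighting are illustrative assumptions, and self-contradiction detection is left out.

```python
import re
from collections import Counter

# Illustrative word list and reference markers; not the package's actual values.
STRONG_CLAIMS = ("definitely", "always", "never", "proven", "guaranteed", "certainly")
REFERENCE_PATTERN = re.compile(r"https?://|\[\d+\]|\bsource:", re.IGNORECASE)


def length_signal(prompt: str, answer: str, ratio_threshold: float = 20.0) -> float:
    """1.0 when the answer is much longer than the prompt (by word count)."""
    p, a = len(prompt.split()), len(answer.split())
    return 1.0 if a > ratio_threshold * max(p, 1) else 0.0


def unsupported_claim_signal(answer: str) -> float:
    """1.0 when strong-certainty words appear with no link or citation marker."""
    words = set(re.findall(r"[a-z']+", answer.lower()))
    has_claim = any(w in words for w in STRONG_CLAIMS)
    return 1.0 if has_claim and not REFERENCE_PATTERN.search(answer) else 0.0


def repetition_signal(answer: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram."""
    words = answer.lower().split()
    grams = [tuple(words[i : i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    repeats = sum(c - 1 for c in Counter(grams).values())
    return repeats / len(grams)


def hallucination_risk(prompt: str, answer: str) -> float:
    """Equal-weight average of the three signals, in [0, 1]."""
    signals = (
        length_signal(prompt, answer),
        unsupported_claim_signal(answer),
        repetition_signal(answer),
    )
    return sum(signals) / len(signals)


print(hallucination_risk("Capital of France?", "Paris."))
print(hallucination_risk("Capital of France?", "It is definitely Paris. " * 30))
```

Signals like these flag answers worth a second look; they do not check facts, so a high score is a prompt for review rather than proof of a hallucination.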

Roadmap

This is an early MVP. Planned next steps include:

  • Prompt clustering and slow-prompt detection
  • Model A vs. Model B comparison
  • Basic alerting hooks and export to tools like Grafana
  • Optional HTTP ingestion mode (sidecar / agent model)

License

MIT


Download files

Source Distribution

llm_scope_observer-0.1.0a3.tar.gz (10.3 kB)

Uploaded Source

Built Distribution

llm_scope_observer-0.1.0a3-py3-none-any.whl (10.6 kB)

Uploaded Python 3

File details

Details for the file llm_scope_observer-0.1.0a3.tar.gz.

File metadata

  • Download URL: llm_scope_observer-0.1.0a3.tar.gz
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for llm_scope_observer-0.1.0a3.tar.gz
Algorithm Hash digest
SHA256 1fc4cc0be6369d19b5c99d07be6e53b1be7c0f17c5854bdc294183e2ac25b7e7
MD5 7ba38b18fd5912f5238f7b935e9cc83f
BLAKE2b-256 4f3160c4b29cd6c0b342a4f58e331f7d2584c43246c43ed11e75b3015d6642e6
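To check a downloaded file against the published SHA256 digest, Python's standard `hashlib` module is enough. A minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the self-check uses the well-known SHA256 test vector for the bytes `abc`.

```python
import hashlib
import tempfile
from pathlib import Path

# Published SHA256 digest of llm_scope_observer-0.1.0a3.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "1fc4cc0be6369d19b5c99d07be6e53b1be7c0f17c5854bdc294183e2ac25b7e7"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check on a known input: SHA256("abc") is a standard test vector.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_digest = sha256_of(sample)
    sample_ok = verify(sample, sample_digest.upper())

# Real use, after downloading the sdist into the current directory:
# print(verify(Path("llm_scope_observer-0.1.0a3.tar.gz"), EXPECTED_SDIST_SHA256))
```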

File details

Details for the file llm_scope_observer-0.1.0a3-py3-none-any.whl.

File hashes

Hashes for llm_scope_observer-0.1.0a3-py3-none-any.whl
Algorithm Hash digest
SHA256 7afadaf64f794ce29e66c7319ad41539cab03f7311f9b70518a233e0a3b2ef53
MD5 4447e3943675813eeb68e046c06652d0
BLAKE2b-256 6d0a74643154bc8b0ed723092eba00b6965bfc414217b45c418dd211e50a86d9
