llm-scope-observer

Lightweight observability and diagnostics for local and self-hosted LLMs.

llm-scope-observer is a small Python package that wraps your local LLM calls (Ollama, FastAPI backends, OpenWebUI integrations, custom Python code) and records:

  • Latency per call
  • Token usage (input, output, total, tokens/sec)
  • CPU / RAM / (optional) GPU utilization
  • Simple hallucination-risk heuristics
  • Error information

All metrics are stored locally (SQLite by default) and visualized in a small FastAPI-based dashboard.


Features

  • Request interceptor: Decorator to wrap any Python function that calls an LLM.

    from llm_scope import monitor
    import ollama
    
    @monitor(model="llama3")
    def generate(prompt: str) -> str:
        result = ollama.generate(model="llama3", prompt=prompt)
        return result["response"]
    
  • Token estimation:

    • Approximate input and output tokens
    • Track total tokens and tokens/sec per call
  • System metrics snapshot (per request):

    • CPU %
    • RAM %
    • GPU % (optional, via pynvml if installed)
  • Hallucination risk heuristic (simple, signal-based):

    • Very long answer vs. short prompt
    • Strong claims without references
    • Repetition patterns
    • Basic self-contradiction patterns
  • Local dashboard:

    • FastAPI backend + simple HTML UI
    • SQLite storage by default
    • Shows latency, token trends, errors, and resource correlation per model
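A signal-based heuristic of the kind listed above can be sketched in a few lines. This is only an illustration, not the package's actual scoring: the function names, word lists, and thresholds are all assumptions, and the self-contradiction signal is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical word lists and thresholds, chosen only for illustration.
STRONG_CLAIM_WORDS = {"definitely", "always", "never", "guaranteed", "proven"}
REFERENCE_HINTS = ("http://", "https://", "according to", "source:")


def risk_signals(prompt: str, response: str) -> dict:
    """Return one boolean per heuristic signal."""
    prompt_words = prompt.split()
    lowered = response.lower()
    words = re.findall(r"[a-z']+", lowered)

    # Signal 1: a very long answer relative to a short prompt.
    long_answer = (
        len(prompt_words) <= 10
        and len(words) >= 20 * max(len(prompt_words), 1)
    )

    # Signal 2: strong claims with no hint of a reference.
    has_claim = any(word in STRONG_CLAIM_WORDS for word in words)
    has_reference = any(hint in lowered for hint in REFERENCE_HINTS)

    # Signal 3: the same three-word phrase appears three or more times.
    trigrams = Counter(zip(words, words[1:], words[2:]))
    repeated = any(count >= 3 for count in trigrams.values())

    return {
        "long_answer_short_prompt": long_answer,
        "unsupported_strong_claim": has_claim and not has_reference,
        "repetition": repeated,
    }


def risk_score(prompt: str, response: str) -> float:
    """Fraction of signals that fired, between 0.0 and 1.0."""
    signals = risk_signals(prompt, response)
    return sum(signals.values()) / len(signals)


if __name__ == "__main__":
    print(risk_signals("What is Python?", "Python is definitely the best language."))
```

Signals like these only flag text worth a second look. They cannot tell whether an answer is actually wrong.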

Installation

pip install llm-scope-observer

Optional GPU metrics:

pip install "llm-scope-observer[gpu]"

Requires Python 3.9+.
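To show why GPU support can stay optional, here is a minimal sketch of a GPU reading that degrades to None when pynvml is missing or no NVIDIA driver is present. The helper name gpu_percent is hypothetical and this is not the package's own code. It uses only standard pynvml calls.

```python
from typing import Optional


def gpu_percent(index: int = 0) -> Optional[float]:
    """Return GPU utilization in percent, or None if it cannot be read."""
    try:
        import pynvml  # only available if the GPU extra (or pynvml) is installed
    except ImportError:
        return None

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No NVIDIA driver or NVML library on this machine.
        return None

    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        return float(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(gpu_percent())
```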


Quickstart

1. Instrument your LLM call

from llm_scope import monitor
import time

@monitor(model="test-model")
def generate(prompt: str) -> str:
    time.sleep(0.1)
    return "hello from llm-scope-observer"

Every time generate(...) runs, a record is written to a local SQLite database (llm_scope.db by default).
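To check that records are arriving, you can inspect the database with the standard library. The sketch below assumes the default file name llm_scope.db and the table name llm_calls mentioned in this README. It reads the column names from SQLite instead of guessing them, and the helper name describe_calls_table is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def describe_calls_table(db_path: str = "llm_scope.db", table: str = "llm_calls") -> dict:
    """Return the column names and row count of the calls table."""
    empty = {"columns": [], "rows": 0}
    if not Path(db_path).exists():
        return empty

    with closing(sqlite3.connect(db_path)) as conn:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # The table name is a trusted constant here, not user input.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if not columns:
            return empty
        rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    return {"columns": columns, "rows": rows}


if __name__ == "__main__":
    print(describe_calls_table())
```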

2. Run the dashboard

After some traffic:

llm-scope ui --host 127.0.0.1 --port 8000
# or
python -m llm_scope.cli ui --host 127.0.0.1 --port 8000

Open the dashboard in a browser at the host and port you passed above, and you’ll see:

  • Average latency per model
  • Slowest calls (tail latency)
  • Token usage and tokens/sec
  • Error counts
  • CPU / RAM / GPU vs. latency
  • Hallucination score per call
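The latency panels come down to simple aggregates. As a rough sketch (not the dashboard's actual queries), average and tail latency per model could be computed like this. The function name, the (model, latency) record shape, and the choice of a nearest-rank 95th percentile are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean


def summarize_latency(records):
    """records: iterable of (model, latency_seconds) pairs."""
    by_model = defaultdict(list)
    for model, latency in records:
        by_model[model].append(latency)

    summary = {}
    for model, latencies in by_model.items():
        ordered = sorted(latencies)
        # Nearest-rank p95: the smallest value with at least 95% of calls at or below it.
        rank = max(1, math.ceil(0.95 * len(ordered)))
        summary[model] = {
            "calls": len(ordered),
            "avg": mean(ordered),
            "p95": ordered[rank - 1],
        }
    return summary


if __name__ == "__main__":
    print(summarize_latency([("llama3", 0.8), ("llama3", 1.4), ("test-model", 0.1)]))
```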

How it works (high level)

  • Middleware / decorator:

    • @monitor(model="llama3") wraps any function.
    • Captures start/end times, prompt, response, and errors.
    • Sends a metrics record to the storage backend.
  • Metrics:

    • Token estimation from prompt and response text.
    • System stats from psutil (and optionally pynvml).
    • Simple heuristics for hallucination risk.
  • Storage:

    • SQLite via sqlite3 by default.
    • One table: llm_calls with timestamps, model, metrics, error, tags.
  • Dashboard:

    • FastAPI app.
    • Reads from the same SQLite file.
    • Renders an HTML summary page (no external JS required).
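The decorator, metrics, and storage steps above can be sketched as one small pipeline. This is an illustration under stated assumptions, not the package's source: monitor_sketch, estimate_tokens, the sketch_calls table and its columns, and the four-characters-per-token rule are all hypothetical.

```python
import functools
import json
import sqlite3
import time
from contextlib import closing


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb (about four characters per token), an assumption.
    return max(1, len(text) // 4) if text else 0


def _store(db_path, model, latency, metrics, error):
    # Hypothetical schema; the real llm_calls table may differ.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sketch_calls "
            "(ts REAL, model TEXT, latency_s REAL, metrics TEXT, error TEXT)"
        )
        conn.execute(
            "INSERT INTO sketch_calls VALUES (?, ?, ?, ?, ?)",
            (time.time(), model, latency, json.dumps(metrics), error),
        )
        conn.commit()


def monitor_sketch(model: str, db_path: str = "sketch_calls.db"):
    """Wrap a function whose first argument is the prompt text."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(prompt, *args, **kwargs):
            start = time.perf_counter()
            response, error = "", None
            try:
                response = func(prompt, *args, **kwargs)
                return response
            except Exception as exc:
                error = repr(exc)
                raise  # record the failure, then let the caller see it
            finally:
                latency = time.perf_counter() - start
                output_tokens = estimate_tokens(str(response))
                metrics = {
                    "input_tokens": estimate_tokens(str(prompt)),
                    "output_tokens": output_tokens,
                    "tokens_per_sec": output_tokens / latency if latency > 0 else 0.0,
                }
                _store(db_path, model, latency, metrics, error)
        return wrapper
    return decorator


if __name__ == "__main__":
    @monitor_sketch(model="test-model")
    def generate(prompt: str) -> str:
        return "hello from the sketch"

    print(generate("hi"))
```

Writing the record in the finally block means a failed call still produces a row, with the exception text in the error column.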

Roadmap

This is an early MVP. Planned next steps include:

  • Prompt clustering and slow-prompt detection
  • Model A vs. Model B comparison
  • Basic alerting hooks and export to tools like Grafana
  • Optional HTTP ingestion mode (sidecar / agent model)

License

MIT

Download files

Source Distribution

llm_scope_observer-0.1.0a4.tar.gz (10.3 kB)

Uploaded Source

Built Distribution

llm_scope_observer-0.1.0a4-py3-none-any.whl (10.6 kB)

Uploaded Python 3

File details

Details for the file llm_scope_observer-0.1.0a4.tar.gz.

File metadata

  • Download URL: llm_scope_observer-0.1.0a4.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for llm_scope_observer-0.1.0a4.tar.gz
Algorithm     Hash digest
SHA256        a02dbb5a9652b82a14d06478f2aef8c94889ac6296c66a8782ba50c01a024aaf
MD5           daf2ba09df1cf6e5a7deb257d3bbe800
BLAKE2b-256   358bb75330fd11848d0f1cb4a7704789f8e16b39627b190a17af6a94bd94fc93

File details

Details for the file llm_scope_observer-0.1.0a4-py3-none-any.whl.

File hashes

Hashes for llm_scope_observer-0.1.0a4-py3-none-any.whl
Algorithm     Hash digest
SHA256        2810e38a533c0ddee8212004dee15564eb4b922c321aff6911a9a577b6aa5996
MD5           c8dafcc492846da31ff6eb7ddfccf9b8
BLAKE2b-256   4b9ad8a30077aaa50868c8fb34ee7c9e63bfc6ccb44f2c114e03edcd1b7d5a9d
