I know I messed it up for both of us and I am sorry, she will understand.. you can go ahead and use the library.

llm-scope-observer

Lightweight observability and diagnostics for local and self-hosted LLMs.

llm-scope-observer is a small Python package that wraps your local LLM calls (Ollama, FastAPI backends, OpenWebUI integrations, custom Python code) and records:

  • Latency per call
  • Token usage (input, output, total, tokens/sec)
  • CPU / RAM / (optional) GPU utilization
  • Simple hallucination-risk heuristics
  • Error information

All metrics are stored locally (SQLite by default) and visualized in a small FastAPI-based dashboard.


Features

  • Request interceptor: Decorator to wrap any Python function that calls an LLM.

    from llm_scope import monitor
    import ollama
    
    @monitor(model="llama3")
    def generate(prompt: str) -> str:
        result = ollama.generate(model="llama3", prompt=prompt)
        return result["response"]
    
  • Token estimation:

    • Approximate input and output tokens
    • Track total tokens and tokens/sec per call
  • System metrics snapshot (per request):

    • CPU %
    • RAM %
    • GPU % (optional, via pynvml if installed)
  • Hallucination risk heuristic (simple, signal-based):

    • Very long answer vs. short prompt
    • Strong claims without references
    • Repetition patterns
    • Basic self-contradiction patterns
  • Local dashboard:

    • FastAPI backend + simple HTML UI
    • SQLite storage by default
    • Shows latency, token trends, errors, and resource correlation per model
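To make the hallucination-risk signals above more concrete, here is a minimal sketch of how three of them (length ratio, unsupported strong claims, repetition) might be computed. It is not the package's actual implementation: the function names, word lists, and thresholds are all assumptions chosen for illustration, and self-contradiction detection is left out.

```python
import re
from collections import Counter

# Hypothetical word lists and thresholds, chosen only for illustration.
STRONG_CLAIMS = ("definitely", "always", "never", "guaranteed", "proven")
REFERENCE_HINTS = ("http://", "https://", "according to", "source:")


def risk_signals(prompt: str, response: str) -> dict:
    """Return boolean signals loosely matching three of the listed heuristics."""
    prompt_words = prompt.split()
    lowered = response.lower()
    response_words = re.findall(r"\w+", lowered)

    # Signal 1: very long answer relative to a short prompt (assumed 20x ratio).
    long_vs_short = len(response_words) > 20 * max(len(prompt_words), 1)

    # Signal 2: a strong claim with nothing that looks like a reference.
    has_claim = any(word in response_words for word in STRONG_CLAIMS)
    has_reference = any(hint in lowered for hint in REFERENCE_HINTS)
    unsupported_claim = has_claim and not has_reference

    # Signal 3: the same three-word phrase appears at least three times.
    trigrams = Counter(zip(response_words, response_words[1:], response_words[2:]))
    repetition = any(count >= 3 for count in trigrams.values())

    return {
        "long_vs_short": long_vs_short,
        "unsupported_claim": unsupported_claim,
        "repetition": repetition,
    }


def risk_score(prompt: str, response: str) -> float:
    """Fraction of signals that fired, between 0.0 and 1.0."""
    signals = risk_signals(prompt, response)
    return sum(signals.values()) / len(signals)
```

A score built this way only flags surface patterns; it says nothing about whether the answer is factually correct.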

Installation

pip install llm-scope-observer

Optional GPU metrics:

pip install "llm-scope-observer[gpu]"

Requires Python 3.9+.


Quickstart

1. Instrument your LLM call

from llm_scope import monitor
import time

@monitor(model="test-model")
def generate(prompt: str) -> str:
    time.sleep(0.1)
    return "hello from llm-scope-observer"

Every time generate(...) runs, a record is written to a local SQLite database (llm_scope.db by default).
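To check what was actually written, you can inspect the database with Python's standard sqlite3 module. The sketch below lists every table and its column names without assuming any particular schema; `describe_tables` is a hypothetical helper, not part of the package.

```python
import sqlite3


def describe_tables(db_path: str) -> dict:
    """Map each table name in the SQLite file to its list of column names."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Calling `describe_tables("llm_scope.db")` from the directory where the instrumented function ran should show the recorded table and its columns.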

2. Run the dashboard

Once the instrumented function has handled a few calls, start the dashboard:

llm-scope ui --host 127.0.0.1 --port 8000
# or
python -m llm_scope.cli ui --host 127.0.0.1 --port 8000

Open the dashboard in your browser at the host and port given above, and you’ll see:

  • Average latency per model
  • Slowest calls (tail latency)
  • Token usage and tokens/sec
  • Error counts
  • CPU / RAM / GPU vs. latency
  • Hallucination score per call
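As a rough illustration of the first two items, the sketch below computes average and tail latency per model from a list of (model, latency) pairs. How the dashboard computes these figures is not documented here, so the function name, the input shape, and the choice of a nearest-rank 95th percentile are all assumptions.

```python
import math
from collections import defaultdict
from statistics import mean


def summarize_latency(records):
    """Summarize latency per model from (model, latency_seconds) pairs."""
    by_model = defaultdict(list)
    for model, latency in records:
        by_model[model].append(latency)

    summary = {}
    for model, latencies in by_model.items():
        latencies.sort()
        # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
        rank = max(math.ceil(0.95 * len(latencies)), 1)
        summary[model] = {
            "calls": len(latencies),
            "avg": mean(latencies),
            "p95": latencies[rank - 1],
            "max": latencies[-1],
        }
    return summary
```

With only a handful of calls per model, the p95 value is simply the slowest call, so tail figures become meaningful only after enough traffic.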

How it works (high level)

  • Middleware / decorator:

    • @monitor(model="llama3") wraps any function.
    • Captures start/end times, prompt, response, and errors.
    • Sends a metrics record to the storage backend.
  • Metrics:

    • Token estimation from prompt and response text.
    • System stats from psutil (and optionally pynvml).
    • Simple heuristics for hallucination risk.
  • Storage:

    • SQLite via sqlite3 by default.
    • One table: llm_calls with timestamps, model, metrics, error, tags.
  • Dashboard:

    • FastAPI app.
    • Reads from the same SQLite file.
    • Renders an HTML summary page (no external JS required).
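The flow described above can be sketched in a few dozen lines of standard-library Python. This is an illustration of the general pattern, not the package's code: `monitor_sketch`, `estimate_tokens`, the `calls_sketch` table, its columns, and the four-characters-per-token rule are all assumptions. System stats and hallucination scoring are left out to keep the example short.

```python
import functools
import sqlite3
import time


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about four characters per token."""
    return max(1, len(text) // 4) if text else 0


def monitor_sketch(model: str, conn: sqlite3.Connection):
    """Hypothetical stand-in for @monitor: records one row per call."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls_sketch ("
        "ts REAL, model TEXT, latency_s REAL, "
        "input_tokens INTEGER, output_tokens INTEGER, error TEXT)"
    )

    def decorator(func):
        @functools.wraps(func)
        def wrapper(prompt: str, *args, **kwargs):
            started = time.perf_counter()
            response, error = "", None
            try:
                response = func(prompt, *args, **kwargs)
                return response
            except Exception as exc:
                # Remember the error for the record, then let it propagate.
                error = repr(exc)
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                latency = time.perf_counter() - started
                conn.execute(
                    "INSERT INTO calls_sketch VALUES (?, ?, ?, ?, ?, ?)",
                    (
                        time.time(),
                        model,
                        latency,
                        estimate_tokens(prompt),
                        estimate_tokens(str(response)),
                        error,
                    ),
                )
                conn.commit()

        return wrapper

    return decorator
```

Writing the record in a `finally` block is the key design choice: failed calls still produce a row with the error filled in, which is what makes error counts possible in a dashboard.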

Roadmap

This is an early MVP. Planned next steps include:

  • Prompt clustering and slow-prompt detection
  • Model A vs. Model B comparison
  • Basic alerting hooks and export to tools like Grafana
  • Optional HTTP ingestion mode (sidecar / agent model)

License

MIT
