A lightweight metrics exporter for LiteLLM — scrapes Prometheus metrics, stores them in SQLite, and serves JSON for dashboards like Homepage and home automation systems like Home Assistant.
Why
LiteLLM exposes usage metrics in Prometheus format, but consuming them typically means standing up Prometheus, Grafana, and an alertmanager — a stack that's overkill if you just want to see "how much did I spend today?" on a dashboard. For homelab enthusiasts running LiteLLM alongside services like Homepage and Home Assistant, that's a lot of overhead for very simple needs.
LiteLLM Pulse is a lightweight observability layer that sits between LiteLLM's /metrics endpoint and a JSON-based REST API. It scrapes Prometheus text format on a schedule, stores time-series snapshots in SQLite, and serves clean JSON that any HTTP client can consume — no Prometheus server, no Grafana dashboards, no query language to learn.
It is not designed to replace Prometheus or Grafana. If you need multi-source metrics, complex alerting rules, or rich visual dashboards, use those tools. LiteLLM Pulse is for the 90% case: you have a single LiteLLM instance, you want today's token spend on your Homepage dashboard, and you don't want to run three more containers to get it.
What It Does
LiteLLM exposes usage metrics (requests, tokens, spend) in Prometheus text format as cumulative counters. LiteLLM Pulse scrapes that endpoint on a schedule, parses the metrics, stores snapshots in SQLite, and serves them as clean JSON over a REST API.
Beyond raw cumulative totals, LiteLLM Pulse computes deltas (change since the last scrape) and daily/weekly/monthly aggregates (sum of deltas since the start of the current day/week/month), all backed by SQLite for persistence across restarts. It also breaks down every metric that carries a model label per model, so you can see which models are being used and how much each costs.
LiteLLM /metrics ──scrape──▶ LiteLLM Pulse ──JSON──▶ Homepage / Home Assistant / anything
                                   │
                                   ▼
                                SQLite
                         (time-series storage)
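The parsing step can be pictured with a minimal sketch. This is not LiteLLM Pulse's actual parser; the function name parse_prometheus_text is hypothetical, and the sketch assumes the simple behavior of summing every sample of a metric across all of its label sets.

```python
from collections import defaultdict


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Sum every sample of each metric name across all label sets.

    Hypothetical helper for illustration only.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like: name{label="x"} value [timestamp]
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            name, *rest = line.split()
        if not rest:
            continue
        try:
            totals[name] += float(rest[0])
        except ValueError:
            continue  # ignore malformed values in this sketch
    return dict(totals)
```

Summing across label sets gives the flat totals; keeping the model label instead of discarding it would be the starting point for a per-model breakdown.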
LiteLLM Setup
The LiteLLM /metrics endpoint is not enabled by default. You must configure LiteLLM to publish Prometheus metrics before LiteLLM Pulse can scrape them.
Add the prometheus callback to your LiteLLM proxy config (config.yaml):
litellm_settings:
callbacks:
- prometheus
Start LiteLLM and verify the endpoint:
curl http://localhost:4000/metrics/
If you see Prometheus-formatted text, LiteLLM is publishing metrics and you're ready to set up LiteLLM Pulse.
See the LiteLLM Prometheus docs for advanced configuration options.
Quick Start
Docker Compose
services:
litellm-pulse:
image: ghcr.io/jakepenzak/litellm-pulse:latest
container_name: litellm-pulse
restart: unless-stopped
environment:
LITELLM_PULSE_METRICS_URL: "http://litellm:4000/metrics/"
LITELLM_PULSE_SCRAPE_INTERVAL: "60"
LITELLM_PULSE_PORT: "8000"
LITELLM_PULSE_TIMEZONE: "America/New_York"
# LITELLM_PULSE_METRICS_API_KEY: "sk-your-litellm-api-key"
ports:
- "8000:8000"
volumes:
- litellm-pulse-data:/app/data
volumes:
litellm-pulse-data:
Docker Run
docker run -d \
--name litellm-pulse \
-p 8000:8000 \
-e LITELLM_PULSE_METRICS_URL=http://litellm:4000/metrics/ \
-e LITELLM_PULSE_SCRAPE_INTERVAL=60 \
-e LITELLM_PULSE_TIMEZONE=America/New_York \
-v litellm-pulse-data:/app/data \
ghcr.io/jakepenzak/litellm-pulse:latest
Pre-release Docker images are also available with the :dev tag: ghcr.io/jakepenzak/litellm-pulse:dev
Running Locally (with uv)
uv sync
uv run litellm-pulse
Running from PyPI
uvx litellm-pulse # run directly
uv tool install litellm-pulse && litellm-pulse # install permanently
pip install litellm-pulse && litellm-pulse # with pip
Running dev/prerelease versions
uvx --prerelease=allow litellm-pulse # uvx with prereleases
pip install --pre litellm-pulse # pip with prereleases
All of the above accept CLI arguments:
uv run litellm-pulse --port 9000 --metrics-url http://localhost:4000/metrics/ --timezone America/New_York
Configuration
All settings can be specified via environment variables prefixed with LITELLM_PULSE_, CLI arguments, or both. CLI arguments take precedence over environment variables. No config files required.
Run litellm-pulse --help to see all available options.
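To make the precedence rule concrete, here is a minimal sketch for a single setting. It is an illustration under assumptions, not the project's real option handling: resolve_port is a hypothetical helper, and only the --port flag, the LITELLM_PULSE_PORT variable, and the default of 8000 come from this README.

```python
import argparse


def resolve_port(argv: list[str], environ: dict[str, str]) -> int:
    """Resolve the port: CLI flag first, then env var, then the default."""
    parser = argparse.ArgumentParser(prog="litellm-pulse")
    # default=None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--port", type=int, default=None)
    args = parser.parse_args(argv)
    if args.port is not None:
        return args.port
    env_value = environ.get("LITELLM_PULSE_PORT")
    if env_value is not None:
        return int(env_value)
    return 8000  # documented default
```

In a real entry point this would be called as resolve_port(sys.argv[1:], dict(os.environ)).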
Core Settings
| CLI Flag | Env Variable | Default | Description |
|---|---|---|---|
| --metrics-url | LITELLM_PULSE_METRICS_URL | http://litellm:4000/metrics/ | Prometheus metrics endpoint to scrape |
| --scrape-interval | LITELLM_PULSE_SCRAPE_INTERVAL | 60 | Seconds between scrapes |
| --port | LITELLM_PULSE_PORT | 8000 | Port to serve the API on |
| --host | LITELLM_PULSE_HOST | 0.0.0.0 | Address to bind to |
| --verify-ssl / --no-verify-ssl | LITELLM_PULSE_VERIFY_SSL | false | Whether to verify TLS certificates when scraping |
| --scrape-timeout | LITELLM_PULSE_SCRAPE_TIMEOUT | 30 | Request timeout in seconds |
| --log-level | LITELLM_PULSE_LOG_LEVEL | info | Log level (debug, info, warning, error) |
| --timezone | LITELLM_PULSE_TIMEZONE | UTC | Timezone for API timestamps and day/week/month boundaries (IANA name, e.g. America/New_York) |
| --metrics-api-key | LITELLM_PULSE_METRICS_API_KEY | (empty) | LiteLLM API key for authenticated /metrics endpoints. Only needed if your LiteLLM proxy has require_auth_for_metrics_endpoint set to true. |
When to use LITELLM_PULSE_METRICS_API_KEY / --metrics-api-key: If your LiteLLM proxy config includes require_auth_for_metrics_endpoint: true under litellm_settings, the /metrics endpoint requires authentication via a Bearer token. Set LITELLM_PULSE_METRICS_API_KEY (or pass --metrics-api-key) to a valid LiteLLM API key so LiteLLM Pulse can authenticate. If left empty, no Authorization header is sent, matching the default unauthenticated LiteLLM behavior.
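The header behavior described above can be sketched with the standard library. The helper name build_scrape_request is hypothetical and the project's own HTTP client may differ; the sketch only shows the "send a Bearer token when a key is set, send nothing otherwise" rule.

```python
import urllib.request


def build_scrape_request(url: str, api_key: str = "") -> urllib.request.Request:
    """Build the scrape request, adding a Bearer token only when a key is set."""
    request = urllib.request.Request(url)
    if api_key:
        # Only authenticated LiteLLM setups need this header.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request
```

Passing the returned object to urllib.request.urlopen with a timeout would perform the actual scrape.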
SQLite / Time-Series Settings
| CLI Flag | Env Variable | Default | Description |
|---|---|---|---|
| --db-path | LITELLM_PULSE_DB_PATH | ./data/litellm_pulse.db | Path to the SQLite database file |
| --db-retention-days | LITELLM_PULSE_DB_RETENTION_DAYS | 90 | Auto-purge data older than N days (hourly purge cycle) |
| --history-size | LITELLM_PULSE_HISTORY_SIZE | 168 | Max snapshots in the in-memory ring buffer (used as fallback if DB is unavailable) |
Timezone note: The database always stores timestamps as UTC. The LITELLM_PULSE_TIMEZONE setting (or --timezone flag) only affects API output (timestamps are converted to the configured timezone) and aggregate window boundaries (daily/weekly/monthly resets are computed against the configured timezone's midnight/Monday/1st). Set it to any valid IANA timezone name (e.g. America/New_York, Europe/London). Invalid values fall back to UTC with a warning.
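One way to compute those window boundaries is sketched below with the standard zoneinfo module. The function names are hypothetical and the project's implementation may differ; the sketch mirrors the rules stated above (local midnight, Monday, 1st of the month, UTC fallback for invalid names) and returns the boundaries in UTC so they can be compared with stored timestamps.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def resolve_timezone(name: str) -> tzinfo:
    """Return the named IANA zone, falling back to UTC if it cannot be loaded."""
    try:
        return ZoneInfo(name)
    except (KeyError, ValueError, OSError):
        return timezone.utc  # a real service would also log a warning here


def window_starts(now_utc: datetime, tz_name: str) -> dict[str, datetime]:
    """Start of the current day, week (Monday), and month, expressed in UTC."""
    local = now_utc.astimezone(resolve_timezone(tz_name))
    day = local.replace(hour=0, minute=0, second=0, microsecond=0)
    week = day - timedelta(days=day.weekday())  # Monday is weekday 0
    month = day.replace(day=1)
    starts = {"daily": day, "weekly": week, "monthly": month}
    return {key: value.astimezone(timezone.utc) for key, value in starts.items()}
```

A daily aggregate is then the sum of deltas for all rows whose UTC timestamp is at or after the "daily" boundary.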
Metric Mappings
Each tracked metric maps a friendly name to a Prometheus metric name. Override any of them by setting the corresponding LITELLM_PULSE_METRIC_* env var.
| Variable | Default |
|---|---|
| LITELLM_PULSE_METRIC_REQUESTS | litellm_proxy_total_requests_metric_total |
| LITELLM_PULSE_METRIC_FAILED_REQUESTS | litellm_proxy_failed_requests_metric_total |
| LITELLM_PULSE_METRIC_TOKENS | litellm_total_tokens_metric_total |
| LITELLM_PULSE_METRIC_INPUT_TOKENS | litellm_input_tokens_metric_total |
| LITELLM_PULSE_METRIC_OUTPUT_TOKENS | litellm_output_tokens_metric_total |
| LITELLM_PULSE_METRIC_REASONING_TOKENS | litellm_output_reasoning_tokens_metric_total |
| LITELLM_PULSE_METRIC_COST | litellm_spend_metric_total |
| LITELLM_PULSE_METRIC_IN_FLIGHT_REQUESTS | litellm_in_flight_requests |
| LITELLM_PULSE_METRIC_CACHE_HITS | litellm_cache_hits_metric_total |
| LITELLM_PULSE_METRIC_CACHE_MISSES | litellm_cache_misses_metric_total |
| LITELLM_PULSE_METRIC_CACHED_TOKENS | litellm_cached_tokens_metric_total |
| LITELLM_PULSE_METRIC_INPUT_CACHED_TOKENS | litellm_input_cached_tokens_metric_total |
| LITELLM_PULSE_METRIC_INPUT_CACHE_CREATION_TOKENS | litellm_input_cache_creation_tokens_metric_total |
API Endpoints
GET / or GET /api/v1/metrics
Returns all tracked metrics: cumulative totals, daily/weekly/monthly aggregates, and metadata.
{
"requests": 1234,
"failed_requests": 5,
"tokens": 567890,
"input_tokens": 300000,
"output_tokens": 267890,
"reasoning_tokens": 0,
"cost": 12.345678,
"in_flight_requests": 2,
"cache_hits": 40,
"cache_misses": 60,
"cached_tokens": 15000,
"input_cached_tokens": 8000,
"input_cache_creation_tokens": 2000,
"requests_daily": 215,
"requests_weekly": 1200,
"requests_monthly": 3400,
"tokens_daily": 45000,
"tokens_weekly": 310000,
"tokens_monthly": 780000,
"cost_daily": 0.02,
"cost_weekly": 0.15,
"cost_monthly": 0.52,
"cache_hits_daily": 5,
"cache_misses_daily": 12,
"cached_tokens_daily": 3000,
"last_scrape": "2025-06-21T12:00:00+00:00",
"source": "http://litellm:4000/metrics/"
}
Every tracked metric gets _daily, _weekly, and _monthly suffixes:
| Suffix | Meaning |
|---|---|
| (none) | Cumulative total since LiteLLM started (raw counter value) |
| _daily | Sum of deltas since start of today (midnight in the configured timezone) |
| _weekly | Sum of deltas since start of this week (Monday in the configured timezone) |
| _monthly | Sum of deltas since start of this month (1st in the configured timezone) |
GET /api/v1/metrics/{name}
Returns a single metric by friendly name. Also supports _daily, _weekly, _monthly suffixes.
GET /api/v1/metrics/cost
GET /api/v1/metrics/cost_daily
GET /api/v1/metrics/tokens_weekly
{
"name": "cost_daily",
"value": 0.02,
"last_scrape": "2025-06-21T12:00:00+00:00"
}
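A client for the single-metric endpoint can be very small. The sketch below is illustrative only: metric_url and read_value are hypothetical helper names, and the response shape is taken from the example above.

```python
import json
from typing import Optional


def metric_url(base_url: str, name: str, window: Optional[str] = None) -> str:
    """Build the single-metric URL, e.g. .../api/v1/metrics/cost_daily."""
    suffix = f"_{window}" if window else ""
    return f"{base_url.rstrip('/')}/api/v1/metrics/{name}{suffix}"


def read_value(body: str) -> float:
    """Extract the numeric value from a single-metric JSON response body."""
    return float(json.loads(body)["value"])
```

Fetching the URL with any HTTP client and passing the response text to read_value yields the number to display.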
GET /api/v1/history?limit=168
Returns the most recent scrape snapshots as a JSON array (newest last). Draws from SQLite if available, falls back to in-memory ring buffer.
{
"snapshots": [
{
"timestamp": "2025-06-21T12:00:00+00:00",
"is_reset": false,
"requests": 1234,
"requests_delta": 3,
"tokens": 567890,
"tokens_delta": 24500,
"cost": 12.3456,
"cost_delta": 0.0231
}
],
"count": 168,
"source": "sqlite"
}
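Because each snapshot carries per-scrape deltas, a client can build its own totals over any span of the returned history. A minimal sketch, assuming the snapshot shape shown above; sum_deltas is a hypothetical helper, and treating a missing or null delta as zero is an assumption.

```python
def sum_deltas(snapshots: list[dict], metric: str) -> float:
    """Add up `<metric>_delta` across snapshots, treating missing values as 0."""
    key = f"{metric}_delta"
    return sum(snapshot.get(key) or 0 for snapshot in snapshots)
```

For example, sum_deltas(payload["snapshots"], "cost") gives the spend covered by the returned snapshots.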
GET /api/v1/models
Returns per-model breakdown of all metrics that carry a model label in the Prometheus data. Each model includes cumulative totals and daily/weekly/monthly aggregates.
{
"models": [
{
"model": "gpt-4o",
"requests": 800,
"tokens": 40000,
"cost": 2.0,
"deployment_requests": 80,
"requests_daily": 50,
"tokens_daily": 5000,
"cost_daily": 0.15,
"deployment_requests_daily": 5
},
{
"model": "claude-sonnet",
"requests": 200,
"tokens": 10000,
"cost": 0.5,
"requests_daily": 10,
"tokens_daily": 1000,
"cost_daily": 0.02
}
],
"last_scrape": "2025-06-21T12:00:00+00:00"
}
The set of metrics per model depends on what LiteLLM exposes with model labels. Common metrics include requests, tokens, input_tokens, output_tokens, cost, cache_hits, cache_misses, cached_tokens, deployment_requests, deployment_success, and deployment_failures. Metrics without a known friendly name retain their raw Prometheus metric name.
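Since the set of fields can vary from model to model, client code should tolerate missing keys. The sketch below ranks models by one field and treats an absent field as zero; rank_models is a hypothetical helper and the payload shape is taken from the example response above.

```python
def rank_models(payload: dict, field: str = "cost_daily") -> list[tuple[str, float]]:
    """Return (model, value) pairs sorted from highest to lowest value."""
    rows = [
        (entry["model"], entry.get(field, 0))  # missing fields count as 0
        for entry in payload.get("models", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```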
GET /raw
Returns all raw parsed Prometheus metrics (every metric family found, summed). Useful for debugging.
GET /health
Returns {"status": "ok"} once the first successful scrape has completed.
How Deltas & Aggregates Work
LiteLLM's Prometheus metrics are counters — they grow cumulatively and only reset when the LiteLLM process restarts. LiteLLM Pulse handles this as follows:
- Each scrape stores the raw cumulative value and a computed delta (change since the previous scrape).
- Daily/weekly/monthly values are computed as SUM(delta) for all scrapes within the time window.
- Counter reset detection: If the primary requests counter drops by more than 50%, LiteLLM Pulse assumes LiteLLM restarted. The delta for that scrape is set to the current value (treating it as starting from 0), and is_reset=true is recorded in the database. This ensures daily/weekly/monthly sums remain correct even across LiteLLM restarts.
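The rules above can be sketched in a few lines. This is an illustration, not the project's code: the function names are hypothetical, and two details are assumptions not stated in this README, namely that a small negative difference without a detected reset is clamped to zero and that a metric absent from the previous scrape is treated as starting from 0.

```python
def is_counter_reset(previous: float, current: float, threshold: float = 0.5) -> bool:
    """True when the counter dropped by more than `threshold` (50% by default)."""
    return previous > 0 and current < previous * (1 - threshold)


def compute_deltas(previous: dict[str, float], current: dict[str, float],
                   primary: str = "requests") -> tuple[dict[str, float], bool]:
    """Return per-metric deltas and whether a LiteLLM restart was detected."""
    reset = is_counter_reset(previous.get(primary, 0.0), current.get(primary, 0.0))
    deltas: dict[str, float] = {}
    for name, value in current.items():
        if reset:
            # Counters restarted from 0, so the whole current value is new.
            deltas[name] = value
        else:
            # Assumption: clamp small negative differences to 0.
            deltas[name] = max(value - previous.get(name, 0.0), 0.0)
    return deltas, reset
```

Summing these deltas within a time window then gives the daily, weekly, or monthly figure, and a restart in the middle of the window does not produce a negative contribution.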
State Recovery
| Scenario | Behavior |
|---|---|
| Fresh start | DB empty → first scrape has no deltas, second scrape onward has valid deltas |
| App restart | Reads last row from DB → restores last-known raw counters (flat + per-model) → seamless continuation |
| LiteLLM restart | Counters drop → reset detected → delta computed from 0, is_reset=1 stored → daily sums remain correct |
| DB corrupted | open_db() catches SQLite errors, starts fresh with a warning log |
| Disk full | Writes fail → error field set in API response → recovers when disk space returns |
| DB schema upgrade | New columns auto-added to existing scrapes table via ALTER TABLE migration |
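The schema-upgrade row can be illustrated with a small sqlite3 sketch. The scrapes table name comes from the table above; ensure_columns and the column names used with it are hypothetical, and the project's real migration logic may differ.

```python
import sqlite3


def ensure_columns(conn: sqlite3.Connection, table: str,
                   columns: dict[str, str]) -> list[str]:
    """Add any missing columns to `table` and return the names that were added.

    Table and column names are interpolated into SQL, so pass only
    trusted, hard-coded identifiers.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Running it on every startup is safe because columns that already exist are skipped.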
Integrations
Homepage (Custom API Widget)
Add a service entry in services.yaml with a customapi widget:
- LiteLLM:
icon: https://cdn.jsdelivr.net/gh/selfhst/icons/png/litellm.png
href: https://litellm.home.lan
description: LLM proxy and management
widget:
type: customapi
url: http://litellm-pulse:8000/api/v1/metrics
refreshInterval: 60000
mappings:
- field: requests
label: Total Requests
format: number
- field: cost_daily
label: Spend Today
format: float
prefix: "$"
- field: cost_monthly
label: Spend This Month
format: float
prefix: "$"
- field: tokens_daily
label: Tokens Today
format: number
- field: cache_hits_daily
label: Cache Hits Today
format: number
Home Assistant (REST Sensors)
Add RESTful sensors to configuration.yaml. The rest integration lets you define multiple sensors from a single HTTP request, which avoids polling the LiteLLM Pulse endpoint more than necessary:
rest:
- resource: http://litellm-pulse:8000/api/v1/metrics
scan_interval: 60 # seconds between polls (default: 30)
timeout: 10 # seconds before the sensor is marked unavailable
verify_ssl: true
sensor:
- name: LiteLLM Requests
unique_id: litellm_requests
value_template: "{{ value_json.requests }}"
unit_of_measurement: "req"
state_class: total_increasing
- name: LiteLLM Tokens
unique_id: litellm_tokens
value_template: "{{ value_json.tokens }}"
unit_of_measurement: "tokens"
state_class: total_increasing
- name: LiteLLM Spend
unique_id: litellm_spend
value_template: "{{ value_json.cost }}"
unit_of_measurement: "USD"
state_class: total_increasing
- name: LiteLLM Spend Today
unique_id: litellm_spend_today
value_template: "{{ value_json.cost_daily }}"
unit_of_measurement: "USD"
state_class: measurement
force_update: true
- name: LiteLLM Spend This Month
unique_id: litellm_spend_this_month
value_template: "{{ value_json.cost_monthly }}"
unit_of_measurement: "USD"
state_class: measurement
force_update: true
- name: LiteLLM Tokens Today
unique_id: litellm_tokens_today
value_template: "{{ value_json.tokens_daily }}"
unit_of_measurement: "tokens"
state_class: measurement
force_update: true
- name: LiteLLM Cache Hits Today
unique_id: litellm_cache_hits_today
value_template: "{{ value_json.cache_hits_daily }}"
unit_of_measurement: "hits"
state_class: measurement
force_update: true
- name: LiteLLM Cached Tokens Today
unique_id: litellm_cached_tokens_today
value_template: "{{ value_json.cached_tokens_daily }}"
unit_of_measurement: "tokens"
state_class: measurement
force_update: true
If you only need a single metric, you can use the sensor.rest platform instead, which polls the endpoint once per sensor:
sensor:
- platform: rest
resource: http://litellm-pulse:8000/api/v1/metrics/cost_daily
name: LiteLLM Spend Today
unique_id: litellm_spend_today
value_template: "{{ value_json.value }}"
unit_of_measurement: "USD"
state_class: measurement
force_update: true
Tip: To refresh a sensor on demand (outside the polling schedule), call the homeassistant.update_entity action targeting the sensor entity.
Contributing
Contributions are welcome! Please read the guidelines below before opening a pull request.
Pull Request Process
- Fork the repository and create a feature branch from main
- Run uv run pre-commit install to set up local git hooks
- Make your changes, ensuring pre-commit run --all-files passes
- Add or update tests as appropriate
- Open a pull request targeting the main branch
Conventional Commits
Pull request titles must follow the Conventional Commits specification. This is enforced by branch protection rules and is required for the release automation to work correctly.
The format is:
<type>(<scope>): <description>
Allowed Types
| Type | Description |
|---|---|
| feat | A new feature |
| fix | A bug fix |
| docs | Documentation only changes |
| style | Changes that do not affect the meaning of the code (formatting, etc.) |
| refactor | A code change that neither fixes a bug nor adds a feature |
| perf | A code change that improves performance |
| test | Adding or correcting tests |
| ci | Changes to CI configuration files and scripts |
| chore | Other changes that don't modify src or test files |
| build | Changes that affect the build system or dependencies |
Examples
feat: add Prometheus push gateway support
fix(db): handle negative deltas on counter reset
docs: update Home Assistant integration examples
ci: add Python 3.13 to test matrix
refactor(parser): simplify metric extraction logic
Scopes (optional)
Common scopes: parser, db, app, ci, docker, deps
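To check a title locally before opening a pull request, a regular expression along these lines can help. It is a simplified sketch, not the check the repository's CI runs: is_valid_title is a hypothetical helper, the scope character set is an assumption, and the optional "!" marker follows the Conventional Commits convention for breaking changes.

```python
import re

ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "ci", "chore", "build",
)

# <type>(<scope>): <description>, with an optional scope and "!" marker.
TITLE_PATTERN = re.compile(
    rf"^({'|'.join(ALLOWED_TYPES)})(\([a-z0-9-]+\))?!?: \S.*$"
)


def is_valid_title(title: str) -> bool:
    """Return True when the title matches the simplified pattern above."""
    return TITLE_PATTERN.match(title) is not None
```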
Releases
Releases are managed automatically by release-please using the manifest-driven approach. Configuration lives in .github/release-please-config.json (stable) and .github/release-please-config-prerelease.json (prerelease); version tracking in .github/.release-please-manifest.json (stable) and .github/.release-please-manifest-prerelease.json (prerelease). The two manifests are independent so cutting a stable release doesn't reset prerelease version tracking, and vice versa.
Single-branch model: everything lives on main. The Release workflow selects the appropriate config based on the trigger:
- Push to main: runs release-please with the prerelease config → creates prerelease PRs (v0.3.0.dev0, v0.3.0.dev1, …), Docker tag dev, PyPI prerelease packages.
- Manual workflow_dispatch (with stable_release: true): runs release-please with the stable config → creates a stable release PR (v0.3.0), Docker tags latest/major/major.minor/version, PyPI stable packages.
Workflow:
- PRs are merged to main with conventional commit titles
- release-please maintains a "release PR" that accumulates changes (prerelease by default)
- When the prerelease PR is merged, a GitHub prerelease is created, the Docker image is pushed with the dev tag, and the package is published to PyPI as a prerelease
- To cut a stable release, manually trigger the workflow with stable_release: true; release-please calculates the next stable version from the last stable tag and creates a release PR
- When the stable release PR is merged, images are tagged with the semantic version (e.g., 0.3.0), major/minor aliases (e.g., 0.3, 0), and latest
Setup
make venv # sync deps + install pre-commit hooks
# or: uv sync --all-extras --all-groups --frozen && uv run pre-commit install
Running
uv run litellm-pulse # run the server locally
Linting & Formatting
Linting and formatting are enforced via pre-commit with ruff:
uv run pre-commit install # install git hooks (run once)
uv run pre-commit run --all-files # run all checks manually
This runs ruff check --fix and ruff format across the codebase. The same checks run in CI on every push and pull request.
Testing
uv run pytest -v # run tests
# or: make tests # runs pytest
# or: make coverage # serve HTML coverage report at http://localhost:8080
Run make help to see all available targets.
CI/CD
| Workflow | Trigger | What it does |
|---|---|---|
| CI (ci.yml) | Push to main, PRs | Runs pre-commit (ruff lint + format) and pytest on Python 3.11 & 3.12 |
| Release (release.yml) | Push to main / workflow_dispatch | push → prerelease config (dev tags, PyPI prerelease). workflow_dispatch with stable_release: true → stable config (semantic versioned Docker tags, PyPI stable). See Releases. |
License
MIT — see LICENSE.
Disclaimer
LiteLLM Pulse is an independent, community-developed project created to provide monitoring and analytics for LiteLLM deployments.
This project is not affiliated with, endorsed by, sponsored by, or maintained by LiteLLM or Berri AI.
"LiteLLM" and any associated trademarks, service marks, logos, or trade names are the property of their respective owners and are used here solely to identify compatibility with the LiteLLM ecosystem.