argus-dpy
Operational Prometheus / OpenTelemetry metrics for discord.py bots, in one line.
```python
import os

import discord
from discord.ext import commands
from argus import Argus

bot = commands.AutoShardedBot(command_prefix="!", intents=discord.Intents.default())
Argus(bot)  # the whole integration
bot.run(os.environ["DISCORD_TOKEN"])  # token variable name is your choice
```
Argus(bot) instruments shard latency, interaction/command throughput and
outcomes, precise command duration, gateway throughput, rate-limit pressure and
cache sizes, then serves a Prometheus /metrics endpoint and a live web
dashboard on the bot's own event loop. It can also push to OpenTelemetry and
drain per-guild events to ClickHouse. It never puts a guild, user, or channel id
on a Prometheus label.
Install
```shell
pip install argus-dpy
```
Python 3.10+, discord.py >= 2.4. Optional extras: argus-dpy[otlp]
(OpenTelemetry push), argus-dpy[clickhouse] (per-guild analytics),
argus-dpy[fleet] (.env autoload for the control plane). A reference container is published at
ghcr.io/astoristhebrave/argus, and the Fleet control plane
at ghcr.io/astoristhebrave/argus-fleet.
Compatibility. Argus targets upstream discord.py 2.x and uses its
asynchronous cog lifecycle (await bot.add_cog, async cog_load/cog_unload)
and setup_hook chaining. Forks that vendor the discord namespace and follow
the same async-cog semantics may work but are untested; Pycord differs (a
synchronous add_cog and a non-coroutine cog_unload) and is not supported
unmodified. Because every fork ships the same discord import name, only one can
be installed at a time, and pip install argus-dpy pulls upstream discord.py.
See Compatibility.
New here? Follow a tutorial end to end: Single bot or Fleet at scale.
Behaviour
Argus(bot) registers listeners synchronously, then starts an aiohttp server on
the bot's loop once it is running. By default it serves the dashboard at /
and metrics at /metrics on port 9191. Disable the dashboard with
Argus(bot, dashboard=False); everything else is opt-in. Instrumentation is
fail-open everywhere: event hooks, scrape-time gauges, and even the metrics
server failing to bind are counted and swallowed, never raised into your bot. The
argus_subsystem_up gauge reports Argus' own health so you can alert when it
degrades while the bot stays up.
See Architecture & invariants.
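The fail-open rule above can be illustrated with a small decorator. This is a minimal sketch, not Argus' actual code: `fail_open`, the `swallowed_errors` counter, and the `on_interaction` hook are hypothetical names. It shows the pattern of catching any `Exception` from an instrumentation hook, counting it, and returning normally so the bot's event dispatch is never affected.

```python
import asyncio
import functools
import logging
from collections import Counter

log = logging.getLogger("argus_sketch")

# Hypothetical stand-in for a swallowed-instrumentation-errors counter, keyed by hook name.
swallowed_errors = Counter()


def fail_open(hook):
    """Wrap an async event hook so any exception is counted and logged, never raised."""

    @functools.wraps(hook)
    async def wrapper(*args, **kwargs):
        try:
            return await hook(*args, **kwargs)
        except Exception:  # deliberately broad: instrumentation must never break the bot
            swallowed_errors[hook.__name__] += 1
            log.debug("instrumentation hook %s failed", hook.__name__, exc_info=True)
            return None

    return wrapper


@fail_open
async def on_interaction(interaction):
    # A deliberately buggy hook: the wrapper swallows this error.
    raise RuntimeError("boom")
```

Catching `Exception` rather than `BaseException` means task cancellation (`asyncio.CancelledError`) still propagates, which keeps shutdown behaving normally.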
How Argus works
discord.py events flow through O(1), fail-open hooks into one backend-neutral metric registry. Adapters and the HTTP server read from that registry; the core never imports an adapter, so backends attach and detach without touching collection. Gauges are read live at scrape time (no background poller). An optional, separate analytical path drains per-guild events to ClickHouse and is never a Prometheus label.
```mermaid
flowchart TD
  bot[discord.py bot] -->|events, state| hooks[core hooks and instrumentation]
  hooks -->|inc / observe / set_info| reg[(MetricRegistry, backend-neutral)]
  reg --> prom[Prometheus adapter]
  reg --> otlp[OTLP adapter, optional]
  hooks -.->|per-guild events| sink[history sink, optional]
  sink --> ch[(ClickHouse)]
  prom --> exp[aiohttp server]
  exp --> m[GET /metrics]
  exp --> dash[dashboard SPA and /api]
  prom -.->|snapshot| member[fleet client, optional]
  member -.->|register, heartbeat| fleet[Fleet control plane]
```
A bot opts into more by adding kwargs or ARGUS_* env vars; with none, only the
metrics endpoint and dashboard run. For many processes across regions, the
opt-in Fleet control plane aggregates them into
one view.
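The split between a backend-neutral registry and adapters that read it, with gauges evaluated only at scrape time, can be sketched in a few lines. This is a toy model under stated assumptions, not Argus' `MetricRegistry`: the class shape, the `gauge(name, read)` callback method, and the metric names used in the usage example are all hypothetical. The adapter renders the Prometheus text exposition format.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class MetricRegistry:
    """Hypothetical backend-neutral registry: hooks write to it, adapters read from it."""

    def __init__(self) -> None:
        self.counters: Dict[Tuple[str, LabelKey], float] = defaultdict(float)
        self.gauges: Dict[str, Callable[[], float]] = {}

    def inc(self, name: str, amount: float = 1.0, **labels: str) -> None:
        # O(1) update; labels are sorted so the same label set always maps to one series.
        self.counters[(name, tuple(sorted(labels.items())))] += amount

    def gauge(self, name: str, read: Callable[[], float]) -> None:
        # A gauge stores a callback, not a value: no background poller is needed.
        self.gauges[name] = read


def render_prometheus(reg: MetricRegistry) -> str:
    """Toy adapter: render the registry in Prometheus text exposition format."""
    lines = []
    for (name, labels), value in sorted(reg.counters.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    for name, read in sorted(reg.gauges.items()):
        lines.append(f"{name} {read()}")  # evaluated live, at scrape time
    return "\n".join(lines) + "\n"
```

Because the registry never imports `render_prometheus`, a second adapter (for example an OTLP exporter) could read the same counters without touching the collection side.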
Minimal setup
The minimum is one line; everything else is opt-in via kwargs or ARGUS_*
environment variables (kwargs override env override defaults).
```python
Argus(bot)  # metrics at /metrics, dashboard at /, on port 9191
```
To protect the dashboard, set one environment variable on the host that runs the bot; Argus picks it up automatically. The dashboard is served by Argus in the same process, so there is nothing separate to host or wire up:
```shell
ARGUS_DASHBOARD_AUTH_TOKEN=your-secret  # gates / and /api/*; /metrics stays scrapeable
```
Open the dashboard once with the token and it is remembered in the browser:
http://your-host:9191/?token=your-secret.
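The gating rule described above (token required for `/` and `/api/*`, `/metrics` left open) can be modelled as a pure function. This is a hedged sketch, not Argus' request handler: `is_allowed` is a hypothetical name, only the `?token=` query form from the URL above is checked, how the browser remembers the token is not modelled, and treating other paths as ungated is an assumption.

```python
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def is_allowed(url: str, expected_token: Optional[str]) -> bool:
    """Decide whether a request may proceed (sketch; the real check may differ)."""
    parts = urlsplit(url)
    if not expected_token:
        return True  # no token configured: nothing is gated
    if parts.path == "/metrics":
        return True  # /metrics stays scrapeable (metrics_auth_token is a separate option)
    gated = parts.path == "/" or parts.path.startswith("/api/")
    if not gated:
        return True  # assumption: other paths are outside this particular gate
    supplied = parse_qs(parts.query).get("token", [""])[0]
    # Constant-time comparison avoids leaking the token through response timing.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```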
Common options
| kwarg / env | default | meaning |
|---|---|---|
| port / ARGUS_PORT | 9191 | server port (falls back to SERVER_PORT / PORT injected by Pterodactyl, PebbleHost, or Railway) |
| dashboard_auth_token / ARGUS_DASHBOARD_AUTH_TOKEN | unset | gate the dashboard and APIs |
| metrics_auth_token / ARGUS_METRICS_AUTH_TOKEN | unset | require a bearer token to scrape /metrics (shared-host public binds) |
| grafana_url / ARGUS_GRAFANA_URL | unset | link/embed your Grafana boards |
| cluster_id / ARGUS_CLUSTER_ID | default | label for clustered deploys |
| enable_per_guild / ARGUS_ENABLE_PER_GUILD | false | per-guild analytics path |
| otlp_endpoint / ARGUS_OTLP_ENDPOINT | unset | also push metrics via OTLP |
| log_format / ARGUS_LOG_FORMAT | text | set json for structured logs on the argus logger |
Every option, precedence and parsing rule is in Configuration. New here? Start with the FAQ.
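The precedence rule (kwargs override env, env overrides defaults) plus the panel fallback for the port can be sketched as a resolver. This is an illustration, not Argus' configuration loader: `resolve_port` is hypothetical, and checking `SERVER_PORT` before `PORT` is an assumed order. The `env` parameter exists only so the function can be tested without touching `os.environ`.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 9191


def resolve_port(port_kwarg: Optional[int] = None, env: Optional[Mapping[str, str]] = None) -> int:
    """Resolve the server port: kwarg > ARGUS_PORT > panel-injected vars > default (sketch)."""
    env = os.environ if env is None else env
    if port_kwarg is not None:  # 1. an explicit kwarg always wins
        return port_kwarg
    # 2. Argus' own variable, then the ports bot panels inject (assumed order)
    for name in ("ARGUS_PORT", "SERVER_PORT", "PORT"):
        raw = env.get(name)
        if raw:
            return int(raw)
    return DEFAULT_PORT  # 3. the documented default
```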
Metrics
Aggregate, bounded-cardinality metrics: per-shard latency and up state,
per-cluster guild/user/voice/emoji/sticker/channel counts, uptime, registered
commands, interaction and command rates with success/error split, precise
app- and prefix-command duration histograms, gateway throughput, shard
disconnects and reconnects, and log and rate-limit counters. Every counter and histogram carries a
cluster label. Argus also reports its own health: argus_up,
argus_subsystem_up{subsystem} (server/fleet/sink), and counters for swallowed
instrumentation errors and dropped analytical events.
Full list with labels: Metrics Reference.
Dashboard
A React SPA bundled into the wheel, served at /: overview, interactions,
gateway, your Grafana boards, and per-guild analytics. Reads metrics live over
SSE with a polling fallback. Set dashboard_auth_token for anything public.
See Dashboard.
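Streaming "live over SSE" means the server writes Server-Sent Events frames to a long-lived HTTP response, which a browser reads with `EventSource`; if that connection fails, the client can fall back to periodic polling. The framing itself is standardised, so here is a minimal sketch of encoding one snapshot. The `sse_frame` helper and the `metrics` event name are hypothetical, not the dashboard's actual wire format.

```python
import json
from typing import Optional


def sse_frame(snapshot: dict, event: str = "metrics", event_id: Optional[int] = None) -> str:
    """Encode one metrics snapshot as a Server-Sent Events frame (sketch)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets a reconnecting EventSource send Last-Event-ID
    lines.append(f"event: {event}")
    # Compact JSON escapes newlines inside strings, so a single data: line is enough.
    lines.append("data: " + json.dumps(snapshot, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event
```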
Per-guild analytics
Per-guild, per-user questions never go to Prometheus (cardinality). With
enable_per_guild + clickhouse_dsn (the argus-dpy[clickhouse] extra), Argus
drains per-guild events to ClickHouse (batched, non-blocking) and the dashboard's
Analytics section serves per-guild command counts and average durations.
Step-by-step: Per-guild analytics tutorial;
internals: History & ClickHouse.
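"Batched, non-blocking" can be modelled with a bounded `asyncio.Queue`. Event hooks only call `put_nowait`, and when the queue is full the event is dropped and counted instead of blocking the bot loop. A separate task drains the queue in batches. This is a sketch under assumptions: `HistorySink`, `record`, `drain_once`, and the injected `write_batch` coroutine (standing in for a ClickHouse insert) are hypothetical names, and the queue and batch sizes are arbitrary.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


class HistorySink:
    """Hypothetical sketch: enqueue without blocking, drain to storage in batches."""

    def __init__(
        self,
        write_batch: Callable[[List[Any]], Awaitable[None]],
        maxsize: int = 10_000,
        batch_size: int = 500,
    ) -> None:
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.write_batch = write_batch  # stands in for a ClickHouse bulk insert
        self.batch_size = batch_size
        self.dropped = 0

    def record(self, event: Any) -> None:
        # Called from event hooks: never awaits, so it cannot stall the bot loop.
        try:
            self.queue.put_nowait(event)
        except asyncio.QueueFull:
            self.dropped += 1  # mirrors a "dropped analytical events" counter

    async def drain_once(self) -> int:
        # Pull up to batch_size queued events and write them in one call.
        batch: List[Any] = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self.queue.get_nowait())
            except asyncio.QueueEmpty:
                break
        if batch:
            await self.write_batch(batch)
        return len(batch)
```

In a long-running bot, `drain_once` would be called from a background task on a short interval. Dropping events under pressure trades analytical completeness for keeping the bot responsive.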
Grafana, OTLP, clustering
docker compose up -d brings up a provisioned Prometheus + Grafana with four
dashboards (overview, interactions, gateway, and an Argus self-health board) plus
recording and alerting rules you can tune. Set otlp_endpoint (the
argus-dpy[otlp] extra) to also push via
OpenTelemetry to Datadog, Grafana Cloud, Honeycomb, and the like. Run one Argus
per process with a distinct cluster_id for clustered bots.
See the OTLP tutorial,
Clustering, and
OTLP internals.
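"One Argus per process with a distinct cluster_id" assumes each process owns a contiguous range of shards. One way to compute those ranges is sketched below. The `shard_ranges` helper and the `cluster-N` naming are hypothetical and may not match what the clustered example does.

```python
from typing import List, Tuple


def shard_ranges(shard_count: int, clusters: int) -> List[Tuple[str, List[int]]]:
    """Split shard ids 0..shard_count-1 into contiguous ranges, one per process (sketch)."""
    if clusters < 1 or shard_count < clusters:
        raise ValueError("need 1 <= clusters <= shard_count")
    base, extra = divmod(shard_count, clusters)
    out, start = [], 0
    for i in range(clusters):
        size = base + (1 if i < extra else 0)  # the first `extra` clusters take one more shard
        out.append((f"cluster-{i}", list(range(start, start + size))))
        start += size
    return out
```

Each process would then pass its range to `commands.AutoShardedBot(shard_ids=..., shard_count=...)` and its name to `Argus(bot, cluster_id=...)`, so every series it exports carries a distinct cluster label.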
No inbound port? Push instead. OTLP, a Prometheus Pushgateway
(pushgateway_url), and the Fleet client are all outbound-only, so they work
where you can't expose /metrics at all — Docker bot panels (Pterodactyl,
PebbleHost, Railway). See hosting /
Hosting on bot panels.
Fleet control plane (opt-in)
Running many bot processes across regions? The Argus Fleet control plane is a
separate, opt-in service that aggregates them into one readable, multi-tier view:
Global (everything) -> Fleet (a region, e.g. asia) -> Cluster (one
process) -> Shard (per-shard up/latency). It renders plain, colour-graded
panels with no PromQL or Grafana setup,
and reads from two interchangeable sources: a self-contained push path (zero
infra; members heartbeat to it) and an existing Prometheus.
Bots are unchanged unless they opt in. The fastest path is the setup wizard,
which mints a token and writes a ready .env + docker-compose.fleet.yml and
prints the exact member snippet:
```shell
python -m argus.fleet init  # scaffold; then: docker compose -f docker-compose.fleet.yml up -d
python -m argus.fleet doctor --url http://fleet-host:9190 --token secret  # diagnose
```
Or wire it by hand:
```shell
# the control plane (its own process / container)
ARGUS_FLEET_TOKEN=secret python -m argus.fleet  # serves :9190

# each bot opts in with a few env vars (or kwargs)
ARGUS_FLEET_URL=http://fleet-host:9190 \
ARGUS_FLEET_TOKEN=secret ARGUS_FLEET_GROUP=asia \
python bot.py
```
Point it at the shared ClickHouse (ARGUS_FLEET_CLICKHOUSE_DSN) and the same pane
gains a per-guild Analytics view (fleet-wide, or sliced to one bot) — so one
dashboard covers operational rollups and analytics.
Secure by default: a non-loopback bind with no token refuses to start; set a
token (or ARGUS_FLEET_TOKEN_FILE). It assigns each process a stable per-region
number (never reused; a dead cluster keeps its slot, shown down), persists
topology across restarts, caps request bodies, strips its version banner, and
exposes its own /metrics and /readyz. The member side is fail-open: a fleet
outage never touches your bot loop. Full guide and deployment:
Fleet and the
Fleet tutorial.
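"A stable per-region number, never reused; a dead cluster keeps its slot" behaves like a per-region monotonic counter plus a membership map. This is a hedged sketch: `SlotAllocator` is hypothetical, using a member id as the identity key is an assumption, and persistence across restarts (which would serialise `slots` and `next_free`) is left out.

```python
from typing import Dict, Set, Tuple


class SlotAllocator:
    """Hypothetical sketch: stable per-region cluster numbers that are never reused."""

    def __init__(self) -> None:
        self.slots: Dict[Tuple[str, str], int] = {}  # (region, member id) -> number
        self.next_free: Dict[str, int] = {}          # region -> next never-used number
        self.alive: Set[Tuple[str, str]] = set()

    def register(self, region: str, member_id: str) -> int:
        key = (region, member_id)
        if key not in self.slots:  # first sighting: take a fresh number
            n = self.next_free.get(region, 0)
            self.slots[key] = n
            self.next_free[region] = n + 1
        self.alive.add(key)  # a returning member gets its old number back
        return self.slots[key]

    def mark_dead(self, region: str, member_id: str) -> None:
        # The slot is kept (and shown as down), so its number is never handed out again.
        self.alive.discard((region, member_id))
```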
Why no per-guild Prometheus labels?
guild_id/user_id/channel_id are unbounded; as labels they explode
Prometheus at scale and are useless to visualise. Argus forbids them by
construction and routes per-entity questions to the analytical path instead.
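"Forbidden by construction" can be illustrated by a counter type that rejects unbounded label names when the metric is defined, not when a bad value finally shows up in production. This is a minimal sketch, not Argus' metric classes: `BoundedCounter`, the `outcome` label, and the metric names in the usage example are hypothetical. A name check alone cannot stop an unbounded value from being passed under an innocent label name, so the real guarantee also depends on which values the hooks emit.

```python
FORBIDDEN_LABELS = frozenset({"guild_id", "user_id", "channel_id"})


class BoundedCounter:
    """Hypothetical counter that refuses unbounded label names at definition time."""

    def __init__(self, name: str, labelnames: tuple = ("cluster",)) -> None:
        bad = FORBIDDEN_LABELS.intersection(labelnames)
        if bad:
            raise ValueError(f"{name}: unbounded label(s) not allowed: {sorted(bad)}")
        self.name = name
        self.labelnames = tuple(labelnames)
        self.values: dict = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        # Every observation must supply exactly the declared labels.
        if set(labels) != set(self.labelnames):
            raise ValueError(f"{self.name}: expected labels {self.labelnames}")
        key = tuple(labels[n] for n in self.labelnames)
        self.values[key] = self.values.get(key, 0.0) + amount
```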
Security
Set dashboard_auth_token for any non-localhost bot; the fleet control plane
refuses to start on a public bind without a token and is hardened by default
(rate limits, body caps, security headers, non-root images, SBOM/provenance). The
same security headers, body cap, and banner strip apply to the in-process bot
server too. The no-PII-label guarantee means per-entity data never reaches
Prometheus. CI runs CodeQL and a pip-audit dependency audit, and each release
ships a wheel SBOM. Full guidance:
Security and the
threat model. Report vulnerabilities privately via
SECURITY.md.
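The control plane's refusal to start on a public bind without a token amounts to a startup check on the listen address. Below is a minimal sketch using the standard `ipaddress` module. `check_bind` is hypothetical, and treating every hostname other than `localhost` as public is an assumption. A token supplied via ARGUS_FLEET_TOKEN_FILE would be read before this check runs.

```python
import ipaddress
from typing import Optional


def check_bind(host: str, token: Optional[str]) -> None:
    """Refuse to serve on a non-loopback address without a token (sketch)."""
    if token:
        return  # a token is configured: any bind address is acceptable
    try:
        loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        loopback = host == "localhost"  # assumption: any other hostname counts as public
    if not loopback:
        raise SystemExit(f"refusing to bind {host} without a token; set ARGUS_FLEET_TOKEN")
```

Failing at startup, before the socket is opened, means a misconfigured deployment never serves an unauthenticated request even briefly.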
Examples
Runnable examples in examples/ (see examples/README.md
for the index + a production dos-and-don'ts):
- basic_bot.py: one bot, one line.
- production_bot.py: hardened single bot (intents, secrets, auth, logging).
- clustered_bot.py: one process per shard range.
- otlp_bot.py: export to an OpenTelemetry collector.
- analytics_bot.py: per-guild ClickHouse analytics.
- fleet_member_bot.py: opting into a fleet.
- config_kwargs.py: every option, as kwargs.
- k8s/: Kubernetes manifests for a bot and the control plane.
- hosting/: Docker bot panels (Pterodactyl, PebbleHost, Ori, Railway): egg, start shim, decision tree.
Using a coding agent to get started? Point it at llms.txt — a
machine-readable map (including how to clone the wiki for the in-depth guides).
Contributing & license
Contributions are accepted under the DCO; see CONTRIBUTING.md. Licensed under AGPL-3.0-or-later (network use counts as distribution) — see LICENSE. Release notes: CHANGELOG.md / Releases.
See the full wiki for the in-depth guides and explanations.
Download files
File details
Details for the file argus_dpy-0.4.2.tar.gz.
File metadata
- Download URL: argus_dpy-0.4.2.tar.gz
- Upload date:
- Size: 296.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e103ea1f99b71942ff66f64c38a1d988584d262cc3ccc4e3de05f69386692a9 |
| MD5 | 9e5afc114b2143b20d04ed62a0cc1712 |
| BLAKE2b-256 | 3543a8bf75bb50f48bca3a9b900f835b8391652128fad41c054dad7a5b2979c6 |
Provenance
The following attestation bundles were made for argus_dpy-0.4.2.tar.gz:
- Publisher: release-please.yml on AstorisTheBrave/argus
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: argus_dpy-0.4.2.tar.gz
- Subject digest: 0e103ea1f99b71942ff66f64c38a1d988584d262cc3ccc4e3de05f69386692a9
- Sigstore transparency entry: 1902960934
- Permalink: AstorisTheBrave/argus@f6d0b243249d9ca43720b9fdfab1b2658a82aaa5
- Branch / Tag: refs/heads/main
- Owner: https://github.com/AstorisTheBrave
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@f6d0b243249d9ca43720b9fdfab1b2658a82aaa5
- Trigger Event: push
File details
Details for the file argus_dpy-0.4.2-py3-none-any.whl.
File metadata
- Download URL: argus_dpy-0.4.2-py3-none-any.whl
- Upload date:
- Size: 299.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c572a042cef6d8249cba22049e751b59e3edca3fcc341bc407bd71bfdebd150c |
| MD5 | 9ed643b72a18f5adf56183859b35dc6f |
| BLAKE2b-256 | 816b71486e5edfd14e381b17d2fc660614d26949c428b0374803f8aeb1ffcea5 |
Provenance
The following attestation bundles were made for argus_dpy-0.4.2-py3-none-any.whl:
- Publisher: release-please.yml on AstorisTheBrave/argus
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: argus_dpy-0.4.2-py3-none-any.whl
- Subject digest: c572a042cef6d8249cba22049e751b59e3edca3fcc341bc407bd71bfdebd150c
- Sigstore transparency entry: 1902961050
- Permalink: AstorisTheBrave/argus@f6d0b243249d9ca43720b9fdfab1b2658a82aaa5
- Branch / Tag: refs/heads/main
- Owner: https://github.com/AstorisTheBrave
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@f6d0b243249d9ca43720b9fdfab1b2658a82aaa5
- Trigger Event: push