
argus-dpy


Operational Prometheus / OpenTelemetry metrics for discord.py bots, in one line.

import discord
from discord.ext import commands
from argus import Argus

bot = commands.AutoShardedBot(command_prefix="!", intents=discord.Intents.default())
Argus(bot)          # the whole integration

Argus(bot) instruments shard latency, interaction/command throughput and outcomes, precise command duration, gateway throughput, rate-limit pressure and cache sizes, then serves a Prometheus /metrics endpoint and a live web dashboard on the bot's own event loop. It can also push to OpenTelemetry and drain per-guild events to ClickHouse. It never puts a guild, user, or channel id on a Prometheus label.

Install

pip install argus-dpy

Python 3.10+, discord.py >= 2.4. Optional extras: argus-dpy[otlp] (OpenTelemetry push), argus-dpy[clickhouse] (per-guild analytics), argus-dpy[fleet] (.env autoload for the control plane). A reference container is published at ghcr.io/astoristhebrave/argus, and the Fleet control plane at ghcr.io/astoristhebrave/argus-fleet.

Compatibility. Argus targets upstream discord.py 2.x and uses its asynchronous cog lifecycle (await bot.add_cog, async cog_load/cog_unload) and setup_hook chaining. Forks that vendor the discord namespace and follow the same async-cog semantics may work but are untested; Pycord differs (a synchronous add_cog and a non-coroutine cog_unload) and is not supported unmodified. Because every fork ships the same discord import name, only one can be installed at a time, and pip install argus-dpy pulls upstream discord.py. See Compatibility.

New here? Follow a tutorial end to end: Single bot or Fleet at scale.

Behaviour

Argus(bot) registers listeners synchronously, then starts an aiohttp server on the bot's loop once it is running. By default it serves the dashboard at / and metrics at /metrics on port 9191. Disable the dashboard with Argus(bot, dashboard=False); everything else is opt-in. Instrumentation is fail-open everywhere: event hooks, scrape-time gauges, and even the metrics server failing to bind are counted and swallowed, never raised into your bot. The argus_subsystem_up gauge reports Argus' own health so you can alert when it degrades while the bot stays up. See Architecture & invariants.
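
The fail-open behaviour described above can be illustrated with a small, self-contained sketch. This is not Argus's actual code: the `fail_open` decorator, the `swallowed_errors` counter, and the `record_command` hook are hypothetical names used only to show the "count and swallow, never raise" pattern.

```python
import functools
import logging
from collections import Counter

log = logging.getLogger("fail_open_sketch")

# Hypothetical stand-in for a "swallowed instrumentation errors" counter.
swallowed_errors = Counter()


def fail_open(hook_name):
    """Wrap a hook so any exception is counted and logged, never re-raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                swallowed_errors[hook_name] += 1
                log.debug("hook %r failed", hook_name, exc_info=True)
                return None
        return wrapper
    return decorator


@fail_open("on_command")
def record_command(name):
    # Illustrative hook body that fails on bad input.
    if not name:
        raise ValueError("empty command name")
    return f"recorded {name}"


print(record_command("ping"))            # recorded ping
print(record_command(""))                # None (the error was swallowed)
print(swallowed_errors["on_command"])    # 1
```

The caller never sees the `ValueError`; the only trace is the counter, which is the kind of signal a health gauge could be derived from.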

How Argus works

discord.py events flow through O(1), fail-open hooks into one backend-neutral metric registry. Adapters and the HTTP server read from that registry; the core never imports an adapter, so backends attach and detach without touching collection. Gauges are read live at scrape time (no background poller). An optional, separate analytical path drains per-guild events to ClickHouse; those events never become Prometheus labels.

flowchart TD
    bot[discord.py bot] -->|events, state| hooks[core hooks and instrumentation]
    hooks -->|inc / observe / set_info| reg[(MetricRegistry, backend-neutral)]
    reg --> prom[Prometheus adapter]
    reg --> otlp[OTLP adapter, optional]
    hooks -.->|per-guild events| sink[history sink, optional]
    sink --> ch[(ClickHouse)]
    prom --> exp[aiohttp server]
    exp --> m[GET /metrics]
    exp --> dash[dashboard SPA and /api]
    prom -.->|snapshot| member[fleet client, optional]
    member -.->|register, heartbeat| fleet[Fleet control plane]

A bot opts into more by adding kwargs or ARGUS_* env vars; with none, only the metrics endpoint and dashboard run. For many processes across regions, the opt-in Fleet control plane aggregates them into one view.
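
To make the "backend-neutral registry with gauges read at scrape time" idea concrete, here is a toy sketch. `MiniRegistry`, `render_text`, and the `demo_*` metric names are hypothetical and much simpler than a real registry; the point is only that counters are stored while gauges are callbacks evaluated when a snapshot is taken.

```python
from typing import Callable, Dict


class MiniRegistry:
    """Toy registry: counters are stored, gauges are callbacks read on demand."""

    def __init__(self) -> None:
        self._counters: Dict[str, float] = {}
        self._gauges: Dict[str, Callable[[], float]] = {}

    def inc(self, name: str, amount: float = 1.0) -> None:
        self._counters[name] = self._counters.get(name, 0.0) + amount

    def gauge(self, name: str, read: Callable[[], float]) -> None:
        self._gauges[name] = read

    def snapshot(self) -> Dict[str, float]:
        # Gauges are evaluated here, at "scrape time"; nothing polls them.
        values = dict(self._counters)
        for name, read in self._gauges.items():
            values[name] = read()
        return values


def render_text(snapshot: Dict[str, float]) -> str:
    """Toy adapter: it only sees a snapshot, never the collection code."""
    return "\n".join(f"{name} {value}" for name, value in sorted(snapshot.items()))


guilds = {"a", "b"}
registry = MiniRegistry()
registry.gauge("demo_guilds", lambda: float(len(guilds)))
registry.inc("demo_commands_total")

first = registry.snapshot()
guilds.add("c")                 # state changes between scrapes
second = registry.snapshot()

print(first["demo_guilds"], second["demo_guilds"])  # 2.0 3.0
print(render_text(second))
```

Because the adapter depends only on the snapshot, a second adapter could be added or removed without changing how values are collected.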

Minimal setup

The minimum is one line; everything else is opt-in via kwargs or ARGUS_* environment variables. Kwargs override environment variables, which override defaults.

Argus(bot)   # metrics at /metrics, dashboard at /, on port 9191

To protect the dashboard, set one env var on the host that runs the bot — Argus picks it up automatically. The dashboard is served by Argus in the same process, so there is nothing separate to host or wire up:

ARGUS_DASHBOARD_AUTH_TOKEN=your-secret   # gates / and /api/*; /metrics stays scrapeable

Open the dashboard once with the token and it is remembered in the browser: http://your-host:9191/?token=your-secret.
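
If the token contains characters that are not URL-safe, it needs to be percent-encoded before it goes into the query string. The helper below is a hypothetical convenience (not part of Argus) that builds the URL shape shown above using only the standard library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def dashboard_url(host: str, token: str, port: int = 9191) -> str:
    """Build http://host:port/?token=... with the token safely encoded."""
    return f"http://{host}:{port}/?{urlencode({'token': token})}"


print(dashboard_url("your-host", "your-secret"))
# http://your-host:9191/?token=your-secret

# A token with reserved characters survives a round trip through the URL.
tricky = dashboard_url("your-host", "a b&c=d")
print(parse_qs(urlsplit(tricky).query)["token"])  # ['a b&c=d']
```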

Common options

| kwarg / env | default | meaning |
|---|---|---|
| port / ARGUS_PORT | 9191 | server port (falls back to SERVER_PORT/PORT injected by Pterodactyl/PebbleHost/Railway) |
| dashboard_auth_token / ARGUS_DASHBOARD_AUTH_TOKEN | | gate the dashboard + APIs |
| metrics_auth_token / ARGUS_METRICS_AUTH_TOKEN | | require a bearer token to scrape /metrics (shared-host public binds) |
| grafana_url / ARGUS_GRAFANA_URL | | link/embed your Grafana boards |
| cluster_id / ARGUS_CLUSTER_ID | default | label for clustered deploys |
| enable_per_guild / ARGUS_ENABLE_PER_GUILD | false | per-guild analytics path |
| otlp_endpoint / ARGUS_OTLP_ENDPOINT | | also push metrics via OTLP |
| log_format / ARGUS_LOG_FORMAT | text | set json for structured logs on the argus logger |
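
When metrics_auth_token is set, a scraper has to present the token. The sketch below builds (but does not send) such a request with the standard library. It assumes the conventional `Authorization: Bearer <token>` header form; the host name and token value are placeholders.

```python
import urllib.request


def build_scrape_request(url: str, token: str) -> urllib.request.Request:
    """Prepare a GET for /metrics carrying a bearer token (assumed header form)."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_scrape_request("http://your-host:9191/metrics", "scrape-secret")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))

# To actually scrape, pass the request to urllib.request.urlopen(req, timeout=5).
```

In a Prometheus deployment the same token would normally go into the scrape job's authorization settings rather than a hand-written client.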

Every option, precedence and parsing rule is in Configuration. New here? Start with the FAQ.
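
As an illustration of the precedence rule for a single option, here is a sketch of how a port could be resolved. `resolve_port` is a hypothetical function, not Argus's implementation, and the relative order of SERVER_PORT and PORT is an assumption; consult Configuration for the real parsing rules.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 9191


def resolve_port(kwarg_port: Optional[int] = None,
                 env: Optional[Mapping[str, str]] = None) -> int:
    """kwarg > ARGUS_PORT > SERVER_PORT > PORT > default (fallback order assumed)."""
    env = os.environ if env is None else env
    if kwarg_port is not None:
        return kwarg_port
    for name in ("ARGUS_PORT", "SERVER_PORT", "PORT"):
        raw = env.get(name)
        if raw:
            return int(raw)
    return DEFAULT_PORT


print(resolve_port(8000, {"ARGUS_PORT": "9000"}))                 # 8000
print(resolve_port(None, {"ARGUS_PORT": "9000", "PORT": "3000"}))  # 9000
print(resolve_port(None, {"PORT": "3000"}))                       # 3000
print(resolve_port(None, {}))                                     # 9191
```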

Metrics

Aggregate, bounded-cardinality metrics: per-shard latency and up state, per-cluster guild/user/voice/emoji/sticker/channel counts, uptime, registered commands, interaction and command rates with success/error split, precise app- and prefix-command duration histograms, gateway throughput, shard disconnects and reconnects, log and rate-limit counters. Every counter and histogram carries a cluster label. Argus also reports its own health: argus_up, argus_subsystem_up{subsystem} (server/fleet/sink), and counters for swallowed instrumentation errors and dropped analytical events.

Full list with labels: Metrics Reference.
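
The self-health gauges can be checked from any script that can read the scrape output. The sketch below parses a hand-written sample in Prometheus text format and lists subsystems reporting 0. The sample payload is hypothetical (a real scrape may carry additional labels and many more series), and `degraded_subsystems` is an illustrative helper, not part of Argus.

```python
import re

# Hypothetical sample; real output will contain more series and possibly more labels.
SAMPLE_SCRAPE = """\
# TYPE argus_up gauge
argus_up 1
# TYPE argus_subsystem_up gauge
argus_subsystem_up{subsystem="server"} 1
argus_subsystem_up{subsystem="fleet"} 1
argus_subsystem_up{subsystem="sink"} 0
"""

LINE = re.compile(r'^argus_subsystem_up\{subsystem="([^"]+)"\}\s+(\S+)$')


def degraded_subsystems(text: str) -> list[str]:
    """Return subsystem names whose argus_subsystem_up sample equals 0."""
    down = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and float(match.group(2)) == 0.0:
            down.append(match.group(1))
    return down


print(degraded_subsystems(SAMPLE_SCRAPE))  # ['sink']
```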

Dashboard

A React SPA bundled into the wheel and served at /, with views for overview, interactions, gateway, your Grafana boards, and per-guild analytics. It reads metrics live over SSE with a polling fallback. Set dashboard_auth_token for anything public. See Dashboard.

Per-guild analytics

Per-guild, per-user questions never go to Prometheus (cardinality). With enable_per_guild + clickhouse_dsn (the argus-dpy[clickhouse] extra), Argus drains per-guild events to ClickHouse (batched, non-blocking) and the dashboard's Analytics section serves per-guild command counts and average durations. Step-by-step: Per-guild analytics tutorial; internals: History & ClickHouse.
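
"Batched, non-blocking" draining can be pictured as a bounded buffer that never makes the producer wait: when it is full, the event is dropped and counted. The sketch below shows that pattern with `asyncio.Queue`. `EventBuffer` and its methods are hypothetical names; the real sink and its ClickHouse writer are described in History & ClickHouse.

```python
import asyncio


class EventBuffer:
    """Bounded buffer: offer() never blocks; overflow is counted and dropped."""

    def __init__(self, maxsize: int) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def offer(self, event: dict) -> None:
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            self.dropped += 1

    def drain(self, batch_size: int) -> list:
        """Pull up to batch_size events without waiting for more to arrive."""
        batch = []
        while len(batch) < batch_size and not self._queue.empty():
            batch.append(self._queue.get_nowait())
        return batch


async def demo() -> tuple:
    buffer = EventBuffer(maxsize=3)
    for i in range(5):                      # two of these overflow the buffer
        buffer.offer({"guild_id": 1, "command": f"cmd{i}"})
    batch = buffer.drain(batch_size=10)     # a writer would insert this batch
    return len(batch), buffer.dropped


print(asyncio.run(demo()))  # (3, 2)
```

Dropping on overflow trades completeness of the analytical data for a guarantee that the bot's event loop is never stalled by a slow database.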

Grafana, OTLP, clustering

docker compose up -d brings up a provisioned Prometheus + Grafana with four dashboards (overview, interactions, gateway, and an Argus self-health board) plus recording and alerting rules you can tune. Set otlp_endpoint (the argus-dpy[otlp] extra) to also push via OpenTelemetry to Datadog, Grafana Cloud, Honeycomb, and the like. Run one Argus per process with a distinct cluster_id for clustered bots. See the OTLP tutorial, Clustering, and OTLP internals.

No inbound port? Push instead. OTLP, a Prometheus Pushgateway (pushgateway_url), and the Fleet client are all outbound-only, so they work where you can't expose /metrics at all — Docker bot panels (Pterodactyl, PebbleHost, Railway). See hosting / Hosting on bot panels.

Fleet control plane (opt-in)

Running many bot processes across regions? The Argus Fleet control plane is a separate, opt-in service that aggregates them into one readable, multi-tier view: Global (everything) -> Fleet (a region, e.g. asia) -> Cluster (one process) -> Shard (per-shard up/latency). It renders plain, colour-graded panels with no PromQL or Grafana setup, and reads from two interchangeable sources: a self-contained push path (zero infra; members heartbeat to it) and an existing Prometheus.
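
The tiering can be thought of as a rollup over per-shard samples. The sketch below aggregates hypothetical shard samples into a Global row and one row per Fleet; `ShardSample`, `rollup`, and the field names are illustrative assumptions, not the control plane's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardSample:
    fleet: str          # region, e.g. "asia"
    cluster: int        # one bot process
    shard: int
    up: bool
    latency_ms: float


def rollup(samples):
    """Aggregate shard samples into a 'global' row plus one row per fleet."""
    tiers = defaultdict(lambda: {"shards": 0, "up": 0, "worst_latency_ms": 0.0})
    for s in samples:
        for key in ("global", s.fleet):
            row = tiers[key]
            row["shards"] += 1
            row["up"] += int(s.up)
            row["worst_latency_ms"] = max(row["worst_latency_ms"], s.latency_ms)
    return dict(tiers)


samples = [
    ShardSample("asia", 1, 0, True, 80.0),
    ShardSample("asia", 1, 1, False, 250.0),
    ShardSample("eu", 1, 0, True, 40.0),
]
for tier, row in rollup(samples).items():
    print(tier, row)
```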

Bots are unchanged unless they opt in. The fastest path is the setup wizard, which mints a token, writes a ready-to-use .env and docker-compose.fleet.yml, and prints the exact member snippet:

python -m argus.fleet init        # scaffold; then: docker compose -f docker-compose.fleet.yml up -d
python -m argus.fleet doctor --url http://fleet-host:9190 --token secret   # diagnose

Or wire it by hand:

# the control plane (its own process / container)
ARGUS_FLEET_TOKEN=secret python -m argus.fleet          # serves :9190

# each bot opts in with a few env vars (or kwargs)
ARGUS_FLEET_URL=http://fleet-host:9190 \
ARGUS_FLEET_TOKEN=secret ARGUS_FLEET_GROUP=asia \
    python bot.py

Point it at the shared ClickHouse (ARGUS_FLEET_CLICKHOUSE_DSN) and the same pane gains a per-guild Analytics view (fleet-wide, or sliced to one bot) — so one dashboard covers operational rollups and analytics.

Secure by default: a non-loopback bind with no token refuses to start; set a token (or ARGUS_FLEET_TOKEN_FILE). It assigns each process a stable per-region number (never reused; a dead cluster keeps its slot, shown down), persists topology across restarts, caps request bodies, strips its version banner, and exposes its own /metrics and /readyz. The member side is fail-open: a fleet outage never touches your bot loop. Full guide and deployment: Fleet and the Fleet tutorial.
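
The "refuse to start on a non-loopback bind without a token" rule can be sketched with the standard `ipaddress` module. `is_loopback` and `check_bind` are hypothetical helpers written for illustration; treating unresolvable host names as public is an assumption of this sketch, not documented Fleet behaviour.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for localhost and loopback IPs; unknown names count as public."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def check_bind(host: str, token: Optional[str]) -> None:
    """Raise instead of starting when a public bind has no token."""
    if not is_loopback(host) and not token:
        raise RuntimeError(f"refusing to bind {host} without a token")


check_bind("127.0.0.1", None)        # loopback: allowed without a token
check_bind("0.0.0.0", "secret")      # public bind with a token: allowed
try:
    check_bind("0.0.0.0", None)      # public bind, no token: refused
except RuntimeError as exc:
    print(exc)
```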

Why no per-guild Prometheus labels?

guild_id, user_id, and channel_id are unbounded; as labels they blow up Prometheus series cardinality at scale and are useless to visualise. Argus forbids them by construction and routes per-entity questions to the analytical path instead.
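
A quick back-of-the-envelope calculation shows why. The worst-case number of time series for one metric is the product of its label cardinalities; the label names and counts below are hypothetical figures chosen for illustration.

```python
from math import prod


def series_count(label_cardinalities: dict) -> int:
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


bounded = series_count({"cluster": 8, "outcome": 2, "command": 50})
with_guild = series_count(
    {"cluster": 8, "outcome": 2, "command": 50, "guild_id": 100_000}
)

print(bounded)      # 800
print(with_guild)   # 80000000
```

Adding a single unbounded label multiplies the series count by the number of guilds, which is why such questions are better answered by a row-oriented store.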

Security

Set dashboard_auth_token for any non-localhost bot; the fleet control plane refuses to start on a public bind without a token and is hardened by default (rate limits, body caps, security headers, non-root images, SBOM/provenance). The same security headers, body cap, and banner strip apply to the in-process bot server too. The no-PII-label guarantee means per-entity data never reaches Prometheus. CI runs CodeQL and a pip-audit dependency audit, and each release ships a wheel SBOM. Full guidance: Security and the threat model. Report vulnerabilities privately via SECURITY.md.

Examples

Runnable examples live in examples/ (see examples/README.md for the index and a list of production dos and don'ts).

Using a coding agent to get started? Point it at llms.txt — a machine-readable map (including how to clone the wiki for the in-depth guides).

Contributing & license

Contributions are accepted under the DCO; see CONTRIBUTING.md. Licensed under AGPL-3.0-or-later (network use counts as distribution) — see LICENSE. Release notes: CHANGELOG.md / Releases.


See the full wiki for the in-depth guides and explanations.
