Read-only AWS reliability audit. Alarm coverage assessment for ECS, Lambda, RDS, Aurora, and SQS.

opsfabric-discovery

A read-only AWS reliability audit you run on your own laptop. Produces an executive PDF assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads against the OpsFabric reliability baseline.

See what your audit would look like (no AWS needed)

Download a sample report (PDF, ~68 KB)

Or run it yourself in 30 seconds without any AWS credentials:

pip install opsfabric-discovery        # see the Install section below
opsfabric-discovery audit --demo
# → out/audit-demo.pdf

--demo runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, critical-gap cards, coverage breakdown). It makes no AWS calls and needs no credentials. The matching engine and the PDF are the same as in a real audit; only the input is synthetic.

What it does

  • Discovers AWS resources via Resource Explorer 2 across one or all enabled regions.
  • Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
  • Detects alarms that exist but won't notify (actions disabled / no SNS target / INSUFFICIENT_DATA) and surfaces them as DEGRADED — they don't count toward coverage.
  • Scores required-check coverage against a baseline pack (discovery_fabric/data/alarm_pack.yaml).
  • Renders an executive PDF (3 pages, McKinsey-style) plus JSON appendices for every artifact.
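The DEGRADED rule and its effect on coverage can be illustrated with a minimal sketch. It assumes alarms are plain dicts shaped like CloudWatch DescribeAlarms entries (ActionsEnabled, AlarmActions, StateValue); the function names alarm_status and coverage_ratio are hypothetical and are not taken from the package.

```python
from typing import Any, Mapping


def alarm_status(alarm: Mapping[str, Any]) -> str:
    """Return "DEGRADED" if the alarm exists but would not notify, else "OK"."""
    actions_enabled = alarm.get("ActionsEnabled", False)
    # Assumption: only SNS topic ARNs count as notification targets.
    sns_targets = [
        arn for arn in alarm.get("AlarmActions", [])
        if arn.split(":")[2:3] == ["sns"]
    ]
    insufficient = alarm.get("StateValue") == "INSUFFICIENT_DATA"
    if not actions_enabled or not sns_targets or insufficient:
        return "DEGRADED"
    return "OK"


def coverage_ratio(matches: Mapping[str, list]) -> float:
    """Fraction of required checks that have at least one non-DEGRADED alarm.

    `matches` maps a required-check id to the alarms matched to it.
    """
    if not matches:
        return 0.0
    covered = sum(
        1 for alarms in matches.values()
        if any(alarm_status(a) == "OK" for a in alarms)
    )
    return covered / len(matches)
```

In this sketch a check whose only matched alarm is muted or has no SNS target contributes nothing to the ratio, which mirrors the statement that DEGRADED alarms don't count toward coverage.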

Trust statement

  • Read-only. Calls only AWS describe / list APIs. Never creates, modifies, or deletes any resource.
  • Runs on your laptop. No telemetry, no phone-home. Your data never leaves your machine.
  • Source is auditable. Open this directory's Python files — every AWS call is visible.
  • Customer IAM policy for cross-account audits lives at discovery-fabric/docs/customer-iam-policy.json (in the parent monorepo for stage 1; bundled inside the package in stage 2).

Install

pip install opsfabric-discovery
opsfabric-discovery --help

Quickstart

Once installed, from any directory:

# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod

# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
  --assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
  --external-id agreed-secret \
  --regions all \
  --account-alias acme-prod

# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.pdf
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
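If the audit runs from a script, a quick check that every artifact listed above was written can be useful. The sketch below is only an illustration built from the file names in the listing; missing_artifacts is a hypothetical helper, not part of the CLI.

```python
from fnmatch import fnmatch
from typing import Iterable

# JSON appendices named in the quickstart listing.
EXPECTED_JSON = (
    "alarm-coverage-score.json",
    "alarm-coverage-missing.json",
    "resource-mapping.json",
    "all-resources.json",
    "audit-meta.json",
)
# The PDF name embeds the account id and date, so match it by pattern.
PDF_PATTERN = "audit-*.pdf"


def missing_artifacts(filenames: Iterable[str]) -> list:
    """Return the expected artifacts that are absent from `filenames`."""
    present = set(filenames)
    missing = [name for name in EXPECTED_JSON if name not in present]
    if not any(fnmatch(name, PDF_PATTERN) for name in present):
        missing.append(PDF_PATTERN)
    return missing
```

A typical call would pass the names found in the output directory, for example missing_artifacts(p.name for p in pathlib.Path("out").iterdir()).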

How it works

For every required check in the alarm pack and every discovered resource, the matcher tries five strategies in priority order. First hit wins:

  1. Exact dimension match — alarm dimensions equal the resource's canonical dimensions (e.g. ClusterName + ServiceName for ECS). HIGH confidence.
  2. ALB target-group bridge — alarm uses TargetGroup dimension; we cross-reference back to the ECS service via its registered load-balancer attachments. Exact-ARN equality. HIGH.
  3. Namespace + partial dimension match — alarm is in the resource's expected namespace and at least one dimension matches. MEDIUM.
  4. Metric-filter → log-group linkage — alarm metric was published by a metric filter on one of the resource's log groups. HIGH.
  5. Naming heuristic — resource name appears as substring in alarm name. LOW (last-resort).
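The priority order above can be sketched as a list of predicates tried in sequence, where the first one that returns true decides the strategy and confidence. This is a simplified illustration, not the package's matcher: the Resource and Alarm dataclasses, their fields, and the strategy names are assumptions made for the example, and target groups are compared as opaque identifiers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    name: str
    namespace: str
    dimensions: dict
    # Identifiers of ALB target groups attached to the service (assumed shape).
    target_groups: set = field(default_factory=set)
    # (namespace, metric name) pairs published by metric filters on its log groups.
    filter_metrics: set = field(default_factory=set)


@dataclass
class Alarm:
    name: str
    namespace: str
    metric_name: str
    dimensions: dict


def _exact(a: Alarm, r: Resource) -> bool:
    return bool(r.dimensions) and a.dimensions == r.dimensions


def _alb_bridge(a: Alarm, r: Resource) -> bool:
    return a.dimensions.get("TargetGroup") in r.target_groups


def _partial(a: Alarm, r: Resource) -> bool:
    same_ns = a.namespace == r.namespace
    return same_ns and any(r.dimensions.get(k) == v for k, v in a.dimensions.items())


def _metric_filter(a: Alarm, r: Resource) -> bool:
    return (a.namespace, a.metric_name) in r.filter_metrics


def _naming(a: Alarm, r: Resource) -> bool:
    return r.name.lower() in a.name.lower()


# Priority order: the first predicate that matches wins.
STRATEGIES = [
    ("exact_dimensions", "HIGH", _exact),
    ("alb_target_group_bridge", "HIGH", _alb_bridge),
    ("namespace_partial_dimensions", "MEDIUM", _partial),
    ("metric_filter_log_group", "HIGH", _metric_filter),
    ("naming_heuristic", "LOW", _naming),
]


def match(alarm: Alarm, resource: Resource) -> Optional[tuple]:
    """Return (strategy, confidence) for the first matching strategy, else None."""
    for strategy, confidence, predicate in STRATEGIES:
        if predicate(alarm, resource):
            return strategy, confidence
    return None
```

Keeping the strategies in an ordered list makes the priority explicit and lets a later, weaker strategy run only when every stronger one has failed.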

Per-region scoping prevents cross-region false positives. Per-region failures (Resource Explorer 2 not enabled, IAM gap, throttling beyond retry) are skipped and logged; they do not abort the audit.
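The skip-and-log behaviour for failing regions might look roughly like the following. It is a sketch under assumptions: audit_regions and the injected discover callable are hypothetical names, and the real CLI may distinguish error types more finely than a single broad except.

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger("audit")


def audit_regions(
    regions: Iterable[str],
    discover: Callable[[str], Any],
) -> tuple:
    """Run `discover` per region, collecting results and skipped regions.

    A failure in one region is logged and recorded; the loop continues.
    """
    results = {}
    skipped = {}
    for region in regions:
        try:
            results[region] = discover(region)
        except Exception as exc:  # broad on purpose: isolate any regional failure
            logger.warning("skipping region %s: %s", region, exc)
            skipped[region] = str(exc)
    return results, skipped
```

Returning the skipped regions alongside the results lets the caller report which regions were not covered instead of silently dropping them.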

Re-syncing from the monorepo source

When the monorepo at discovery-fabric/ changes, run:

./bin/sync-from-monorepo.sh

This re-copies the package files, re-applies the two hand-edits (alarm_pack path; PDF CTA copy), and re-creates cli.py from the latest main.py. The script fails loudly if the source files have drifted in ways that break the patches — fix-up is then manual.

What's NOT in this build

  • The Streamlit UI (lives in the monorepo only — internal tool).
  • The Next.js dashboard (lives in discovery-fabric-ui/ — internal tool).
  • AlarmFabric integration (separate product, closed-source).

The customer-facing surface for this stage is the CLI + the PDF.
