Read-only AWS reliability audit. Alarm coverage assessment for ECS, Lambda, RDS, Aurora, and SQS.
opsfabric-discovery
A read-only AWS reliability audit you run on your own laptop. Produces an executive PDF assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads against the OpsFabric reliability baseline.
See what your audit would look like (no AWS needed)
Download a sample report (PDF, ~68 KB)
Or run it yourself in 30 seconds without any AWS credentials:
pip install opsfabric-discovery # installs locally; see the Install section below
opsfabric-discovery audit --demo
# → out/audit-demo.pdf
--demo runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, critical-gap cards, coverage breakdown). No AWS calls, no credentials needed. Same matching engine, same PDF — only the input is fake.
What it does
- Discovers AWS resources via Resource Explorer 2 across one or all enabled regions.
- Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
- Detects alarms that exist but won't notify (actions disabled, no SNS target, or INSUFFICIENT_DATA) and surfaces them as DEGRADED; they don't count toward coverage.
- Scores required-check coverage against a baseline pack (discovery_fabric/data/alarm_pack.yaml).
- Renders an executive PDF (3 pages, McKinsey-style) plus JSON appendices for every artifact.
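The DEGRADED rule and the coverage score can be sketched in a few lines of Python. This is a minimal sketch, not the package's code: it assumes alarms arrive as dicts shaped like the `MetricAlarms` entries that CloudWatch `DescribeAlarms` returns (`ActionsEnabled`, `AlarmActions`, `StateValue`), and `alarm_health` and `coverage_score` are hypothetical names. A required check counts as covered only if at least one alarm matched to it is healthy.

```python
"""Sketch: DEGRADED-alarm detection and required-check coverage scoring.

Hypothetical helpers; alarm dicts are assumed to look like CloudWatch
DescribeAlarms MetricAlarms entries.
"""


def _is_sns_arn(arn: str) -> bool:
    # ARN layout: arn:<partition>:<service>:<region>:<account>:<resource>
    parts = arn.split(":")
    return len(parts) > 2 and parts[0] == "arn" and parts[2] == "sns"


def alarm_health(alarm: dict) -> str:
    """Return "DEGRADED" if the alarm exists but would not notify anyone."""
    if not alarm.get("ActionsEnabled", False):
        return "DEGRADED"  # actions disabled
    if not any(_is_sns_arn(a) for a in alarm.get("AlarmActions", [])):
        return "DEGRADED"  # no SNS target to notify through
    if alarm.get("StateValue") == "INSUFFICIENT_DATA":
        return "DEGRADED"  # not evaluating real data
    return "HEALTHY"


def coverage_score(required, matches) -> float:
    """Fraction of required (resource_id, check_id) pairs backed by a HEALTHY alarm.

    required: list of (resource_id, check_id) pairs derived from the alarm pack.
    matches:  dict mapping each pair to the alarm dicts the matcher attached to it.
    DEGRADED alarms are ignored, so they never count toward coverage.
    """
    if not required:
        return 1.0  # assumption: nothing required means nothing missing
    covered = sum(
        1
        for pair in required
        if any(alarm_health(a) == "HEALTHY" for a in matches.get(pair, []))
    )
    return covered / len(required)
```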
Trust statement
- Read-only. Calls only AWS describe / list APIs. Never creates, modifies, or deletes any resource.
- Runs on your laptop. No telemetry, no phone-home. Your data never leaves your machine.
- Source is auditable. Open this directory's Python files — every AWS call is visible.
- Customer IAM policy for cross-account audits lives at discovery-fabric/docs/customer-iam-policy.json (in the parent monorepo for stage 1; bundled inside the package in stage 2).
Install
pip install opsfabric-discovery
opsfabric-discovery --help
Quickstart
Once installed, from any directory:
# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod
# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
--assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
--external-id agreed-secret \
--regions all \
--account-alias acme-prod
# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.pdf
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
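The README doesn't spell out how `--assume-role-arn` and `--external-id` are consumed. A minimal sketch of the standard STS pattern they presumably map onto: call `AssumeRole` with the external ID and hand the temporary credentials to a boto3 session. The helper name and the default session name `opsfabric-audit` are hypothetical; the STS client is passed in so the function can be exercised without AWS.

```python
"""Sketch: turning an assume-role ARN and external ID into temporary credentials.

Assumes a boto3-style STS client; assume_role_credentials is a hypothetical helper.
"""


def assume_role_credentials(sts_client, role_arn: str, external_id: str,
                            session_name: str = "opsfabric-audit") -> dict:
    """Call STS AssumeRole and return keyword arguments for boto3.Session(...)."""
    resp = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # appears in the customer's CloudTrail
        ExternalId=external_id,        # checked against the trust policy's sts:ExternalId condition
    )
    creds = resp["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

With real credentials, something like `boto3.Session(**assume_role_credentials(boto3.client("sts"), "arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor", "agreed-secret"))` would yield a session in the customer account. When the role's trust policy sets an `sts:ExternalId` condition, the call succeeds only if the value matches.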
How it works (one-paragraph)
For every required check in the alarm pack and every discovered resource, the matcher tries five strategies in priority order. First hit wins:
- Exact dimension match: alarm dimensions equal the resource's canonical dimensions (e.g. ClusterName + ServiceName for ECS). HIGH confidence.
- ALB target-group bridge: alarm uses the TargetGroup dimension; we cross-reference back to the ECS service via its registered load-balancer attachments. Exact-ARN equality. HIGH.
- Namespace + partial dimension match: alarm is in the resource's expected namespace and at least one dimension matches. MEDIUM.
- Metric-filter → log-group linkage: alarm metric was published by a metric filter on one of the resource's log groups. HIGH.
- Naming heuristic: resource name appears as a substring of the alarm name. LOW (last resort).
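A sketch of the first-hit-wins matcher, assuming simplified, hypothetical `Alarm` and `Resource` records (the package's real data model isn't shown here). Each strategy is a predicate, and `match` walks them in the priority order listed, returning the first strategy name and confidence. The region check at the top reflects the per-region scoping the README describes. For the ALB bridge the sketch relies on a documented CloudWatch detail: the `TargetGroup` dimension holds the final portion of the target-group ARN (`targetgroup/<name>/<id>`), so it compares against that segment of each ARN attached to the ECS service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Alarm:  # hypothetical, simplified alarm record
    name: str
    region: str
    namespace: str
    metric_name: str
    dimensions: dict[str, str]


@dataclass
class Resource:  # hypothetical, simplified resource record
    name: str
    region: str
    namespace: str                # expected CloudWatch namespace, e.g. "AWS/ECS"
    dimensions: dict[str, str]    # canonical dimensions for this resource type
    target_group_arns: list[str] = field(default_factory=list)  # ECS load-balancer attachments
    # (namespace, metric_name) pairs emitted by metric filters on this resource's log groups
    filter_metrics: set[tuple[str, str]] = field(default_factory=set)


def exact_dimensions(a: Alarm, r: Resource) -> bool:
    return a.dimensions == r.dimensions


def alb_target_group_bridge(a: Alarm, r: Resource) -> bool:
    tg = a.dimensions.get("TargetGroup")
    if tg is None:
        return False
    # The dimension value equals the ARN's resource segment ("targetgroup/<name>/<id>").
    return any(arn.split(":", 5)[-1] == tg for arn in r.target_group_arns)


def namespace_partial_dimensions(a: Alarm, r: Resource) -> bool:
    shared = a.dimensions.items() & r.dimensions.items()  # overlapping (name, value) pairs
    return a.namespace == r.namespace and bool(shared)


def metric_filter_log_group(a: Alarm, r: Resource) -> bool:
    return (a.namespace, a.metric_name) in r.filter_metrics


def naming_heuristic(a: Alarm, r: Resource) -> bool:
    return bool(r.name) and r.name in a.name


# Priority order; the first strategy that returns True wins.
STRATEGIES: list[tuple[str, str, Callable[[Alarm, Resource], bool]]] = [
    ("exact_dimensions", "HIGH", exact_dimensions),
    ("alb_target_group_bridge", "HIGH", alb_target_group_bridge),
    ("namespace_partial_dimensions", "MEDIUM", namespace_partial_dimensions),
    ("metric_filter_log_group", "HIGH", metric_filter_log_group),
    ("naming_heuristic", "LOW", naming_heuristic),
]


def match(a: Alarm, r: Resource) -> Optional[tuple[str, str]]:
    """Return (strategy, confidence) for the first strategy linking a to r, else None."""
    if a.region != r.region:
        return None  # per-region scoping: never match across regions
    for name, confidence, test in STRATEGIES:
        if test(a, r):
            return name, confidence
    return None
```

Because the first hit wins, this ordering reports an alarm that satisfies both the namespace + partial-dimension rule (MEDIUM) and the metric-filter rule (HIGH) as MEDIUM.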
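Per-region scoping prevents cross-region false positives: an alarm is only ever matched against resources in its own region. Per-region failures (Resource Explorer 2 not enabled, an IAM gap, throttling that persists beyond retries) are skipped and logged rather than aborting the audit.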
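A minimal sketch of the skip-and-log behaviour, assuming a hypothetical per-region `discover(region)` callable. It catches every `Exception` to keep the sketch dependency-free; a real implementation would more likely catch botocore's `ClientError` and throttling errors specifically.

```python
import logging
from typing import Callable, Iterable

log = logging.getLogger("opsfabric.sketch")  # hypothetical logger name


def discover_regions(regions: Iterable[str], discover: Callable[[str], list]):
    """Run discover(region) for each region; skip and log failures instead of aborting.

    Returns (results, skipped): the resources found per successful region, and
    the error message for each region that was skipped.
    """
    results: dict[str, list] = {}
    skipped: dict[str, str] = {}
    for region in regions:
        try:
            results[region] = discover(region)
        except Exception as exc:  # narrow this in real code
            log.warning("skipping region %s: %s", region, exc)
            skipped[region] = str(exc)
    return results, skipped
```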
Re-syncing from the monorepo source
When the monorepo at discovery-fabric/ changes, run:
./bin/sync-from-monorepo.sh
This re-copies the package files, re-applies the two hand-edits (alarm_pack path; PDF CTA copy), and re-creates cli.py from the latest main.py. The script fails loudly if the source files have drifted in ways that break the patches — fix-up is then manual.
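The sync script itself is shell and isn't shown here. As a hypothetical Python illustration of what "fails loudly" can mean for a hand-edit: apply the replacement only when the anchor text appears exactly once, and raise otherwise. `apply_hand_edit` and `PatchDriftError` are invented names, not part of the package.

```python
class PatchDriftError(RuntimeError):
    """The text a hand-edit expects to replace is missing or ambiguous."""


def apply_hand_edit(source: str, anchor: str, replacement: str, label: str) -> str:
    """Replace anchor with replacement, but only if anchor occurs exactly once."""
    count = source.count(anchor)
    if count != 1:
        # Zero hits: upstream changed the line. Several hits: the edit is ambiguous.
        raise PatchDriftError(f"{label}: expected exactly 1 match, found {count}")
    return source.replace(anchor, replacement, 1)
```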
What's NOT in this build
- The Streamlit UI (lives in the monorepo only; internal tool).
- The Next.js dashboard (lives in discovery-fabric-ui/; internal tool).
- AlarmFabric integration (separate product, closed-source).
The customer-facing surface for this stage is the CLI + the PDF.
Download files
File details
Details for the file opsfabric_discovery-0.2.0.tar.gz.
File metadata
- Download URL: opsfabric_discovery-0.2.0.tar.gz
- Size: 157.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6b0f8e76cdf56b37e9fec12dbb75e9a016c588dd358ea97f2b7391a072a3ed8 |
| MD5 | 61debb3d897a88571a4f6d5f68090708 |
| BLAKE2b-256 | b75b74b4772b1a37f5b6764321d6dfcd91c72d93235410f8664a65e47494dee1 |
File details
Details for the file opsfabric_discovery-0.2.0-py3-none-any.whl.
File metadata
- Download URL: opsfabric_discovery-0.2.0-py3-none-any.whl
- Size: 53.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08cab530cddce02a8aef3f8b3a61483357d1be8eefb18d9c3896d3ab66926c37 |
| MD5 | 65f104fb975e90a0d598facdc66f7b23 |
| BLAKE2b-256 | 298232bcffcdacb3cce6ba4f390bfc22f911769e937e486ee51343da63f789aa |
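To confirm that a downloaded file is the one published here, its SHA256 can be compared with the digest listed in the file hashes above. A minimal sketch using only `hashlib`; `sha256_of` and `matches_published_digest` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_sha256: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `matches_published_digest("opsfabric_discovery-0.2.0.tar.gz", "a6b0f8e76cdf56b37e9fec12dbb75e9a016c588dd358ea97f2b7391a072a3ed8")` should return `True` for an intact source distribution. pip's hash-checking mode (`--require-hashes`) can enforce the same check at install time.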