Read-only AWS reliability audit. Alarm coverage assessment for ECS, Lambda, RDS, Aurora, and SQS.

opsfabric-discovery

Open-source AWS reliability audit. Produces a 3-page executive PDF assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads. Runs locally — your data never leaves your laptop.

DiscoveryFabric (distributed as the opsfabric-discovery package) is the audit layer of the OpsFabric reliability platform. The closed-source companion products, AlarmFabric (alarm remediation) and OpsFabric (incident response orchestration), turn the audit findings into automated fixes. See docs/comparison.md for the full feature matrix.

See what your audit would look like — no AWS needed

Download a sample report (PDF, ~68 KB)

Or run it yourself in 30 seconds:

pip install opsfabric-discovery
opsfabric-discovery audit --demo
# → out/audit-demo.pdf

--demo runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, critical-gap cards, coverage breakdown). No AWS calls, no credentials needed. Same matching engine, same PDF — only the input is fake.

What it does

  • Discovers AWS resources via AWS Resource Explorer (the resource-explorer-2 API) across one or all enabled regions.
  • Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
  • Detects alarms that exist but won't notify (actions disabled, no SNS target, or state INSUFFICIENT_DATA) and surfaces them as DEGRADED. DEGRADED alarms don't count toward coverage.
  • Scores required-check coverage against the OpsFabric reliability baseline.
  • Renders an executive PDF (3 pages) plus JSON appendices.
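The DEGRADED rule and its effect on coverage can be sketched in a few lines. This is an illustrative sketch, not the tool's actual code: the function names `classify_alarm` and `coverage`, the check names, and the exact precedence of the three conditions are assumptions. The alarm dictionaries follow the shape returned by CloudWatch `DescribeAlarms` (`ActionsEnabled`, `AlarmActions`, `StateValue`).

```python
from typing import Any


def classify_alarm(alarm: dict[str, Any]) -> str:
    """Return "DEGRADED" when the alarm exists but cannot notify, else "OK"."""
    actions_enabled = alarm.get("ActionsEnabled", False)
    # ARN layout: arn:partition:service:region:account:resource
    has_sns_target = any(
        arn.split(":")[2:3] == ["sns"] for arn in alarm.get("AlarmActions", [])
    )
    no_data = alarm.get("StateValue") == "INSUFFICIENT_DATA"
    if not actions_enabled or not has_sns_target or no_data:
        return "DEGRADED"
    return "OK"


def coverage(required: list[str], matched: dict[str, dict[str, Any]]) -> float:
    """Fraction of required checks backed by an alarm that can notify."""
    if not required:
        return 1.0
    covered = sum(
        1
        for check in required
        if check in matched and classify_alarm(matched[check]) == "OK"
    )
    return covered / len(required)


healthy = {
    "ActionsEnabled": True,
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:oncall"],
    "StateValue": "OK",
}
muted = {**healthy, "ActionsEnabled": False}

# Hypothetical check names: two of the four required checks have an alarm,
# but only one of those alarms can actually notify.
required_checks = ["cpu-high", "memory-high", "5xx-rate", "task-count-low"]
matched_alarms = {"cpu-high": healthy, "memory-high": muted}

print(classify_alarm(muted))
print(coverage(required_checks, matched_alarms))
```

Here the muted alarm is classified as DEGRADED, so coverage comes out at 0.25 rather than 0.5: an alarm that cannot reach anyone is treated the same as a missing one.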

Trust statement

  • Read-only. Calls only AWS describe / list APIs. Never creates, modifies, or deletes any resource.
  • Runs on your laptop. No telemetry, no phone-home. Your data never leaves your machine.
  • Source is auditable. Open the installed Python files — every AWS call is visible in discovery_fabric/aws/.
  • Minimum IAM permissions (read-only across the board): sts:GetCallerIdentity, ec2:DescribeRegions, resource-explorer-2:ListViews / GetView / Search, tag:GetResources, cloudwatch:DescribeAlarms, logs:DescribeLogGroups / DescribeMetricFilters, ecs:ListClusters / ListServices / DescribeServices / DescribeTaskDefinition, lambda:ListFunctions / GetFunction, rds:DescribeDBInstances / DescribeDBClusters, sqs:ListQueues / GetQueueAttributes.
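The permission list above can be turned into an IAM policy document for a dedicated audit role. The sketch below only assembles the JSON. The `Sid` value and the use of `"Resource": "*"` are assumptions, not something the project prescribes, so tighten them to suit your account.

```python
import json

# Actions copied from the minimum-permissions list, grouped by service prefix.
READ_ONLY_ACTIONS = {
    "sts": ["GetCallerIdentity"],
    "ec2": ["DescribeRegions"],
    "resource-explorer-2": ["ListViews", "GetView", "Search"],
    "tag": ["GetResources"],
    "cloudwatch": ["DescribeAlarms"],
    "logs": ["DescribeLogGroups", "DescribeMetricFilters"],
    "ecs": ["ListClusters", "ListServices", "DescribeServices", "DescribeTaskDefinition"],
    "lambda": ["ListFunctions", "GetFunction"],
    "rds": ["DescribeDBInstances", "DescribeDBClusters"],
    "sqs": ["ListQueues", "GetQueueAttributes"],
}


def build_policy() -> dict:
    """Assemble a single-statement, allow-only IAM policy document."""
    actions = [
        f"{service}:{name}"
        for service, names in READ_ONLY_ACTIONS.items()
        for name in names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpsFabricAuditReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",  # assumption: no resource-level scoping
            }
        ],
    }


print(json.dumps(build_policy(), indent=2))
```

Every action in the generated statement starts with Get, List, Describe, or Search, which gives a quick way to confirm that the role grants no write access.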

Install

pip install opsfabric-discovery
opsfabric-discovery --help

Quickstart

Once installed, from any directory:

# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod

# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
  --assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
  --external-id agreed-secret \
  --regions all \
  --account-alias acme-prod

# Show OpsFabric product context (DiscoveryFabric / AlarmFabric / OpsFabric)
opsfabric-discovery --about

# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.pdf
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
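For the cross-account path, the role named in --assume-role-arn needs a trust policy in the customer account that admits the auditing principal and requires the agreed external ID. A minimal sketch of that trust policy follows. The auditor principal ARN and the helper name `build_trust_policy` are hypothetical, and the external ID is the placeholder from the command above.

```python
import json


def build_trust_policy(auditor_principal_arn: str, external_id: str) -> dict:
    """Trust policy letting one principal assume the role with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": auditor_principal_arn},
                "Action": "sts:AssumeRole",
                # The caller must pass the same value via --external-id.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


trust_policy = build_trust_policy(
    auditor_principal_arn="arn:aws:iam::111122223333:root",  # hypothetical auditor account
    external_id="agreed-secret",
)
print(json.dumps(trust_policy, indent=2))
```

Attach this document as the trust relationship of the OpsFabricAuditor role, and attach the read-only permissions separately as the role's permission policy.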

Closing the gaps

DiscoveryFabric tells you what's missing. To close the gaps, you have two paths:

  1. DIY — open the audit PDF, click through to the AWS Console, author each missing CloudWatch alarm by hand. Free, but tedious; a mid-market fleet usually has 30–200 missing alarms.
  2. AlarmFabric — the OpsFabric remediation product. Reads this audit's JSON output, creates the alarms in your account through the audit role with one added write permission (cloudwatch:PutMetricAlarm), and tags each alarm with its provenance for easy rollback. Typical turnaround: under one engineering day for a fleet of any size. Closed-source SaaS; see opsfabric.ai or email founders@opsfabric.ai.
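For the DIY path, scripting the alarms is usually less tedious than clicking through the console. The sketch below builds the keyword arguments for CloudWatch `PutMetricAlarm` for one SQS queue. The alarm name, the 300-second threshold, and the evaluation window are illustrative assumptions, not values from the OpsFabric baseline, and the helper `sqs_age_alarm_kwargs` is hypothetical.

```python
def sqs_age_alarm_kwargs(
    queue_name: str, sns_topic_arn: str, max_age_seconds: int = 300
) -> dict:
    """Build PutMetricAlarm arguments for a 'messages are getting old' alarm."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": float(max_age_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        # Enabled actions plus an SNS target keep the alarm out of DEGRADED.
        "ActionsEnabled": True,
        "AlarmActions": [sns_topic_arn],
    }


kwargs = sqs_age_alarm_kwargs(
    "orders", "arn:aws:sns:us-east-1:123456789012:oncall"
)
print(kwargs["AlarmName"])

# To create the alarm (requires boto3 and cloudwatch:PutMetricAlarm,
# which is outside the audit's read-only permissions):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```

Setting `ActionsEnabled` and a real SNS topic at creation time matters here, because an alarm without them would be reported as DEGRADED on the next audit run.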

Once the alarms start firing, OpsFabric handles the incident lifecycle: triage from Slack/CloudWatch/Jira, automated RCA, remediation suggestions, Confluence post-mortems, ticket close-out. Also closed-source SaaS.

The OSS audit is genuinely useful on its own. The paid products are a different layer — not a crippled version of the audit, just a different part of the reliability loop.

Open source vs commercial — feature matrix

Capability DiscoveryFabric (OSS) AlarmFabric (paid) OpsFabric (paid)
Read-only audit
Resource discovery (ECS / Lambda / RDS / Aurora / SQS)
Five-strategy alarm matching
DEGRADED alarm detection
Executive PDF + JSON output
--demo synthetic walkthrough
Create missing alarms in your account
Tagged + reversible alarm provenance
SNS / PagerDuty / Opsgenie wiring
Scheduled / continuous audits
Slack-based incident triage
Automated RCA + remediation suggestions
Jira / Confluence incident lifecycle
Multi-tenant managed SaaS
Pricing Free (MIT) opsfabric.ai opsfabric.ai

See docs/comparison.md for the full version with one-paragraph explanations of each row.

About OpsFabric

We build reliability automation for AWS-heavy mid-market teams. DiscoveryFabric is open-source under the MIT license because the audit should be free — we make money on the remediation and incident-response automation. The OSS funnel and the SaaS funnel feed each other: a team runs the audit, sees their coverage is below baseline, and decides whether to fix the gaps themselves or have us do it.

Contributing

PRs welcome — see CONTRIBUTING.md for dev setup, test conventions, and what's in scope (resource types, matching strategies, output polish) vs out of scope (live alarm creation, runtime incident handling — those belong to the commercial products).

By participating, you agree to our Code of Conduct.

Support

Question type | Where to go
Bug in the OSS audit tool | GitHub issues (use the Bug Report template)
Feature idea for the OSS audit | GitHub issues (use the Feature Request template)
Security issue | Email founders@opsfabric.ai (SECURITY.md)
Commercial AlarmFabric / OpsFabric questions | Email founders@opsfabric.ai

License

MIT — © 2026 Vaishal Shah / OpsFabric.
