
Read-only AWS reliability audit. Alarm coverage assessment for ECS, Lambda, RDS, Aurora, and SQS.


opsfabric-discovery


Open-source AWS reliability audit. Produces a 3-page executive PDF assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads. Runs locally — your data never leaves your laptop.

DiscoveryFabric is the audit layer of the OpsFabric reliability platform. The closed-source companion products, AlarmFabric (alarm remediation) and OpsFabric (incident response orchestration), turn the audit findings into automated fixes in production. See docs/comparison.md for the full feature matrix.

See what your audit would look like — no AWS needed

Download a sample report (PDF, ~68 KB)

Or run it yourself in 30 seconds:

pip install opsfabric-discovery
opsfabric-discovery audit --demo
# → out/audit-demo.pdf

--demo runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, critical-gap cards, coverage breakdown). No AWS calls, no credentials needed. Same matching engine, same PDF — only the input is fake.

What it does

  • Discovers AWS resources via Resource Explorer 2 across one or all enabled regions.
  • Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
  • Detects alarms that exist but won't notify (actions disabled / no SNS target / INSUFFICIENT_DATA) and surfaces them as DEGRADED — they don't count toward coverage.
  • Scores required-check coverage against the OpsFabric reliability baseline.
  • Renders an executive PDF (3 pages) plus JSON appendices.
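
The first matching strategy and the DEGRADED rule above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names are hypothetical, the order in which the three DEGRADED conditions are checked is an assumption, and only the exact-dimensions strategy is shown. The alarm dictionaries use the field names returned by CloudWatch DescribeAlarms.

```python
def dimensions_of(alarm):
    """Return an alarm's dimensions as a {name: value} dict."""
    return {d["Name"]: d["Value"] for d in alarm.get("Dimensions", [])}


def exact_match(alarm, namespace, resource_dims):
    """Exact-dimensions strategy only: namespace and all dimensions must be equal."""
    return alarm.get("Namespace") == namespace and dimensions_of(alarm) == resource_dims


def degraded_reason(alarm):
    """Return why an alarm would not notify, or None if it looks healthy.

    Assumption: the three conditions are checked in this order.
    """
    if not alarm.get("ActionsEnabled", True):
        return "actions disabled"
    actions = alarm.get("AlarmActions", [])
    if not any(arn.startswith("arn:aws:sns:") for arn in actions):
        return "no SNS target"
    if alarm.get("StateValue") == "INSUFFICIENT_DATA":
        return "INSUFFICIENT_DATA"
    return None


# Hypothetical sample data for one SQS queue named "orders".
TOPIC = "arn:aws:sns:us-east-1:123456789012:oncall"
alarms = [
    {"AlarmName": "orders-age", "Namespace": "AWS/SQS", "StateValue": "OK",
     "ActionsEnabled": True, "AlarmActions": [TOPIC],
     "Dimensions": [{"Name": "QueueName", "Value": "orders"}]},
    {"AlarmName": "orders-depth", "Namespace": "AWS/SQS", "StateValue": "OK",
     "ActionsEnabled": False, "AlarmActions": [TOPIC],
     "Dimensions": [{"Name": "QueueName", "Value": "orders"}]},
]

matched = [a for a in alarms if exact_match(a, "AWS/SQS", {"QueueName": "orders"})]
# Only matched alarms that can actually notify count toward coverage.
covering = [a for a in matched if degraded_reason(a) is None]
degraded = [a for a in matched if degraded_reason(a) is not None]
```

Here both alarms match the queue, but only `orders-age` counts toward coverage; `orders-depth` would be reported as DEGRADED because its actions are disabled.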

Trust statement

  • Read-only. Calls only read-only AWS APIs (Describe / List / Get / Search). Never creates, modifies, or deletes any resource.
  • Runs on your laptop. No telemetry, no phone-home. Your data never leaves your machine.
  • Source is auditable. Open the installed Python files — every AWS call is visible in discovery_fabric/aws/.
  • Minimum IAM permissions (read-only across the board): sts:GetCallerIdentity, ec2:DescribeRegions, resource-explorer-2:ListViews / GetView / Search, tag:GetResources, cloudwatch:DescribeAlarms, logs:DescribeLogGroups / DescribeMetricFilters, ecs:ListClusters / ListServices / DescribeServices / DescribeTaskDefinition, lambda:ListFunctions / GetFunction, rds:DescribeDBInstances / DescribeDBClusters, sqs:ListQueues / GetQueueAttributes.
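
The permission list above can be turned into an IAM policy document for the audit role. The sketch below is one way to do that, assuming a single statement with `"Resource": "*"`; the statement ID and the helper name are hypothetical and do not come from the project. It also rejects any action whose verb is not Get, List, Describe, or Search, as a quick check on the read-only claim.

```python
import json

# Actions copied from the "Minimum IAM permissions" list.
READ_ONLY_ACTIONS = [
    "sts:GetCallerIdentity",
    "ec2:DescribeRegions",
    "resource-explorer-2:ListViews",
    "resource-explorer-2:GetView",
    "resource-explorer-2:Search",
    "tag:GetResources",
    "cloudwatch:DescribeAlarms",
    "logs:DescribeLogGroups",
    "logs:DescribeMetricFilters",
    "ecs:ListClusters",
    "ecs:ListServices",
    "ecs:DescribeServices",
    "ecs:DescribeTaskDefinition",
    "lambda:ListFunctions",
    "lambda:GetFunction",
    "rds:DescribeDBInstances",
    "rds:DescribeDBClusters",
    "sqs:ListQueues",
    "sqs:GetQueueAttributes",
]

READ_ONLY_VERBS = ("Get", "List", "Describe", "Search")


def build_policy(actions):
    """Build a one-statement IAM policy, refusing anything that is not a read verb."""
    for action in actions:
        verb = action.split(":", 1)[1]
        if not verb.startswith(READ_ONLY_VERBS):
            raise ValueError(f"not a read-only action: {action}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpsFabricAuditReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


policy = build_policy(READ_ONLY_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or attached to the role with your usual infrastructure tooling.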

Install

pip install opsfabric-discovery
opsfabric-discovery --help

Quickstart

Once installed, from any directory:

# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod

# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
  --assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
  --external-id agreed-secret \
  --regions all \
  --account-alias acme-prod

# Show OpsFabric product context (DiscoveryFabric / AlarmFabric / OpsFabric)
opsfabric-discovery --about

# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.pdf
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
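
For scripted runs (for example in CI), it can help to verify that every file in the listing above was produced. The following sketch builds the expected paths from the naming pattern shown; the helper names are hypothetical, and it assumes the date in the PDF name is the local run date, which the listing does not confirm.

```python
from datetime import date
from pathlib import Path

# JSON appendix names taken from the output listing.
JSON_APPENDICES = [
    "alarm-coverage-score.json",
    "alarm-coverage-missing.json",
    "resource-mapping.json",
    "all-resources.json",
    "audit-meta.json",
]


def expected_outputs(account_id, run_date, output_dir="out"):
    """Return the paths an audit run is expected to write."""
    base = Path(output_dir)
    pdf = base / f"audit-{account_id}-{run_date:%Y%m%d}.pdf"
    return [pdf] + [base / name for name in JSON_APPENDICES]


def missing_outputs(account_id, run_date, output_dir="out"):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(account_id, run_date, output_dir) if not p.is_file()]


if __name__ == "__main__":
    # Hypothetical account ID; replace with the audited account.
    for path in missing_outputs("123456789012", date.today()):
        print(f"missing: {path}")
```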

Closing the gaps

DiscoveryFabric tells you what's missing. To close the gaps, you have two paths:

  1. DIY — open the audit PDF, click through to the AWS Console, author each missing CloudWatch alarm by hand. Free, but tedious; a mid-market fleet usually has 30–200 missing alarms.
  2. AlarmFabric: the OpsFabric remediation product. Reads this audit's JSON output, generates the alarms in your account using the audit role extended with one write permission (cloudwatch:PutMetricAlarm), and tags each alarm with its provenance for easy rollback. Typical turnaround: under one engineering day for a fleet of any size. Closed-source SaaS: opsfabric.ai or email vaishal2611@gmail.com.
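
For the DIY path, a missing alarm can also be defined in code rather than in the Console. The sketch below only builds the keyword arguments for CloudWatch's PutMetricAlarm call and does not contact AWS. The choice of metric, the threshold, the periods, and the alarm name are illustrative assumptions; which checks the OpsFabric baseline requires for SQS is not stated here.

```python
def sqs_age_alarm(queue_name, sns_topic_arn, max_age_seconds=300):
    """Build PutMetricAlarm arguments for an SQS oldest-message-age alarm.

    Actions are enabled and point at an SNS topic, so the alarm would not
    be classed as DEGRADED for missing notification wiring.
    """
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,               # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,      # consecutive breaching periods (assumed)
        "Threshold": float(max_age_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "ActionsEnabled": True,
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical queue and topic.
kwargs = sqs_age_alarm("orders", "arn:aws:sns:us-east-1:123456789012:oncall")
```

With boto3 installed and credentials that allow cloudwatch:PutMetricAlarm, the result could be applied with `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`.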

Once the alarms start firing, OpsFabric handles the incident lifecycle: triage from Slack/CloudWatch/Jira, automated RCA, remediation suggestions, Confluence post-mortems, ticket close-out. Also closed-source SaaS.

The OSS audit is useful on its own. It is not a crippled version of the paid products; they cover a different part of the reliability loop.

Open source vs commercial — feature matrix

Capability DiscoveryFabric (OSS) AlarmFabric (paid) OpsFabric (paid)
Read-only audit
Resource discovery (ECS / Lambda / RDS / Aurora / SQS)
Five-strategy alarm matching
DEGRADED alarm detection
Executive PDF + JSON output
--demo synthetic walkthrough
Create missing alarms in your account
Tagged + reversible alarm provenance
SNS / PagerDuty / Opsgenie wiring
Scheduled / continuous audits
Slack-based incident triage
Automated RCA + remediation suggestions
Jira / Confluence incident lifecycle
Multi-tenant managed SaaS
Pricing Free (MIT) opsfabric.ai opsfabric.ai

See docs/comparison.md for the full version with one-paragraph explanations of each row.

About OpsFabric

We build reliability automation for AWS-heavy mid-market teams. DiscoveryFabric is open-source under the MIT license because the audit should be free — we make money on the remediation and incident-response automation. The OSS funnel and the SaaS funnel feed each other: a team runs the audit, sees their coverage is below baseline, and decides whether to fix the gaps themselves or have us do it.

Contributing

PRs welcome — see CONTRIBUTING.md for dev setup, test conventions, and what's in scope (resource types, matching strategies, output polish) vs out of scope (live alarm creation, runtime incident handling — those belong to the commercial products).

By participating, you agree to our Code of Conduct.

Support

Question type | Where to go
Bug in the OSS audit tool | GitHub issues (use the Bug Report template)
Feature idea for the OSS audit | GitHub issues (use the Feature Request template)
Security issue | Email vaishal2611@gmail.com (SECURITY.md)
Commercial AlarmFabric / OpsFabric questions | Email vaishal2611@gmail.com

License

MIT — © 2026 Vaishal Shah / OpsFabric.
