
Read-only AWS reliability audit. Alarm-coverage assessment + real-usage urgency ranking for ECS, Lambda, RDS, Aurora, and SQS. Produces a self-contained HTML report.


opsfabric-discovery


Open-source AWS reliability audit. Produces a self-contained HTML report assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads — and ranks the gaps by real 30-day usage so you see which fixes matter this week. Runs locally; your data never leaves your laptop.

DiscoveryFabric is the audit layer of the OpsFabric reliability platform. The closed-source companion products — AlarmFabric (alarm remediation) and OpsFabric (incident response orchestration) — turn the audit findings into production-fixing automations. See docs/comparison.md for the full feature matrix.

See what your audit would look like — no AWS needed

Run it yourself in 30 seconds:

pip install opsfabric-discovery
opsfabric-discovery audit --demo
# → out/audit-demo.html  (open in any browser)

--demo runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, urgency ranking from synthetic CloudWatch usage, business-vs-technical tabs). No AWS calls, no credentials needed. Same matching engine, same report — only the input is fake.

What it does

  • Discovers AWS resources via Resource Explorer 2 across one or all enabled regions.
  • Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
  • Detects alarms that exist but won't notify (actions disabled / no SNS target / INSUFFICIENT_DATA) and surfaces them as DEGRADED — they don't count toward coverage.
  • Scores required-check coverage against the OpsFabric reliability baseline.
  • Fetches 30 days of CloudWatch usage per resource (read-only GetMetricData) and ranks the gaps by urgency, so an unalarmed Lambda with a 55% error rate rises above a quiet function that is also missing an alarm.
  • Renders a self-contained HTML report (opens in any browser, prints reasonably, emailable as one attachment) plus JSON appendices.
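To make the DEGRADED rule, the exact-dimensions matching strategy, and the urgency ranking concrete, here is a minimal Python sketch. It is not the package's implementation and covers only one of the five matching strategies. The resource dict shape, the `NAMESPACE` table, the `requests_30d` / `error_rate` fields, and the `urgency` formula (30-day requests times error rate) are hypothetical. The alarm dicts mirror the `MetricAlarms` entries returned by CloudWatch `DescribeAlarms` (`Namespace`, `Dimensions`, `ActionsEnabled`, `AlarmActions`, `StateValue`).

```python
# Hypothetical sketch of three audit ideas: exact-dimension alarm matching,
# DEGRADED detection, and urgency ranking. Not the package's actual code.

# Default CloudWatch namespace per resource type (Aurora cluster metrics
# are published under AWS/RDS as well).
NAMESPACE = {
    "ecs": "AWS/ECS",
    "lambda": "AWS/Lambda",
    "rds": "AWS/RDS",
    "aurora": "AWS/RDS",
    "sqs": "AWS/SQS",
}


def matches_exactly(alarm: dict, resource: dict) -> bool:
    """Exact-dimensions strategy: same namespace and identical dimension set."""
    alarm_dims = {d["Name"]: d["Value"] for d in alarm.get("Dimensions", [])}
    return (
        alarm.get("Namespace") == NAMESPACE[resource["type"]]
        and alarm_dims == resource["dims"]
    )


def is_degraded(alarm: dict) -> bool:
    """An alarm that exists but will not notify anyone."""
    if not alarm.get("ActionsEnabled", False):
        return True  # actions disabled
    # ARN layout is arn:partition:service:...; look for at least one SNS target.
    if not any(arn.split(":")[2:3] == ["sns"] for arn in alarm.get("AlarmActions", [])):
        return True  # no SNS target
    return alarm.get("StateValue") == "INSUFFICIENT_DATA"


def urgency(resource: dict) -> float:
    """Hypothetical score: estimated failing requests over the last 30 days."""
    return resource["requests_30d"] * resource["error_rate"]


def ranked_gaps(resources: list[dict], alarms: list[dict]) -> list[dict]:
    """Resources with no healthy matching alarm, most urgent first."""
    gaps = [
        r for r in resources
        if not any(matches_exactly(a, r) and not is_degraded(a) for a in alarms)
    ]
    return sorted(gaps, key=urgency, reverse=True)


resources = [
    {"name": "quiet-fn", "type": "lambda", "dims": {"FunctionName": "quiet-fn"},
     "requests_30d": 120, "error_rate": 0.01},
    {"name": "checkout-fn", "type": "lambda", "dims": {"FunctionName": "checkout-fn"},
     "requests_30d": 40_000, "error_rate": 0.55},
    {"name": "orders-q", "type": "sqs", "dims": {"QueueName": "orders-q"},
     "requests_30d": 9_000, "error_rate": 0.0},
]
alarms = [
    # Healthy: enabled and notifies an SNS topic.
    {"Namespace": "AWS/SQS", "Dimensions": [{"Name": "QueueName", "Value": "orders-q"}],
     "ActionsEnabled": True, "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:ops"],
     "StateValue": "OK"},
    # Exists, but actions are disabled, so it is DEGRADED and does not count.
    {"Namespace": "AWS/Lambda", "Dimensions": [{"Name": "FunctionName", "Value": "checkout-fn"}],
     "ActionsEnabled": False, "AlarmActions": [], "StateValue": "OK"},
]

print([r["name"] for r in ranked_gaps(resources, alarms)])  # ['checkout-fn', 'quiet-fn']
```

Because DEGRADED alarms are filtered out before a match counts, `checkout-fn` is reported as a gap even though an alarm exists for it, and its error volume puts it ahead of `quiet-fn`.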

Trust statement

  • Read-only. Calls only read-only AWS APIs (Describe*, List*, Get*, and Resource Explorer Search). Never creates, modifies, or deletes any resource.
  • Runs on your laptop. No telemetry, no phone-home. Your data never leaves your machine.
  • Source is auditable. Open the installed Python files — every AWS call is visible in discovery_fabric/aws/.
  • Minimum IAM permissions (all read-only): sts:GetCallerIdentity, ec2:DescribeRegions, resource-explorer-2:ListViews / GetView / Search, tag:GetResources, cloudwatch:DescribeAlarms, cloudwatch:GetMetricData, logs:DescribeLogGroups / DescribeMetricFilters, ecs:ListClusters / ListServices / DescribeServices / DescribeTaskDefinition, lambda:ListFunctions / GetFunction, rds:DescribeDBInstances / DescribeDBClusters, sqs:ListQueues / GetQueueAttributes. (cloudwatch:GetMetricData is the only addition since 0.3.x; it fetches the 30-day usage signals used for urgency ranking.)
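The permission list above can be turned into an identity-based IAM policy for the audit role. The Python sketch below builds one and refuses any action whose name does not start with a read verb (Describe, List, Get, Search). The `"Resource": "*"` scope, the `Sid`, and the verb-prefix check are assumptions for illustration, not a policy shipped with the package.

```python
import json

# Actions copied from the "Minimum IAM permissions" list.
READ_ONLY_ACTIONS = [
    "sts:GetCallerIdentity",
    "ec2:DescribeRegions",
    "resource-explorer-2:ListViews",
    "resource-explorer-2:GetView",
    "resource-explorer-2:Search",
    "tag:GetResources",
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "logs:DescribeLogGroups",
    "logs:DescribeMetricFilters",
    "ecs:ListClusters",
    "ecs:ListServices",
    "ecs:DescribeServices",
    "ecs:DescribeTaskDefinition",
    "lambda:ListFunctions",
    "lambda:GetFunction",
    "rds:DescribeDBInstances",
    "rds:DescribeDBClusters",
    "sqs:ListQueues",
    "sqs:GetQueueAttributes",
]

# Assumed heuristic: read-only AWS actions start with one of these verbs.
READ_VERBS = ("Describe", "List", "Get", "Search")


def is_read_only(action: str) -> bool:
    """True if the action name (after 'service:') starts with a read verb."""
    _, name = action.split(":", 1)
    return name.startswith(READ_VERBS)


def build_policy(actions: list[str]) -> dict:
    """Return an IAM policy document allowing only read-only actions."""
    bad = [a for a in actions if not is_read_only(a)]
    if bad:
        raise ValueError(f"non-read-only actions: {bad}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpsFabricDiscoveryReadOnly",  # hypothetical Sid
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: no per-resource scoping
            }
        ],
    }


print(json.dumps(build_policy(READ_ONLY_ACTIONS), indent=2))
```

The same check rejects `cloudwatch:PutMetricAlarm`, the extra write permission AlarmFabric needs, which makes it a quick way to confirm an audit role has not drifted beyond read-only.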

Install

pip install opsfabric-discovery
opsfabric-discovery --help

Quickstart

Once installed, from any directory:

# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod

# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
  --assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
  --external-id agreed-secret \
  --regions all \
  --account-alias acme-prod

# Show OpsFabric product context (DiscoveryFabric / AlarmFabric / OpsFabric)
opsfabric-discovery --about

# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.html    ← open in any browser
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
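The cross-account `--assume-role-arn` / `--external-id` flow needs a role in the audited account whose trust policy lets the auditing account assume it only when the agreed external ID is presented. Below is a minimal Python sketch that renders such a trust policy using the standard `sts:ExternalId` condition key. The auditing account ID `111122223333` is a placeholder, the function name is hypothetical, and attaching the read-only permissions from the Trust statement to the role is out of scope here.

```python
import json


def auditor_trust_policy(auditor_account_id: str, external_id: str) -> dict:
    """Trust policy for the OpsFabricAuditor role in the audited account.

    Only principals in auditor_account_id may assume the role, and only when
    they pass the matching external ID (the --external-id value).
    """
    if not (auditor_account_id.isdigit() and len(auditor_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{auditor_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder account ID for the machine running the audit.
print(json.dumps(auditor_trust_policy("111122223333", "agreed-secret"), indent=2))
```

The audited account creates the role (named OpsFabricAuditor in the Quickstart example) with this trust policy; that role's ARN is what you pass to `--assume-role-arn`.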

Closing the gaps

DiscoveryFabric tells you what's missing. To close the gaps, you have two paths:

  1. DIY — open the audit report, click through to the AWS Console, author each missing CloudWatch alarm by hand. Free, but tedious; a mid-market fleet usually has 30–200 missing alarms.
  2. AlarmFabric — the OpsFabric remediation product. Reads this audit's JSON output, generates the missing alarms in your account using the audit role extended with cloudwatch:PutMetricAlarm, and tags each alarm with its provenance for easy rollback. Typical turnaround: under one engineering day for a fleet of any size. Closed-source SaaS: opsfabric.ai or email vaishal2611@gmail.com.

Once the alarms start firing, OpsFabric handles the incident lifecycle: triage from Slack/CloudWatch/Jira, automated RCA, remediation suggestions, Confluence post-mortems, ticket close-out. Also closed-source SaaS.

The OSS audit is genuinely useful on its own. The paid products are a different layer — not a crippled version of the audit, just a different part of the reliability loop.

Open source vs commercial — feature matrix

Capability | DiscoveryFabric (OSS) | AlarmFabric (paid) | OpsFabric (paid)
Read-only audit
Resource discovery (ECS / Lambda / RDS / Aurora / SQS)
Five-strategy alarm matching
DEGRADED alarm detection
Executive HTML report + JSON output
--demo synthetic walkthrough
Create missing alarms in your account
Tagged + reversible alarm provenance
SNS / PagerDuty / Opsgenie wiring
Scheduled / continuous audits
Slack-based incident triage
Automated RCA + remediation suggestions
Jira / Confluence incident lifecycle
Multi-tenant managed SaaS
Pricing | Free (MIT) | opsfabric.ai | opsfabric.ai

See docs/comparison.md for the full version with one-paragraph explanations of each row.

About OpsFabric

We build reliability automation for AWS-heavy mid-market teams. DiscoveryFabric is open-source under the MIT license because the audit should be free — we make money on the remediation and incident-response automation. The OSS funnel and the SaaS funnel feed each other: a team runs the audit, sees their coverage is below baseline, and decides whether to fix the gaps themselves or have us do it.

Contributing

PRs welcome — see CONTRIBUTING.md for dev setup, test conventions, and what's in scope (resource types, matching strategies, output polish) vs out of scope (live alarm creation, runtime incident handling — those belong to the commercial products).

By participating, you agree to our Code of Conduct.

Support

Question type | Where to go
Bug in the OSS audit tool | GitHub issues (use the Bug Report template)
Feature idea for the OSS audit | GitHub issues (use the Feature Request template)
Security issue | Email vaishal2611@gmail.com (see SECURITY.md)
Commercial AlarmFabric / OpsFabric questions | Email vaishal2611@gmail.com

License

MIT — © 2026 Vaishal Shah / OpsFabric.

Download files

Source distribution: opsfabric_discovery-0.4.0.tar.gz (196.8 kB)

Built distribution (Python 3 wheel): opsfabric_discovery-0.4.0-py3-none-any.whl (86.2 kB)

File details

Details for the file opsfabric_discovery-0.4.0.tar.gz.

File metadata

  • Download URL: opsfabric_discovery-0.4.0.tar.gz
  • Size: 196.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for opsfabric_discovery-0.4.0.tar.gz
Algorithm Hash digest
SHA256 6074e543b2fe13dee526e4858494b8daf1293a42d28de705b6b290afe62389b2
MD5 9bf58db30b8a1c5a3180ca72b73d5497
BLAKE2b-256 eff63ccb6b1ba7bc7eab20ece5c50ad1c704d1d9a12db9be302aa0157cd41d74


File details

Details for the file opsfabric_discovery-0.4.0-py3-none-any.whl.

File hashes

Hashes for opsfabric_discovery-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 60dc07e0baa1375d16e9b2322cbbe35502559d1a81124fcfb3a6a7d90ddfcc04
MD5 2c7f8ea665cde1ea970950c4c6e0231b
BLAKE2b-256 00e17f4aaff49529bf53203f65e70c0108dc0bd297dff86d82120addf4a903ac
