Read-only AWS reliability audit. Alarm coverage assessment for ECS, Lambda, RDS, Aurora, and SQS.
opsfabric-discovery
Open-source AWS reliability audit. Produces a 3-page executive PDF assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads. Runs locally — your data never leaves your laptop.
DiscoveryFabric is the audit layer of the OpsFabric reliability platform. The closed-source companion products — AlarmFabric (alarm remediation) and OpsFabric (incident response orchestration) — turn the audit findings into production-fixing automations. See
docs/comparison.md for the full feature matrix.
See what your audit would look like — no AWS needed
Download a sample report (PDF, ~68 KB)
Or run it yourself in 30 seconds:
pip install opsfabric-discovery
opsfabric-discovery audit --demo
# → out/audit-demo.pdf
--demo runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, critical-gap cards, coverage breakdown). No AWS calls, no credentials needed. Same matching engine, same PDF — only the input is fake.
What it does
- Discovers AWS resources via Resource Explorer 2 across one or all enabled regions.
- Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
- Detects alarms that exist but won't notify (actions disabled / no SNS target / INSUFFICIENT_DATA) and surfaces them as DEGRADED; they don't count toward coverage.
- Scores required-check coverage against the OpsFabric reliability baseline.
- Renders an executive PDF (3 pages) plus JSON appendices.
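The DEGRADED rule and the exact-dimensions strategy can be illustrated with a small sketch. It assumes alarm records shaped like the `MetricAlarms` entries returned by CloudWatch `DescribeAlarms` (`Namespace`, `Dimensions`, `ActionsEnabled`, `AlarmActions`, `StateValue`). The function names and the COVERED / MISSING labels are hypothetical and are not taken from the tool's source; the other four matching strategies are not shown.

```python
from typing import Any

Alarm = dict[str, Any]


def is_sns_arn(arn: str) -> bool:
    """True for ARNs of the form arn:<partition>:sns:..."""
    parts = arn.split(":")
    return len(parts) > 2 and parts[2] == "sns"


def degraded_reasons(alarm: Alarm) -> list[str]:
    """List the reasons an existing alarm would not notify anyone."""
    reasons: list[str] = []
    if not alarm.get("ActionsEnabled", True):
        reasons.append("actions_disabled")
    if not any(is_sns_arn(a) for a in alarm.get("AlarmActions", [])):
        reasons.append("no_sns_target")
    if alarm.get("StateValue") == "INSUFFICIENT_DATA":
        reasons.append("insufficient_data")
    return reasons


def matches_exact(alarm: Alarm, namespace: str, dimensions: dict[str, str]) -> bool:
    """Exact-dimensions strategy: same namespace and identical dimension set."""
    alarm_dims = {d["Name"]: d["Value"] for d in alarm.get("Dimensions", [])}
    return alarm.get("Namespace") == namespace and alarm_dims == dimensions


def classify(namespace: str, dimensions: dict[str, str], alarms: list[Alarm]) -> str:
    """A resource counts as covered only if a matching alarm can notify."""
    matched = [a for a in alarms if matches_exact(a, namespace, dimensions)]
    if not matched:
        return "MISSING"
    if any(not degraded_reasons(a) for a in matched):
        return "COVERED"
    return "DEGRADED"


# Hypothetical sample data, not real audit output.
alarms = [
    {
        "AlarmName": "orders-dlq-depth",
        "Namespace": "AWS/SQS",
        "Dimensions": [{"Name": "QueueName", "Value": "orders-dlq"}],
        "ActionsEnabled": True,
        "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:oncall"],
        "StateValue": "OK",
    },
    {
        "AlarmName": "billing-errors",
        "Namespace": "AWS/Lambda",
        "Dimensions": [{"Name": "FunctionName", "Value": "billing"}],
        "ActionsEnabled": False,
        "AlarmActions": [],
        "StateValue": "INSUFFICIENT_DATA",
    },
]

print(classify("AWS/SQS", {"QueueName": "orders-dlq"}, alarms))
print(classify("AWS/Lambda", {"FunctionName": "billing"}, alarms))
print(classify("AWS/RDS", {"DBInstanceIdentifier": "orders-db"}, alarms))
```

Keeping the degraded check separate from matching means an alarm that matches a resource but cannot notify is reported as its own category rather than silently counted as coverage.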
Trust statement
- Read-only. Calls only read-only AWS APIs (Describe*, List*, Get*, and Resource Explorer Search). Never creates, modifies, or deletes any resource.
- Runs on your laptop. No telemetry, no phone-home. Your data never leaves your machine.
- Source is auditable. Open the installed Python files; every AWS call is visible in discovery_fabric/aws/.
- Minimum IAM permissions (read-only across the board): sts:GetCallerIdentity, ec2:DescribeRegions, resource-explorer-2:ListViews/GetView/Search, tag:GetResources, cloudwatch:DescribeAlarms, logs:DescribeLogGroups/DescribeMetricFilters, ecs:ListClusters/ListServices/DescribeServices/DescribeTaskDefinition, lambda:ListFunctions/GetFunction, rds:DescribeDBInstances/DescribeDBClusters, sqs:ListQueues/GetQueueAttributes.
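The permission list above can be turned into an IAM policy document. The following is a minimal sketch that expands the `service:Action1/Action2` shorthand into individual actions. The `Sid` value and the `"Resource": "*"` scope are assumptions, not something the project prescribes.

```python
import json

# Shorthand copied from the permission list: "service:Action1/Action2".
SHORTHAND = [
    "sts:GetCallerIdentity",
    "ec2:DescribeRegions",
    "resource-explorer-2:ListViews/GetView/Search",
    "tag:GetResources",
    "cloudwatch:DescribeAlarms",
    "logs:DescribeLogGroups/DescribeMetricFilters",
    "ecs:ListClusters/ListServices/DescribeServices/DescribeTaskDefinition",
    "lambda:ListFunctions/GetFunction",
    "rds:DescribeDBInstances/DescribeDBClusters",
    "sqs:ListQueues/GetQueueAttributes",
]


def expand(entry: str) -> list[str]:
    """Expand 'svc:A/B' into ['svc:A', 'svc:B']."""
    service, names = entry.split(":", 1)
    return [f"{service}:{name}" for name in names.split("/")]


def build_policy(entries: list[str]) -> dict:
    """Build a single-statement, allow-only IAM policy document."""
    actions = [action for entry in entries for action in expand(entry)]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpsFabricDiscoveryReadOnly",  # assumed name
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",  # assumed scope
            }
        ],
    }


policy = build_policy(SHORTHAND)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline or managed policy for the audit role.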
Install
pip install opsfabric-discovery
opsfabric-discovery --help
Quickstart
Once installed, from any directory:
# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod
# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
--assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
--external-id agreed-secret \
--regions all \
--account-alias acme-prod
# Show OpsFabric product context (DiscoveryFabric / AlarmFabric / OpsFabric)
opsfabric-discovery --about
# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.pdf
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
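For the cross-account invocation above, the `OpsFabricAuditor` role in the audited account needs a trust policy that lets the auditing account assume it, gated on the agreed external ID. The following is a sketch of such a policy document. The auditing account ID `111122223333` is a placeholder, and trusting the account root principal is an assumption; a narrower principal may suit your setup better.

```python
import json


def trust_policy(auditor_account_id: str, external_id: str) -> dict:
    """Trust policy allowing sts:AssumeRole only with the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: the whole auditing account is trusted.
                "Principal": {"AWS": f"arn:aws:iam::{auditor_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder account ID; the external ID matches the --external-id example.
doc = trust_policy("111122223333", "agreed-secret")
print(json.dumps(doc, indent=2))
```

The external ID must match the value passed to `--external-id`, otherwise the assume-role call is denied.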
Closing the gaps
DiscoveryFabric tells you what's missing. To close the gaps, you have two paths:
- DIY: open the audit PDF, click through to the AWS Console, author each missing CloudWatch alarm by hand. Free, but tedious; a mid-market fleet usually has 30–200 missing alarms.
- AlarmFabric: the OpsFabric remediation product. Reads this audit's JSON output, generates the alarms in your account via the same read-only role used for the audit (plus cloudwatch:PutMetricAlarm), and tags each alarm with its provenance for easy rollback. Typical turnaround: under one engineering day for a fleet of any size. Closed-source SaaS: opsfabric.ai or email vaishal2611@gmail.com.
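On the DIY path, a missing alarm can also be described as data rather than clicked together in the Console. The sketch below builds request parameters for the CloudWatch `PutMetricAlarm` API for one hypothetical SQS check. The helper name, alarm naming scheme, threshold, period, and missing-data handling are illustrative assumptions; they are not the OpsFabric baseline and not how AlarmFabric works.

```python
def sqs_age_alarm_params(queue_name: str, sns_topic_arn: str,
                         max_age_seconds: int = 300) -> dict:
    """PutMetricAlarm parameters for 'oldest message is too old' on one queue."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",  # assumed naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,               # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,
        "Threshold": float(max_age_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumed choice
        "ActionsEnabled": True,
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder queue name and topic ARN.
params = sqs_age_alarm_params(
    "orders", "arn:aws:sns:us-east-1:111122223333:oncall"
)
print(params["AlarmName"], params["Threshold"])
```

With boto3 installed and a role that allows `cloudwatch:PutMetricAlarm`, these parameters could be passed as `cloudwatch.put_metric_alarm(**params)`. That write permission is not part of the audit's read-only role. Setting `ActionsEnabled` and an SNS target keeps the new alarm from being flagged as DEGRADED on the next audit.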
Once the alarms start firing, OpsFabric handles the incident lifecycle: triage from Slack/CloudWatch/Jira, automated RCA, remediation suggestions, Confluence post-mortems, ticket close-out. Also closed-source SaaS.
The OSS audit is genuinely useful on its own. The paid products are a different layer — not a crippled version of the audit, just a different part of the reliability loop.
Open source vs commercial — feature matrix
| Capability | DiscoveryFabric (OSS) | AlarmFabric (paid) | OpsFabric (paid) |
|---|---|---|---|
| Read-only audit | ✅ | ✅ | ✅ |
| Resource discovery (ECS / Lambda / RDS / Aurora / SQS) | ✅ | ✅ | ✅ |
| Five-strategy alarm matching | ✅ | ✅ | ✅ |
| DEGRADED alarm detection | ✅ | ✅ | ✅ |
| Executive PDF + JSON output | ✅ | ✅ | ✅ |
| --demo synthetic walkthrough | ✅ | ✅ | ✅ |
| Create missing alarms in your account | ❌ | ✅ | ✅ |
| Tagged + reversible alarm provenance | ❌ | ✅ | ✅ |
| SNS / PagerDuty / Opsgenie wiring | ❌ | ✅ | ✅ |
| Scheduled / continuous audits | ❌ | ✅ | ✅ |
| Slack-based incident triage | ❌ | ❌ | ✅ |
| Automated RCA + remediation suggestions | ❌ | ❌ | ✅ |
| Jira / Confluence incident lifecycle | ❌ | ❌ | ✅ |
| Multi-tenant managed SaaS | ❌ | ✅ | ✅ |
| Pricing | Free (MIT) | opsfabric.ai | opsfabric.ai |
See docs/comparison.md for the full version with one-paragraph explanations of each row.
About OpsFabric
We build reliability automation for AWS-heavy mid-market teams. DiscoveryFabric is open-source under the MIT license because the audit should be free — we make money on the remediation and incident-response automation. The OSS funnel and the SaaS funnel feed each other: a team runs the audit, sees their coverage is below baseline, and decides whether to fix the gaps themselves or have us do it.
- Website: opsfabric.ai
- Commercial questions: vaishal2611@gmail.com
- AlarmFabric product: opsfabric.ai
- OpsFabric incident response: opsfabric.ai
Contributing
PRs welcome — see CONTRIBUTING.md for dev setup, test conventions, and what's in scope (resource types, matching strategies, output polish) vs out of scope (live alarm creation, runtime incident handling — those belong to the commercial products).
By participating, you agree to our Code of Conduct.
Support
| Question type | Where to go |
|---|---|
| Bug in the OSS audit tool | GitHub issues — use the Bug Report template |
| Feature idea for the OSS audit | GitHub issues — use the Feature Request template |
| Security issue | Email vaishal2611@gmail.com (SECURITY.md) |
| Commercial AlarmFabric / OpsFabric questions | Email vaishal2611@gmail.com |
License
MIT — © 2026 Vaishal Shah / OpsFabric.