NthLayer - The Missing Layer of Reliability
NthLayer
Reliability as code. Pure compiler.
Define reliability requirements in a manifest. Generate dashboards, alerts, SLOs, and documentation deterministically, every time.
TL;DR
pip install nthlayer
nthlayer init
nthlayer apply service.yaml
⚠️ The Problem
Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.
💡 The Solution
NthLayer is a pure compiler for reliability infrastructure. Write a manifest, get artifacts:
service.yaml → validate → apply
                  │         │
                  │         └── Grafana dashboards, Prometheus alerts,
                  │             recording rules, SLOs, PagerDuty config,
                  │             Backstage entities, service docs
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?
                      Policies pass? Ceiling valid?
NthLayer generates. nthlayer-observe enforces at runtime.
⚡ Core Features
Artifact Generation
Generate dashboards, alerts, and SLOs from a single spec.
$ nthlayer apply service.yaml
Generated:
✓ dashboard.json (Grafana)
✓ alerts.yaml (Prometheus)
✓ recording-rules.yaml (Prometheus)
✓ slos.yaml (OpenSLO)
✓ backstage.json (Backstage entity)
Dependency-Aware SLO Validation
Your SLO ceiling is set by your dependency chain: a service cannot be more available than the product of the dependencies it calls in series. NthLayer calculates it.
$ nthlayer validate-slo payment-api
Target: 99.99% availability
Dependencies:
- postgresql (99.95%)
- redis (99.99%)
- user-service (99.9%)
Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%
Recommendation: Reduce target to 99.8% or improve user-service SLO
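The ceiling in the output above is consistent with multiplying the dependency availabilities, treating each one as a hard, serial dependency. The following is a minimal sketch of that arithmetic, not NthLayer's actual implementation; the function name `serial_availability` is hypothetical.

```python
from math import prod

def serial_availability(dependency_slos_pct):
    """Upper bound (in percent) when every dependency is required in series."""
    return prod(p / 100.0 for p in dependency_slos_pct) * 100.0

# Values taken from the example output above.
deps = {"postgresql": 99.95, "redis": 99.99, "user-service": 99.9}
target = 99.99

ceiling = serial_availability(deps.values())
gap = target - ceiling

print(f"Serial availability: {ceiling:.2f}%")                 # 99.84%
print(f"Feasible: {target <= ceiling} (gap {gap:.2f}%)")     # False (gap 0.15%)
```

Because the factors multiply, the weakest dependency (user-service at 99.9%) dominates the result, which is why the recommendation targets it.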
Metric Recommendations
Enforce OpenTelemetry conventions. Know what's missing before production.
$ nthlayer recommend-metrics payment-api
Required (SLO-critical):
✓ http.server.request.duration FOUND
✗ http.server.active_requests MISSING
Run with --show-code for instrumentation examples.
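Conceptually, the report above is a comparison between the metrics an SLO needs and the metrics a service actually exposes. Here is a rough sketch under that assumption; `check_metrics` and the `discovered` set are hypothetical and do not reflect how NthLayer discovers metrics.

```python
def check_metrics(required, discovered):
    """Return (name, status) pairs, marking each required metric FOUND or MISSING."""
    return [(name, "FOUND" if name in discovered else "MISSING") for name in required]

# Required names follow the OpenTelemetry HTTP semantic conventions.
required = ["http.server.request.duration", "http.server.active_requests"]
# Hypothetical: metric names a service was observed to emit.
discovered = {"http.server.request.duration", "process.cpu.time"}

report = check_metrics(required, discovered)
for name, status in report:
    print(f"{name:<32} {status}")
```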
Monte Carlo SLO Simulation
Model failure scenarios before they happen.
$ nthlayer simulate service.yaml --scenarios 10000
Monte Carlo Simulation (10,000 runs)
SLO: availability ≥ 99.9%
Result: 94.2% of scenarios meet target
P50 availability: 99.95%
P99 availability: 99.82%
Risk: 5.8% chance of SLO breach in 30d window
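To make the idea concrete, here is a small Monte Carlo sketch using only the standard library. It is an illustration under stated assumptions, not NthLayer's simulation model: failures arrive as a Poisson process, outage lengths are exponentially distributed, every dependency is serial, and the MTBF/MTTR figures are invented for the example.

```python
import random

WINDOW_MIN = 30 * 24 * 60  # 30-day window, in minutes

def simulate_window(rng, deps):
    """Simulate one window; return service availability as a fraction."""
    downtime = 0.0
    for mtbf_min, mttr_min in deps:
        t = rng.expovariate(1.0 / mtbf_min)  # time of first failure
        while t < WINDOW_MIN:
            downtime += rng.expovariate(1.0 / mttr_min)  # outage length
            t += rng.expovariate(1.0 / mtbf_min)  # gap to next failure
    # Serial model: any dependency outage counts as service downtime.
    # Overlapping outages are summed, which is slightly pessimistic.
    return max(0.0, 1.0 - downtime / WINDOW_MIN)

def simulate(deps, target, runs=10_000, seed=0):
    """Return (share of runs meeting target, median, 1st-percentile availability)."""
    rng = random.Random(seed)
    results = sorted(simulate_window(rng, deps) for _ in range(runs))
    met = sum(a >= target for a in results) / runs
    return met, results[runs // 2], results[runs // 100]

# Hypothetical (MTBF, MTTR) pairs in minutes, one per dependency.
deps = [(20 * 1440, 15.0), (45 * 1440, 10.0), (10 * 1440, 20.0)]
met, p50, worst_1pct = simulate(deps, target=0.999)
print(f"{met:.1%} of runs meet target; P50 {p50:.3%}; worst 1% {worst_1pct:.3%}")
```

The share of runs that miss the target is the breach risk for the window. The numbers this sketch prints depend entirely on the assumed failure parameters and will not match the sample output above.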
Topology Export
Export dependency graphs for correlation engines.
$ nthlayer topology export service.yaml --format json
$ nthlayer topology export service.yaml --format mermaid
$ nthlayer topology export service.yaml --format dot
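For a sense of what the three formats carry, the sketch below renders the same dependency edges as Mermaid, DOT, and JSON. The edge list comes from the payment-api example; the helper names and the exact output shapes are assumptions for illustration and may differ from what `nthlayer topology export` emits.

```python
import json

def node(name):
    """Mermaid-safe node: sanitized id plus the original name as its label."""
    return f'{name.replace("-", "_")}["{name}"]'

def to_mermaid(edges):
    return "\n".join(["graph TD"] + [f"    {node(s)} --> {node(d)}" for s, d in edges])

def to_dot(edges):
    body = "\n".join(f'    "{s}" -> "{d}";' for s, d in edges)
    return f"digraph topology {{\n{body}\n}}"

def to_json(edges):
    # Hypothetical schema: a flat list of directed edges.
    return json.dumps({"edges": [{"from": s, "to": d} for s, d in edges]}, indent=2)

edges = [("payment-api", "postgresql"), ("payment-api", "redis")]
print(to_mermaid(edges))
print(to_dot(edges))
```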
Policy Validation
Enforce organizational standards at build time.
$ nthlayer validate service.yaml --policies policies.yaml
✓ required_fields: ownership.runbook present
✓ tier_constraint: critical services require deployment gates
✓ dependency_rule: all critical deps have SLOs
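Build-time policy checks of this kind reduce to pure functions over the parsed manifest. A minimal sketch follows, assuming the manifest is already loaded into a dict; the `deployment_gates` key and both helper names are hypothetical, and the real policy schema may look different.

```python
def get_path(doc, dotted):
    """Look up a dotted path such as 'ownership.runbook'; return None if absent."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def validate(service, required_fields):
    """Return a list of policy violations; an empty list means the service passes."""
    errors = [
        f"required_fields: {field} missing"
        for field in required_fields
        if get_path(service, field) is None
    ]
    # Hypothetical tier rule: critical services must declare deployment gates.
    if service.get("tier") == "critical" and not service.get("deployment_gates"):
        errors.append("tier_constraint: critical services require deployment gates")
    return errors

service = {"name": "payment-api", "tier": "critical"}
for problem in validate(service, ["ownership.runbook"]):
    print(problem)
```

Because the checks take only the manifest as input, the same manifest always produces the same verdict, which is what makes them suitable for CI.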
Quick Start
# Install
pip install nthlayer
# Create a service spec
nthlayer init
# Validate and generate
nthlayer apply service.yaml
Minimal service.yaml
name: payment-api
tier: critical
type: api
team: payments
dependencies:
- postgresql
- redis
NthLayer also supports the OpenSRM format (apiVersion: opensrm/v1) for contracts, deployment gates, and more. See full spec reference for all options.
CI/CD Integration
# GitHub Actions
- name: Validate reliability
run: |
nthlayer validate service.yaml
nthlayer validate-slo service.yaml
nthlayer apply service.yaml --output-dir generated/
For runtime enforcement (deployment gates, drift detection, error budget checks), use nthlayer-observe:
- name: Gate deployment
run: |
nthlayer-observe check-deploy payment-api
Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins
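The error budget check behind a deployment gate comes down to simple arithmetic. The sketch below is illustrative only: the function names and the 10% threshold are assumptions, and it does not describe how `nthlayer-observe check-deploy` is implemented.

```python
def error_budget_minutes(slo_pct, window_days=30):
    """Total allowed downtime, in minutes, for the SLO window."""
    return (1 - slo_pct / 100) * window_days * 24 * 60

def budget_remaining(slo_pct, downtime_min, window_days=30):
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget_minutes(slo_pct, window_days)
    return (budget - downtime_min) / budget

def gate(slo_pct, downtime_min, min_remaining=0.1):
    """Allow a deploy only while at least `min_remaining` of the budget is left."""
    return budget_remaining(slo_pct, downtime_min) >= min_remaining

print(f"{error_budget_minutes(99.9):.1f} minutes")  # 43.2 minutes per 30 days
print(gate(99.9, downtime_min=10))                  # True: about 77% of budget left
print(gate(99.9, downtime_min=42))                  # False: under 3% of budget left
```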
🎯 How It's Different
| Traditional Approach | NthLayer |
|---|---|
| Set SLOs in isolation | Validate against dependency chains |
| Manual dashboard creation | Generate from spec |
| Copy-paste alerts | 593+ alert templates, auto-selected |
| Discover missing metrics in incidents | Enforce before deployment |
| "Is this ready?" = opinion | "Is this ready?" = deterministic check |
Documentation
Full Documentation - Comprehensive guides and reference.
| Guide | Description |
|---|---|
| Quick Start | Get running in 5 minutes |
| Dependency Discovery | Automatic dependency mapping |
| CI/CD Integration | Pipeline setup |
| CLI Reference | All commands |
🗺️ Roadmap
Generate (this repo)
- Artifact generation (dashboards, alerts, SLOs, recording rules, Loki alerts)
- Dependency-aware SLO validation
- Metric recommendations (OpenTelemetry conventions)
- Monte Carlo SLO simulation
- Policy validation (build-time)
- Topology export (JSON, Mermaid, DOT)
- OpenSRM manifest format (opensrm/v1)
- Identity resolution & ownership
- Backstage entity generation
- Service documentation generation
- CI/CD GitHub Action
- Agentic inference (nthlayer infer)
- MCP server integration
- Backstage plugin
Observe (nthlayer-observe)
- Deployment gates (check-deploy)
- Drift detection (drift)
- Error budget collection (collect)
- Portfolio view (portfolio)
- Reliability scorecard (scorecard)
- Blast radius analysis (blast-radius)
- Dependency discovery (discover, dependencies)
- Runtime verification (verify)
Agentic Inference (Planned)
nthlayer infer will use a model to analyse a codebase and propose an OpenSRM manifest for it. The model examines the code, identifies services, infers appropriate SLO targets, and generates a draft service.reliability.yaml that NthLayer then validates and generates artifacts from.
This follows Zero Framework Cognition: the model provides judgment (what SLOs does this service need?), and NthLayer provides transport (validate the manifest, generate the monitoring artifacts). Clean boundary between reasoning and deterministic transformation.
OpenSRM Ecosystem
NthLayer is one component in the OpenSRM ecosystem. Each component solves a complete problem independently, and they compose when used together through shared OpenSRM manifests and OTel telemetry conventions.
             ┌─────────────────────────┐
             │    OpenSRM Manifest     │
             │  (the shared contract)  │
             └────────────┬────────────┘
                          │
              reads       │       reads
     ┌─────────────┬──────┴──────┬─────────────┐
     ▼             ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│ MEASURE  │  │>NTHLAYER<│  │CORRELATE │  │ RESPOND  │
│          │  │          │  │          │  │          │
│ quality  │  │ generate │  │correlate │  │ incident │
│+govern   │  │monitoring│  │ signals  │  │ response │
│+cost     │  │          │  │          │  │          │
└────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │             │
     └─────────────┴──────┬──────┴─────────────┘
                          ▼
             ┌──────────────────────────┐
             │      Verdict Store       │
             │ (shared data substrate)  │
             │ create · resolve · link  │
             │ accuracy · gaming-check  │
             └────────────┬─────────────┘
                          │ OTel side-effects
                          ▼
             ┌──────────────────────────┐
             │     OTel Collector /     │
             │   Prometheus / Grafana   │
             └──────────────────────────┘
Learning loop (post-incident):
nthlayer-respond findings → manifest updates
→ NthLayer regenerates → nthlayer-measure
refines → nthlayer-correlate improves → OpenSRM
How NthLayer fits in:
- NthLayer reads OpenSRM manifests and generates the monitoring infrastructure (Prometheus rules, Grafana dashboards, PagerDuty config) that the rest of the ecosystem relies on
- Verdict operations emit OTel side-effects (gen_ai.decision.*, gen_ai.override.*) that flow into Prometheus. NthLayer generates dashboards for these metrics alongside service dashboards; it reads from Prometheus, not the Verdict Store directly.
- NthLayer exports service topology that nthlayer-correlate uses for topology-aware signal correlation
- nthlayer-respond's post-incident findings feed back into NthLayer as rule refinements (alerts that should have fired earlier or didn't fire at all)
Each component works alone. Someone who just needs reliability-as-code adopts NthLayer without needing the rest of the ecosystem.
| Component | What it does | Link |
|---|---|---|
| OpenSRM | Specification for declaring service reliability requirements | OpenSRM |
| NthLayer | Generate monitoring infrastructure from manifests (this repo) | nthlayer |
| nthlayer-observe | Runtime enforcement: deployment gates, drift detection, error budgets | nthlayer-observe |
| nthlayer-learn | Data primitive for recording AI judgments and measuring correctness | nthlayer-learn |
| nthlayer-measure | Quality measurement and governance for AI agents | nthlayer-measure |
| nthlayer-correlate | Situational awareness through signal correlation | nthlayer-correlate |
| nthlayer-respond | Multi-agent incident response | nthlayer-respond |
🤝 Contributing
# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup # Install deps, start services
make test # Run tests
See CONTRIBUTING.md for details.
License
MIT - See LICENSE.txt
Acknowledgments
Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.