
NthLayer - The Missing Layer of Reliability


NthLayer

Reliability as code. Pure compiler.

Define reliability requirements in a manifest. Generate dashboards, alerts, SLOs, and documentation: deterministically, every time.

Status: Alpha · License: MIT

TL;DR

pip install nthlayer
nthlayer init
nthlayer apply service.yaml

⚠️ The Problem

Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.

💡 The Solution

NthLayer is a pure compiler for reliability infrastructure. Write a manifest, get artifacts:

service.yaml → validate → apply
                  │          │
                  │          └── Grafana dashboards, Prometheus alerts,
                  │              recording rules, SLOs, PagerDuty config,
                  │              Backstage entities, service docs
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?
                      Policies pass? Ceiling valid?

NthLayer generates. nthlayer-observe enforces at runtime.


⚡ Core Features

Artifact Generation

Generate dashboards, alerts, and SLOs from a single spec.

$ nthlayer apply service.yaml

Generated:
  → dashboard.json (Grafana)
  → alerts.yaml (Prometheus)
  → recording-rules.yaml (Prometheus)
  → slos.yaml (OpenSLO)
  → backstage.json (Backstage entity)

Dependency-Aware SLO Validation

Your SLO ceiling is set by your dependency chain: a service cannot be more available than the combined availability of the dependencies it requires. NthLayer calculates it.

$ nthlayer validate-slo payment-api

Target: 99.99% availability
Dependencies:
  → postgresql (99.95%)
  → redis (99.99%)
  → user-service (99.9%)

Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%

Recommendation: Reduce target to 99.8% or improve user-service SLO
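
The ceiling in the output above is consistent with the standard serial-availability model, where a service that needs every dependency can be at most as available as the product of their availabilities. The sketch below reproduces that arithmetic. The function names are illustrative and not part of NthLayer's API, and NthLayer's own calculation may take more into account than this.

```python
from math import prod

def serial_availability(dependency_slos):
    """Upper bound on availability when every dependency is required (serial chain)."""
    return prod(dependency_slos)

def check_target(target, dependency_slos):
    """Return (ceiling, feasible) for an availability target given as a fraction."""
    ceiling = serial_availability(dependency_slos)
    return ceiling, target <= ceiling

# Dependency SLOs from the example output: postgresql, redis, user-service
deps = [0.9995, 0.9999, 0.999]
ceiling, feasible = check_target(0.9999, deps)
print(f"Serial availability: {ceiling:.2%}")  # Serial availability: 99.84%
print("feasible" if feasible else "infeasible")
```

Lowering the target to 0.998 makes the same check pass, which matches the recommendation in the output.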

Metric Recommendations

Enforce OpenTelemetry conventions. Know what's missing before production.

$ nthlayer recommend-metrics payment-api

Required (SLO-critical):
  ✓ http.server.request.duration    FOUND
  ✗ http.server.active_requests     MISSING

Run with --show-code for instrumentation examples.
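
Conceptually, the check above compares the metrics a service is expected to emit against the ones actually observed. The sketch below assumes a hypothetical hard-coded requirement table keyed by service type. NthLayer's real rule set and its metric discovery mechanism are not described here, so only the two metric names are taken from the sample output.

```python
# Hypothetical requirement table keyed by service type; the real rule set
# used by NthLayer is not documented in this README.
REQUIRED_METRICS = {
    "api": ["http.server.request.duration", "http.server.active_requests"],
}

def missing_metrics(service_type, observed):
    """Return required metric names that were not observed, in table order."""
    observed = set(observed)
    return [m for m in REQUIRED_METRICS.get(service_type, []) if m not in observed]

found = ["http.server.request.duration"]
for name in REQUIRED_METRICS["api"]:
    status = "FOUND" if name in found else "MISSING"
    print(f"{name:<32} {status}")
print(missing_metrics("api", found))  # ['http.server.active_requests']
```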

Monte Carlo SLO Simulation

Model failure scenarios before they happen.

$ nthlayer simulate service.yaml --scenarios 10000

Monte Carlo Simulation (10,000 runs)
  SLO: availability ≥ 99.9%
  Result: 94.2% of scenarios meet target
  P50 availability: 99.95%
  P99 availability: 99.82%
  Risk: 5.8% chance of SLO breach in 30d window
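
To illustrate the general idea of a Monte Carlo SLO estimate, the sketch below draws random downtime for each dependency over a 30-day window and counts how often the target holds. The failure model (exponentially distributed downtime with a mean equal to each dependency's error budget) is an assumption made for this example only. NthLayer's actual simulation model is not described in this README, so the numbers will not match the output above.

```python
import random

def simulate(dependency_slos, target, runs=10_000, window_minutes=30 * 24 * 60, seed=0):
    """Estimate how often a service meets `target` over one window.

    Assumed failure model (illustrative only): each dependency's downtime in a
    window is exponentially distributed with mean equal to its error budget,
    and any dependency being down takes the service down.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    outcomes = []
    for _ in range(runs):
        downtime = 0.0
        for slo in dependency_slos:
            budget = (1.0 - slo) * window_minutes  # expected downtime in minutes
            if budget > 0:
                downtime += rng.expovariate(1.0 / budget)
        outcomes.append(max(0.0, 1.0 - downtime / window_minutes))
    met = sum(1 for a in outcomes if a >= target)
    return met / runs, sorted(outcomes)

share_met, outcomes = simulate([0.9995, 0.9999, 0.999], target=0.999)
print(f"{share_met:.1%} of scenarios meet target")
print(f"median availability: {outcomes[len(outcomes) // 2]:.3%}")
```

The share of runs that miss the target is the breach risk for the window. A more realistic model would also need incident frequency, duration distributions, and correlation between dependencies.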

Topology Export

Export dependency graphs for correlation engines.

$ nthlayer topology export service.yaml --format json
$ nthlayer topology export service.yaml --format mermaid
$ nthlayer topology export service.yaml --format dot
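
For orientation, this is roughly what a one-level dependency graph looks like in two of those formats. The sketch renders the graph by hand and is not NthLayer's exporter. The exact structure NthLayer emits is not shown in this README, so the node naming and layout here are assumptions.

```python
def _node_id(name):
    """Mermaid-safe identifier: replace hyphens, keep the real name as the label."""
    return name.replace("-", "_")

def to_mermaid(service, dependencies):
    """Render a one-level dependency graph as a Mermaid flowchart."""
    lines = ["graph TD"]
    for dep in dependencies:
        lines.append(f'    {_node_id(service)}["{service}"] --> {_node_id(dep)}["{dep}"]')
    return "\n".join(lines)

def to_dot(service, dependencies):
    """Render the same graph as Graphviz DOT; names are quoted because of hyphens."""
    edges = [f'    "{service}" -> "{dep}";' for dep in dependencies]
    return "\n".join(["digraph topology {", *edges, "}"])

print(to_mermaid("payment-api", ["postgresql", "redis"]))
print(to_dot("payment-api", ["postgresql", "redis"]))
```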

Policy Validation

Enforce organizational standards at build time.

$ nthlayer validate service.yaml --policies policies.yaml

✓ required_fields: ownership.runbook present
✗ tier_constraint: critical services require deployment gates
✓ dependency_rule: all critical deps have SLOs
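
The three rules in that output can be pictured as simple predicates over the parsed manifest. The sketch below is only an illustration of build-time policy checks. The manifest keys `deployment_gates`, `critical`, and `slo` are hypothetical, and the real `policies.yaml` schema is not shown in this README.

```python
def check_policies(manifest):
    """Evaluate three illustrative policies; returns a list of (rule, passed, message)."""
    results = []
    # required_fields: ownership.runbook must be set
    runbook = manifest.get("ownership", {}).get("runbook")
    results.append(("required_fields", bool(runbook), "ownership.runbook present"))
    # tier_constraint: critical services need deployment gates (hypothetical key)
    gated = manifest.get("tier") != "critical" or bool(manifest.get("deployment_gates"))
    results.append(("tier_constraint", gated, "critical services require deployment gates"))
    # dependency_rule: every dependency flagged critical must declare an SLO
    deps = manifest.get("dependencies", [])
    all_slos = all(d.get("slo") is not None for d in deps if d.get("critical"))
    results.append(("dependency_rule", all_slos, "all critical deps have SLOs"))
    return results

manifest = {
    "name": "payment-api",
    "tier": "critical",
    "ownership": {"runbook": "https://example.com/runbooks/payment-api"},
    "dependencies": [{"name": "postgresql", "critical": True, "slo": 0.9995}],
}
for rule, passed, message in check_policies(manifest):
    print(f"{'✓' if passed else '✗'} {rule}: {message}")
```

Because each rule is a pure function of the manifest, the same input always produces the same verdict, which is what makes the check suitable for a CI step.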

🚀 Quick Start

# Install
pip install nthlayer

# Create a service spec
nthlayer init

# Validate and generate
nthlayer apply service.yaml

Minimal service.yaml

name: payment-api
tier: critical
type: api
team: payments

dependencies:
  - postgresql
  - redis

NthLayer also supports the OpenSRM format (apiVersion: opensrm/v1) for contracts, deployment gates, and more. See full spec reference for all options.


🔄 CI/CD Integration

# GitHub Actions
- name: Validate reliability
  run: |
    nthlayer validate service.yaml
    nthlayer validate-slo service.yaml
    nthlayer apply service.yaml --output-dir generated/

For runtime enforcement (deployment gates, drift detection, error budget checks), use nthlayer-observe:

- name: Gate deployment
  run: |
    nthlayer-observe check-deploy payment-api

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins


🎯 How It's Different

| Traditional Approach | NthLayer |
| --- | --- |
| Set SLOs in isolation | Validate against dependency chains |
| Manual dashboard creation | Generate from spec |
| Copy-paste alerts | 593+ alert templates, auto-selected |
| Discover missing metrics in incidents | Enforce before deployment |
| "Is this ready?" = opinion | "Is this ready?" = deterministic check |

📚 Documentation

Full Documentation - Comprehensive guides and reference.


| Guide | Description |
| --- | --- |
| Quick Start | Get running in 5 minutes |
| Dependency Discovery | Automatic dependency mapping |
| CI/CD Integration | Pipeline setup |
| CLI Reference | All commands |

🗺️ Roadmap

Generate (this repo)

  • Artifact generation (dashboards, alerts, SLOs, recording rules, Loki alerts)
  • Dependency-aware SLO validation
  • Metric recommendations (OpenTelemetry conventions)
  • Monte Carlo SLO simulation
  • Policy validation (build-time)
  • Topology export (JSON, Mermaid, DOT)
  • OpenSRM manifest format (opensrm/v1)
  • Identity resolution & ownership
  • Backstage entity generation
  • Service documentation generation
  • CI/CD GitHub Action
  • Agentic inference (nthlayer infer)
  • MCP server integration
  • Backstage plugin

Observe (nthlayer-observe)

  • Deployment gates (check-deploy)
  • Drift detection (drift)
  • Error budget collection (collect)
  • Portfolio view (portfolio)
  • Reliability scorecard (scorecard)
  • Blast radius analysis (blast-radius)
  • Dependency discovery (discover, dependencies)
  • Runtime verification (verify)

Agentic Inference (Planned)

nthlayer infer will use a model to analyse a codebase and propose an OpenSRM manifest for it. The model examines the code, identifies services, infers appropriate SLO targets, and generates a draft service.reliability.yaml that NthLayer then validates and generates artifacts from.

This follows Zero Framework Cognition: the model provides judgment (what SLOs does this service need?), and NthLayer provides transport (validate the manifest, generate the monitoring artifacts). Clean boundary between reasoning and deterministic transformation.


OpenSRM Ecosystem

NthLayer is one component in the OpenSRM ecosystem. Each component solves a complete problem independently, and they compose when used together through shared OpenSRM manifests and OTel telemetry conventions.

                    ┌──────────────────────────┐
                    │     OpenSRM Manifest     │
                    │  (the shared contract)   │
                    └────────────┬─────────────┘
                                 │
                reads            │           reads
              ┌────────────┬─────┴──────┬────────────┐
              ▼            ▼            ▼            ▼
         ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
         │ MEASURE  │ │>NTHLAYER<│ │CORRELATE │ │ RESPOND  │
         │          │ │          │ │          │ │          │
         │ quality  │ │ generate │ │correlate │ │ incident │
         │+govern   │ │monitoring│ │ signals  │ │ response │
         │+cost     │ │          │ │          │ │          │
         └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
              │            │            │            │
              └────────────┴─────┬──────┴────────────┘
                                 ▼
                    ┌──────────────────────────┐
                    │      Verdict Store       │
                    │  (shared data substrate) │
                    │ create · resolve · link  │
                    │ accuracy · gaming-check  │
                    └────────────┬─────────────┘
                                 │ OTel side-effects
                                 ▼
                    ┌──────────────────────────┐
                    │    OTel Collector /      │
                    │   Prometheus / Grafana   │
                    └──────────────────────────┘

              Learning loop (post-incident):
              nthlayer-respond findings → manifest updates
              → NthLayer regenerates → nthlayer-measure
              refines → nthlayer-correlate improves → OpenSRM

How NthLayer fits in:

  • NthLayer reads OpenSRM manifests and generates the monitoring infrastructure (Prometheus rules, Grafana dashboards, PagerDuty config) that the rest of the ecosystem relies on
  • Verdict operations emit OTel side-effects (gen_ai.decision.*, gen_ai.override.*) that flow into Prometheus. NthLayer generates dashboards for these metrics alongside service dashboards; NthLayer reads from Prometheus, not the Verdict Store directly.
  • NthLayer exports service topology that nthlayer-correlate uses for topology-aware signal correlation
  • nthlayer-respond's post-incident findings feed back into NthLayer as rule refinements (alerts that should have fired earlier or didn't fire at all)

Each component works alone. Someone who just needs reliability-as-code adopts NthLayer without needing the rest of the ecosystem.

| Component | What it does | Link |
| --- | --- | --- |
| OpenSRM | Specification for declaring service reliability requirements | OpenSRM |
| NthLayer | Generate monitoring infrastructure from manifests (this repo) | nthlayer |
| nthlayer-observe | Runtime enforcement: deployment gates, drift detection, error budgets | nthlayer-observe |
| nthlayer-learn | Data primitive for recording AI judgments and measuring correctness | nthlayer-learn |
| nthlayer-measure | Quality measurement and governance for AI agents | nthlayer-measure |
| nthlayer-correlate | Situational awareness through signal correlation | nthlayer-correlate |
| nthlayer-respond | Multi-agent incident response | nthlayer-respond |

🤝 Contributing

# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests

See CONTRIBUTING.md for details.


📄 License

MIT - See LICENSE.txt


🙏 Acknowledgments

Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.
