NthLayer - The Missing Layer of Reliability

NthLayer

Reliability as code. Pure compiler.

Define reliability requirements in a manifest. Generate dashboards, alerts, SLOs, and documentation deterministically, every time.

Status: Alpha PyPI License: MIT Alert Rules

TL;DR

pip install nthlayer
nthlayer init
nthlayer apply service.yaml

⚠️ The Problem

Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.

💡 The Solution

NthLayer is a pure compiler for reliability infrastructure. Write a manifest, get artifacts:

service.yaml → validate → apply
                  │          │
                  │          └── Grafana dashboards, Prometheus alerts,
                  │              recording rules, SLOs, PagerDuty config,
                  │              Backstage entities, service docs
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?
                      Policies pass? Ceiling valid?

NthLayer generates. The nthlayer-workers runtime (Tier 2) enforces, observes, and responds at runtime, with state held in nthlayer-core (Tier 1) and operator interaction via nthlayer-bench (Tier 3).


⚡ Core Features

Artifact Generation

Generate dashboards, alerts, and SLOs from a single spec.

$ nthlayer apply service.yaml

Generated:
  → dashboard.json (Grafana)
  → alerts.yaml (Prometheus)
  → recording-rules.yaml (Prometheus)
  → slos.yaml (OpenSLO)
  → backstage.json (Backstage entity)

Dependency-Aware SLO Validation

Your SLO ceiling is set by your dependency chain: a service cannot be more available than the combined availability of the services it depends on. NthLayer calculates it.

$ nthlayer validate-slo payment-api

Target: 99.99% availability
Dependencies:
  → postgresql (99.95%)
  → redis (99.99%)
  → user-service (99.9%)

Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%

Recommendation: Reduce target to 99.8% or improve user-service SLO
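
The ceiling in the output above follows from multiplying the availabilities of the dependencies. Here is a minimal Python sketch of that arithmetic, assuming every dependency is a hard, serial dependency with independent failures. The function name serial_availability is illustrative and not part of the NthLayer API.

```python
from math import prod


def serial_availability(dependency_slos: list[float]) -> float:
    """Upper bound on availability when every dependency must be up.

    Assumes hard, serial dependencies whose failures are independent.
    """
    return prod(dependency_slos)


# Values taken from the example output above.
deps = {"postgresql": 0.9995, "redis": 0.9999, "user-service": 0.999}
target = 0.9999

ceiling = serial_availability(list(deps.values()))
gap = target - ceiling

print(f"Serial availability: {ceiling:.2%}")   # Serial availability: 99.84%
print(f"Target exceeds ceiling by {gap:.2%}")  # Target exceeds ceiling by 0.15%
```

Under these assumptions, raising the weakest dependency (user-service at 99.9%) moves the ceiling more than raising any other single dependency.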

Metric Recommendations

Enforce OpenTelemetry conventions. Know what's missing before production.

$ nthlayer recommend-metrics payment-api

Required (SLO-critical):
  ✓ http.server.request.duration    FOUND
  ✗ http.server.active_requests     MISSING

Run with --show-code for instrumentation examples.
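
Conceptually, the report above is a set comparison between the metrics an SLO needs and the metrics a service actually exposes. The sketch below illustrates that idea only. The check_metrics helper and the contents of the found set are hypothetical; how NthLayer discovers existing metrics is not shown here.

```python
def check_metrics(required: set[str], found: set[str]) -> dict[str, bool]:
    """Map each required metric name to whether it was found."""
    return {name: name in found for name in sorted(required)}


# Required names come from the example output above.
required = {"http.server.request.duration", "http.server.active_requests"}
# Hypothetical discovery result, for illustration only.
found = {"http.server.request.duration", "process.cpu.time"}

report = check_metrics(required, found)
for name, present in report.items():
    mark = "✓" if present else "✗"
    status = "FOUND" if present else "MISSING"
    print(f"  {mark} {name:<32}{status}")

missing = [name for name, present in report.items() if not present]
```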

Monte Carlo SLO Simulation

Model failure scenarios before they happen.

$ nthlayer simulate service.yaml --scenarios 10000

Monte Carlo Simulation (10,000 runs)
  SLO: availability ≥ 99.9%
  Result: 94.2% of scenarios meet target
  P50 availability: 99.95%
  P99 availability: 99.82%
  Risk: 5.8% chance of SLO breach in 30d window
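
To make the idea concrete, here is a small, self-contained Monte Carlo sketch. It is not NthLayer's simulation model: it assumes incidents arrive as a Poisson process and that outage durations are exponentially distributed. The parameter names (incidents_per_window, mean_outage_minutes) and their values are invented for illustration.

```python
import random

WINDOW_MINUTES = 30 * 24 * 60  # 30-day window


def simulate_window(rng: random.Random, incidents_per_window: float,
                    mean_outage_minutes: float) -> float:
    """Return the availability achieved in one simulated 30-day window."""
    arrival_rate = incidents_per_window / WINDOW_MINUTES  # incidents per minute
    downtime = 0.0
    # Poisson arrivals: exponential gaps between consecutive incidents.
    t = rng.expovariate(arrival_rate)
    while t < WINDOW_MINUTES:
        # Each incident lasts an exponentially distributed number of minutes.
        downtime += rng.expovariate(1.0 / mean_outage_minutes)
        t += rng.expovariate(arrival_rate)
    return max(0.0, 1.0 - downtime / WINDOW_MINUTES)


def breach_risk(target: float, runs: int, incidents_per_window: float,
                mean_outage_minutes: float, seed: int = 0) -> float:
    """Fraction of simulated windows that miss the availability target."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    met = sum(
        simulate_window(rng, incidents_per_window, mean_outage_minutes) >= target
        for _ in range(runs)
    )
    return 1.0 - met / runs


risk = breach_risk(target=0.999, runs=10_000,
                   incidents_per_window=2.0, mean_outage_minutes=15.0)
print(f"Risk: {risk:.1%} chance of SLO breach in 30d window")
```

Seeding the generator keeps the sketch reproducible, in line with the deterministic behaviour the rest of the tool aims for.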

Topology Export

Export dependency graphs for correlation engines.

$ nthlayer topology export service.yaml --format json
$ nthlayer topology export service.yaml --format mermaid
$ nthlayer topology export service.yaml --format dot
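
For readers unfamiliar with the Mermaid and DOT formats, the sketch below renders a one-level dependency list as text in each. It only illustrates the formats; the exact output of nthlayer topology export may differ, and the helper names (node_id, to_mermaid, to_dot) are hypothetical.

```python
def node_id(name: str) -> str:
    """Make a conservative node identifier (no hyphens) for Mermaid."""
    return name.replace("-", "_")


def to_mermaid(service: str, dependencies: list[str]) -> str:
    """Render service -> dependency edges as a Mermaid flowchart."""
    lines = ["graph TD"]
    for dep in dependencies:
        lines.append(
            f'    {node_id(service)}["{service}"] --> {node_id(dep)}["{dep}"]'
        )
    return "\n".join(lines)


def to_dot(service: str, dependencies: list[str]) -> str:
    """Render the same edges as a Graphviz DOT digraph."""
    edges = [f'    "{service}" -> "{dep}";' for dep in dependencies]
    return "\n".join(["digraph topology {", *edges, "}"])


# Service and dependencies from the minimal service.yaml in this README.
print(to_mermaid("payment-api", ["postgresql", "redis"]))
print(to_dot("payment-api", ["postgresql", "redis"]))
```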

Policy Validation

Enforce organizational standards at build time.

$ nthlayer validate service.yaml --policies policies.yaml

✓ required_fields: ownership.runbook present
✗ tier_constraint: critical services require deployment gates
✓ dependency_rule: all critical deps have SLOs
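
A build-time policy check of this kind can be thought of as a set of named predicates evaluated against the parsed manifest. The sketch below shows the idea with two rules modelled on the output above. The manifest is a plain dict, the deployment_gates key is an assumed field name, and none of this reflects the actual policies.yaml schema.

```python
from typing import Any, Callable

Manifest = dict[str, Any]


def get_path(manifest: Manifest, dotted: str) -> Any:
    """Look up a dotted path such as 'ownership.runbook'; None if absent."""
    node: Any = manifest
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical rules; 'deployment_gates' is an assumed field name.
RULES: dict[str, Callable[[Manifest], bool]] = {
    "required_fields: ownership.runbook present":
        lambda m: get_path(m, "ownership.runbook") is not None,
    "tier_constraint: critical services require deployment gates":
        lambda m: m.get("tier") != "critical" or bool(m.get("deployment_gates")),
}


def evaluate(manifest: Manifest) -> list[tuple[str, bool]]:
    """Run every rule and return (rule name, passed) pairs in order."""
    return [(name, rule(manifest)) for name, rule in RULES.items()]


manifest = {
    "name": "payment-api",
    "tier": "critical",
    "ownership": {"runbook": "https://example.com/runbook"},
}
results = evaluate(manifest)
for name, passed in results:
    print(f"{'✓' if passed else '✗'} {name}")

# A CI job would typically exit non-zero when any rule fails.
exit_code = 0 if all(passed for _, passed in results) else 1
```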

🚀 Quick Start

# Install
pip install nthlayer

# Create a service spec
nthlayer init

# Validate and generate
nthlayer apply service.yaml

Minimal service.yaml

name: payment-api
tier: critical
type: api
team: payments

dependencies:
  - postgresql
  - redis

NthLayer also supports the OpenSRM format (apiVersion: opensrm/v1) for contracts, deployment gates, and more. See the full spec reference for all options.


🔄 CI/CD Integration

# GitHub Actions
- name: Validate reliability
  run: |
    nthlayer validate service.yaml
    nthlayer validate-slo service.yaml
    nthlayer apply service.yaml --output-dir generated/

For runtime enforcement (deployment gates, drift detection, error budget checks), use nthlayer-workers, the runtime tier:

- name: Gate deployment
  run: |
    nthlayer-workers gate --service payment-api

The runtime tier reads SLOs and dependency declarations from the same OpenSRM manifests this generator consumes. Verdicts and assessments flow through nthlayer-core's HTTP API.
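
As background for the error budget checks mentioned above, the budget for an availability SLO is the allowed downtime in the window: (1 - target) × window length. The Python sketch below works through that arithmetic and a simple gate decision. It illustrates the concept only and is not the nthlayer-workers gate logic; the function names are hypothetical.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability target over a window."""
    return (1.0 - target) * window_days * 24 * 60


def gate_allows_deploy(target: float, downtime_minutes: float,
                       window_days: int = 30) -> bool:
    """Allow a deploy only while some error budget remains."""
    return downtime_minutes < error_budget_minutes(target, window_days)


budget = error_budget_minutes(0.999)
print(f"Error budget: {budget:.1f} minutes per 30 days")  # 43.2 minutes

print(gate_allows_deploy(0.999, downtime_minutes=10.0))  # True
print(gate_allows_deploy(0.999, downtime_minutes=50.0))  # False
```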

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins


🎯 How It's Different

Traditional Approach                  | NthLayer
Set SLOs in isolation                 | Validate against dependency chains
Manual dashboard creation             | Generate from spec
Copy-paste alerts                     | 593+ alert templates, auto-selected
Discover missing metrics in incidents | Enforce before deployment
"Is this ready?" = opinion            | "Is this ready?" = deterministic check

📚 Documentation

Full Documentation - Comprehensive guides and reference.

Guide                | Description
Quick Start          | Get running in 5 minutes
Dependency Discovery | Automatic dependency mapping
CI/CD Integration    | Pipeline setup
CLI Reference        | All commands

🗺️ Roadmap

Generate (this repo)

  • Artifact generation (dashboards, alerts, SLOs, recording rules, Loki alerts)
  • Dependency-aware SLO validation
  • Metric recommendations (OpenTelemetry conventions)
  • Monte Carlo SLO simulation
  • Policy validation (build-time)
  • Topology export (JSON, Mermaid, DOT)
  • OpenSRM manifest format (opensrm/v1)
  • Identity resolution & ownership
  • Backstage entity generation
  • Service documentation generation
  • CI/CD GitHub Action
  • Agentic inference (nthlayer infer)
  • MCP server integration
  • Backstage plugin

Runtime tier (nthlayer-workers)

What was previously the standalone nthlayer-observe repo plus four agentic components is now consolidated into a single Tier-2 worker process with five modules:

  • observe: SLO collection, drift detection, dependency/topology discovery, deploy gate
  • measure: judgment SLO evaluation, governance ratchet, autonomy-level reduction
  • correlate: session-window event correlation, topology drift, contract divergence
  • respond: incident response coordinator (situation-shaped triggers, capture-at-write-time escalation)
  • learn: outcome resolution, calibration signals, retrospective generation

Backed by nthlayer-core (Tier 1: HTTP API, verdict store, case management, manifest catalogue) and operated via nthlayer-bench (Tier 3: Textual TUI for SREs).


Agentic Inference (Planned)

nthlayer infer will use a model to analyse a codebase and propose an OpenSRM manifest for it. The model examines the code, identifies services, infers appropriate SLO targets, and generates a draft service.reliability.yaml that NthLayer then validates and generates artifacts from.

This follows the Zero Framework Cognition boundary applied across the OpenSRM ecosystem: the model provides judgment (what SLOs does this service need?), and NthLayer provides transport (validate the manifest, generate the monitoring artifacts). Clean boundary between reasoning and deterministic transformation. Architectural context: opensrm/docs/superpowers/.


OpenSRM Ecosystem

NthLayer is one piece of a seven-repo ecosystem. The architecture has three runtime tiers; this repo (nthlayer-generate) sits outside the runtime tiers as a build-time compiler, feeding manifests forward.

                  ┌──────────────────────────┐
                  │      OpenSRM Manifest    │
                  │  (the shared contract)   │
                  └────────────┬─────────────┘
                               │
              ┌────────────────┴────────────────┐
              ▼                                 ▼
    ┌──────────────────┐               ┌─────────────────┐
    │ nthlayer-generate│               │ nthlayer-core   │
    │  (build-time)    │               │  (Tier 1)       │
    │                  │               │ HTTP API ·      │
    │ specs → Grafana, │               │ verdict store · │
    │ Prometheus, SLOs,│               │ case mgmt ·     │
    │ Backstage, docs  │               │ manifests       │
    └────────┬─────────┘               └────────▲────────┘
             │                                  │ HTTP only
             │ deployed                ┌────────┴──────────────┐
             ▼                         │                       │
    ┌──────────────────┐      ┌────────┴────────┐    ┌─────────┴────────┐
    │  Live infra      │      │ nthlayer-workers│    │ nthlayer-bench   │
    │  (Prometheus,    │ obs  │   (Tier 2)      │    │   (Tier 3)       │
    │   Grafana, etc.) │ ─────│                 │    │ Textual TUI for  │
    └──────────────────┘      │ observe·measure │    │ SREs: situation  │
                              │correlate·respond│    │ board, case      │
                              │ ·learn          │    │ bench, approvals │
                              └─────────────────┘    └──────────────────┘

    Learning loop:
    workers.learn retrospectives → manifest updates → nthlayer-generate
    regenerates → workers refine thresholds → operators ratify in bench

How nthlayer-generate fits in:

  • Reads OpenSRM manifests and emits the monitoring infrastructure (Prometheus rules, Grafana dashboards, recording rules, Backstage entities, service docs) that the runtime tier and live observability stack rely on
  • Pure compiler: deterministic, stateless, no LLM, no runtime side effects
  • Verdicts and assessments produced by nthlayer-workers modules emit OTel side-effects (gen_ai.decision.*, gen_ai.override.*) that flow into Prometheus; this generator can be configured to produce dashboards for those metrics alongside service dashboards
  • Exports service topology that workers.correlate uses for topology-aware signal correlation
  • Post-incident retrospectives produced by workers.learn feed back into manifest updates that regenerate via this compiler, closing the loop

Each component works alone. A team that only needs reliability-as-code can adopt nthlayer-generate without the rest of the ecosystem.

Repo              | Role
opensrm           | The OpenSRM specification: the manifest format and language for declaring reliability
nthlayer          | Project front door: documentation hub, GitHub Action delegating to this repo, docs site
nthlayer-common   | Shared library: verdict model, manifest parser, LLM wrapper, error hierarchy, CoreAPIClient
nthlayer-generate | The deterministic compiler (this repo): specs to artefacts
nthlayer-core     | Tier 1: HTTP API server, verdict store, case management, manifest catalogue (pip install nthlayer)
nthlayer-workers  | Tier 2: five worker modules (observe, measure, correlate, respond, learn)
nthlayer-bench    | Tier 3: Textual TUI for SREs

🤝 Contributing

# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/rsionnach/nthlayer-generate.git
cd nthlayer-generate
make setup    # Install deps, start services
make test     # Run tests

See CONTRIBUTING.md for details.


📄 License

MIT - See LICENSE.txt


🙏 Acknowledgments

Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.
