NthLayer - The Missing Layer of Reliability

NthLayer

Shift-left reliability for platform teams.

Define reliability requirements as code. Validate SLOs against dependency chains. Detect drift before incidents. Gate deployments on real data.

Status: Alpha | License: MIT

TL;DR

pip install nthlayer
nthlayer check-deploy demo

⚠️ The Problem

Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.

💡 The Solution

NthLayer moves reliability left:

service.yaml → validate → check-deploy → deploy
                  │            │
                  │            └── Error budget ok? Drift acceptable?
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?

⚡ Core Features

Drift Detection

Predict SLO exhaustion before it happens. Don't wait for the budget to hit zero.

$ nthlayer drift payment-api

payment-api: CRITICAL
  Current: 73.2% budget remaining
  Trend: -2.1%/day (gradual decline)
  Projection: Budget exhausts in 23 days

  Recommendation: Investigate error rate increase before next release
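
One simple way to produce a projection like the one above is to fit a straight line to recent budget readings and extrapolate it to zero. The sketch below illustrates that general technique only; it is not NthLayer's actual model, which is not described here, so its results need not match the tool's output. The names `fit_trend` and `days_until_exhausted` and the sample history are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (day index, percent of error budget remaining); hypothetical data shape
Sample = Tuple[float, float]


def fit_trend(samples: Sequence[Sample]) -> float:
    """Least-squares slope of budget remaining, in percentage points per day."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(samples)
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(pct for _, pct in samples) / n
    num = sum((day - mean_x) * (pct - mean_y) for day, pct in samples)
    den = sum((day - mean_x) ** 2 for day, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def days_until_exhausted(remaining_pct: float, slope: float) -> Optional[float]:
    """Linear extrapolation to zero; None when the budget is not declining."""
    if slope >= 0:
        return None
    return remaining_pct / -slope


# Invented history: the budget drops two points per day.
history = [(0, 80.0), (1, 78.0), (2, 76.0), (3, 74.0)]
slope = fit_trend(history)
projection = days_until_exhausted(history[-1][1], slope)
print(f"Trend: {slope:+.1f}%/day, budget exhausts in {projection:.0f} days")
```

A straight line is the crudest possible model. A real detector would likely weight recent samples more heavily or distinguish gradual decline from step changes.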

Dependency-Aware SLO Validation

Your SLO ceiling is your weakest dependency chain. NthLayer calculates it.

$ nthlayer validate-slo payment-api

Target: 99.99% availability
Dependencies:
  → postgresql (99.95%)
  → redis (99.99%)
  → user-service (99.9%)

Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%

Recommendation: Reduce target to 99.8% or improve user-service SLO
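
The serial availability figure above is the product of the dependency availabilities. The sketch below reproduces that arithmetic, assuming every dependency is a hard serial dependency and that failures are independent. The function names are hypothetical, and NthLayer's real calculation may account for more than this, such as redundancy or soft dependencies.

```python
import math
from typing import Mapping, Tuple


def serial_availability(dependencies: Mapping[str, float]) -> float:
    """Ceiling on availability when every dependency must be up at once."""
    return math.prod(dependencies.values())


def check_target(target: float, dependencies: Mapping[str, float]) -> Tuple[bool, float]:
    """Return (feasible, ceiling) for an availability target."""
    ceiling = serial_availability(dependencies)
    return target <= ceiling, ceiling


# Values taken from the sample output above.
deps = {"postgresql": 0.9995, "redis": 0.9999, "user-service": 0.999}
target = 0.9999
feasible, ceiling = check_target(target, deps)

print(f"Serial availability: {ceiling:.2%}")
if feasible:
    print("FEASIBLE")
else:
    print(f"INFEASIBLE: target exceeds dependency ceiling by {target - ceiling:.2%}")
```

With these inputs the product is roughly 0.9984, which is why a 99.99% target cannot be met until the weakest dependency improves.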

Deployment Gates

Block deploys when error budget is exhausted or drift is critical.

$ nthlayer check-deploy payment-api

ERROR: Deployment blocked
  - Error budget: -47 minutes (exhausted)
  - Drift severity: critical
  - 3 P1 incidents in last 7 days

Exit code: 2 (BLOCKED)
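
A negative error budget, as in the output above, means the service has been unavailable for longer than its target allows in the current window. The sketch below shows the time-based arithmetic. It assumes a 99.9% target, a 30-day window, and 90.2 minutes of downtime, all hypothetical values chosen to land on -47 minutes. NthLayer's own accounting (for example, request-based budgets) may differ.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Total allowed downtime in minutes for an availability target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    return (1.0 - target) * window_days * 24 * 60


def remaining_budget_minutes(
    target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after observed downtime; negative once exhausted."""
    return error_budget_minutes(target, window_days) - downtime_minutes


# Hypothetical inputs: 99.9% over 30 days allows about 43.2 minutes.
budget = error_budget_minutes(0.999)
remaining = remaining_budget_minutes(0.999, downtime_minutes=90.2)

status = " (exhausted)" if remaining <= 0 else ""
print(f"Error budget: {remaining:.0f} minutes{status}")
```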

Blast Radius Analysis

Understand impact before making changes.

$ nthlayer blast-radius payment-api

Direct dependents (3):
  • checkout-service (critical) - 847K req/day
  • order-service (critical) - 523K req/day
  • refund-worker (standard) - 12K req/day

Transitive impact: 12 services, 2.1M daily requests
Risk: HIGH - affects checkout flow
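
Transitive impact can be thought of as a graph walk over "who depends on me" edges. The sketch below is a breadth-first traversal over a hypothetical reverse-dependency map. The `web-frontend` and `fulfilment-worker` services are invented for illustration, and this is not a description of how NthLayer discovers or stores topology.

```python
from collections import deque
from typing import Dict, List, Set


def transitive_dependents(service: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """All services that directly or indirectly depend on `service`."""
    seen: Set[str] = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            # Skip the starting service and anything already visited,
            # so cycles in the graph cannot loop forever.
            if dependent != service and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical map: key -> services that call the key.
dependents = {
    "payment-api": ["checkout-service", "order-service", "refund-worker"],
    "checkout-service": ["web-frontend"],
    "order-service": ["web-frontend", "fulfilment-worker"],
}

impacted = transitive_dependents("payment-api", dependents)
direct = dependents["payment-api"]
print(f"Direct dependents ({len(direct)}), transitive impact: {len(impacted)} services")
```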

Metric Recommendations

Enforce OpenTelemetry conventions. Know what's missing before production.

$ nthlayer recommend-metrics payment-api

Required (SLO-critical):
  ✓ http.server.request.duration    FOUND
  ✗ http.server.active_requests     MISSING

Run with --show-code for instrumentation examples.
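
At its core, a check like this compares the metrics a service emits against a required list. The sketch below assumes the required list is the two metrics in the sample output and that the emitted metric names are already known. How NthLayer actually discovers them is not shown here, and `check_metrics` is a hypothetical name.

```python
from typing import Dict, Iterable, Sequence

# Assumption: the SLO-critical set from the sample output above.
REQUIRED_METRICS: Sequence[str] = (
    "http.server.request.duration",
    "http.server.active_requests",
)


def check_metrics(
    found: Iterable[str], required: Sequence[str] = REQUIRED_METRICS
) -> Dict[str, bool]:
    """Map each required metric name to whether the service emits it."""
    found_set = set(found)
    return {name: name in found_set for name in required}


# Hypothetical list of metrics emitted by the service.
report = check_metrics(["http.server.request.duration", "process.cpu.time"])

for name, present in report.items():
    mark, status = ("✓", "FOUND") if present else ("✗", "MISSING")
    print(f"  {mark} {name:<32}{status}")
```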

Artifact Generation

Generate dashboards, alerts, and SLOs from a single spec.

$ nthlayer apply service.yaml

Generated:
  → dashboard.json (Grafana)
  → alerts.yaml (Prometheus)
  → recording-rules.yaml (Prometheus)
  → slos.yaml (OpenSLO)

🚀 Quick Start

# Install
pip install nthlayer

# Create a service spec
nthlayer init

# Validate and generate
nthlayer apply service.yaml

# Check deployment readiness
nthlayer check-deploy payment-api

Minimal service.yaml

name: payment-api
tier: critical
type: api
team: payments

dependencies:
  - postgresql
  - redis

NthLayer also supports the OpenSRM format (apiVersion: opensrm/v1) for contracts, deployment gates, and more. See full spec reference for all options.
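
To make the shape of the minimal service.yaml concrete, the sketch below validates an already-parsed spec using only the standard library. It assumes that `name`, `tier`, `type`, and `team` are all mandatory and that dependencies are plain strings. The real schema may be richer, so see the spec reference for the authoritative rules. `ServiceSpec` and `load_spec` are hypothetical names, and YAML parsing itself is left out.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping

# Assumption: all four top-level fields from the minimal example are required.
REQUIRED_FIELDS = ("name", "tier", "type", "team")


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    tier: str
    type: str
    team: str
    dependencies: List[str] = field(default_factory=list)


def load_spec(raw: Mapping[str, Any]) -> ServiceSpec:
    """Build a ServiceSpec from a parsed mapping, rejecting incomplete input."""
    missing = [key for key in REQUIRED_FIELDS if not raw.get(key)]
    if missing:
        raise ValueError(f"service spec is missing required fields: {', '.join(missing)}")
    deps = raw.get("dependencies") or []
    if not all(isinstance(dep, str) for dep in deps):
        raise ValueError("dependencies must be a list of service names")
    return ServiceSpec(raw["name"], raw["tier"], raw["type"], raw["team"], list(deps))


# The minimal spec from above, as it would look after YAML parsing.
spec = load_spec(
    {
        "name": "payment-api",
        "tier": "critical",
        "type": "api",
        "team": "payments",
        "dependencies": ["postgresql", "redis"],
    }
)
print(spec)
```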


🔄 CI/CD Integration

# GitHub Actions
- name: Validate reliability
  run: |
    nthlayer validate-slo ${{ matrix.service }}
    nthlayer check-deploy ${{ matrix.service }}

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins
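
For pipelines that run a script rather than YAML steps, the same two checks can be wrapped in a small gate. The sketch below assumes that any non-zero exit code should fail the pipeline. Only exit code 2 (BLOCKED) is documented above, so that policy is an assumption. `gate`, `main`, and the injectable `runner` are hypothetical helpers.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(argv), check=False).returncode


def gate(services: Iterable[str], runner: Runner = run_cli) -> List[str]:
    """Return the services for which validate-slo or check-deploy failed."""
    failed: List[str] = []
    for service in services:
        for command in ("validate-slo", "check-deploy"):
            # Assumption: any non-zero exit code counts as a failed gate.
            if runner(["nthlayer", command, service]) != 0:
                failed.append(service)
                break  # no need to run the remaining check for this service
    return failed


def main(services: Sequence[str], runner: Runner = run_cli) -> int:
    """Exit status for the pipeline: 0 when every service passes, else 1."""
    blocked = gate(services, runner)
    for service in blocked:
        print(f"Deployment blocked: {service}")
    return 1 if blocked else 0
```

In a pipeline this would typically be invoked as `sys.exit(main(sys.argv[1:]))`. The `runner` parameter exists so the gate logic can be tested without the CLI installed.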


🎯 How It's Different

| Traditional Approach | NthLayer |
| --- | --- |
| Set SLOs in isolation | Validate against dependency chains |
| Alert when budget exhausted | Predict exhaustion with drift detection |
| Discover missing metrics in incidents | Enforce before deployment |
| Manual dashboard creation | Generate from spec |
| "Is this ready?" = opinion | "Is this ready?" = deterministic check |

📚 Documentation

Full Documentation - Comprehensive guides and reference.

| Guide | Description |
| --- | --- |
| Quick Start | Get running in 5 minutes |
| Drift Detection | Predict SLO exhaustion |
| Dependency Discovery | Automatic dependency mapping |
| CI/CD Integration | Pipeline setup |
| CLI Reference | All commands |

🗺️ Roadmap

  • Artifact generation (dashboards, alerts, SLOs)
  • Deployment gates (check-deploy)
  • Error budget tracking
  • Portfolio view
  • Drift detection
  • Dependency discovery
  • validate-slo
  • blast-radius
  • Metric recommendations
  • OpenSRM manifest format (srm/v1)
  • Reliability scorecard
  • Monte Carlo SLO simulation (nthlayer simulate)
  • Loki alert generation
  • Recording rules generation
  • Contract & dependency validation
  • Intelligent alerts pipeline
  • Identity resolution & ownership
  • CI/CD GitHub Action
  • Agentic inference (nthlayer infer)
  • MCP server integration
  • Backstage plugin

Agentic Inference (Planned)

nthlayer infer will use a model to analyse a codebase and propose an OpenSRM manifest for it. The model examines the code, identifies services, infers appropriate SLO targets, and generates a draft service.reliability.yaml that NthLayer then validates and generates artifacts from.

This follows Zero Framework Cognition: the model provides judgment (what SLOs does this service need?), and NthLayer provides transport (validate the manifest, generate the monitoring artifacts). The result is a clean boundary between reasoning and deterministic transformation.


OpenSRM Ecosystem

NthLayer is one component in the OpenSRM ecosystem. Each component solves a complete problem independently, and they compose when used together through shared OpenSRM manifests and OTel telemetry conventions.

                     ┌──────────────────────────┐
                     │     OpenSRM Manifest     │
                     │  (the shared contract)   │
                     └────────────┬─────────────┘
                                  │
                 reads            │           reads
              ┌────────────┬──────┴─────┬────────────┐
              ▼            ▼            ▼            ▼
         ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
         │ Arbiter  │ │>NTHLAYER<│ │  SitRep  │ │  Mayday  │
         │          │ │          │ │          │ │          │
         │ quality  │ │ generate │ │correlate │ │ incident │
         │+govern   │ │monitoring│ │ signals  │ │ response │
         │+cost     │ │          │ │          │ │          │
         └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
              │            │            │            │
              └────────────┴──────┬─────┴────────────┘
                                  ▼
                     ┌──────────────────────────┐
                     │      Verdict Store       │
                     │  (shared data substrate) │
                     │ create · resolve · link  │
                     │ accuracy · gaming-check  │
                     └────────────┬─────────────┘
                                  │ OTel side-effects
                                  ▼
                     ┌──────────────────────────┐
                     │    OTel Collector /      │
                     │   Prometheus / Grafana   │
                     └──────────────────────────┘

              Learning loop (post-incident):
              Mayday findings → manifest updates
              → NthLayer regenerates → Arbiter
              refines → SitRep improves → OpenSRM

How NthLayer fits in:

  • NthLayer reads OpenSRM manifests and generates the monitoring infrastructure (Prometheus rules, Grafana dashboards, PagerDuty config) that the rest of the ecosystem relies on
  • Verdict operations emit OTel side-effects (gen_ai.decision.*, gen_ai.override.*) that flow into Prometheus. NthLayer generates dashboards for these metrics alongside service dashboards; it reads from Prometheus, not from the Verdict Store directly.
  • NthLayer exports service topology that SitRep uses for topology-aware signal correlation
  • Mayday's post-incident findings feed back into NthLayer as rule refinements (alerts that should have fired earlier or didn't fire at all)

Each component works alone. A team that only needs reliability-as-code can adopt NthLayer without Arbiter, SitRep, or Mayday.

| Component | What it does | Link |
| --- | --- | --- |
| OpenSRM | Specification for declaring service reliability requirements | opensrm |
| Verdict | Data primitive for recording AI judgments and measuring correctness | verdicts |
| Arbiter | Quality measurement and governance for AI agents | arbiter |
| NthLayer | Generate monitoring infrastructure from manifests (this repo) | nthlayer |
| SitRep | Situational awareness through signal correlation | sitrep |
| Mayday | Multi-agent incident response | mayday |

๐Ÿค Contributing

# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests

See CONTRIBUTING.md for details.


📄 License

MIT - See LICENSE.txt


๐Ÿ™ Acknowledgments

Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.

Download files

Source Distribution

nthlayer-0.1.0a19.tar.gz (724.6 kB)

Uploaded Source

Built Distribution

nthlayer-0.1.0a19-py3-none-any.whl (560.1 kB)

Uploaded Python 3

File details

Details for the file nthlayer-0.1.0a19.tar.gz.

File metadata

  • Download URL: nthlayer-0.1.0a19.tar.gz
  • Upload date:
  • Size: 724.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nthlayer-0.1.0a19.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f82e091e7ae301a0a1787eeb4cb210ce0a459c0ef8cf4afb07b3e1ee14ad9e7e |
| MD5 | 4384426c892702a2cee6be4e4234bafb |
| BLAKE2b-256 | 37b9592ec286612ee78508fd088ab8ab31658bd42e52b7408c64cd2206762179 |

Provenance

The following attestation bundles were made for nthlayer-0.1.0a19.tar.gz:

Publisher: release.yml on rsionnach/nthlayer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nthlayer-0.1.0a19-py3-none-any.whl.

File metadata

  • Download URL: nthlayer-0.1.0a19-py3-none-any.whl
  • Upload date:
  • Size: 560.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nthlayer-0.1.0a19-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3befd474320013f8568e8a9925e72ca1679db0ec122e38e8890d99f8f014327a |
| MD5 | f754d6ddd95d2f18a2743360f3df4766 |
| BLAKE2b-256 | 130e34b530710f22331d7eb781f8633a888dc6a4ad70d9abb69d3ec28da9804e |

Provenance

The following attestation bundles were made for nthlayer-0.1.0a19-py3-none-any.whl:

Publisher: release.yml on rsionnach/nthlayer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
