NthLayer - The Missing Layer of Reliability

Reliability requirements as code.

NthLayer lets you define what "production-ready" means for a service, then generates, validates, and enforces those requirements automatically.

Define once. Generate everything. Block bad deploys.

The Problem

For every new service, teams are expected to:

Manually create dashboards
Hand-craft alerts and recording rules
Define SLOs and error budgets
Configure incident escalation
Decide if a service is "ready" for production

These decisions are usually made after deployment, enforced inconsistently, or revisited only during incidents.

The Solution

NthLayer moves reliability left in the delivery lifecycle:

┌─────────────────────────────────────────────────────────────────────────────┐
│ service.yaml → generate → lint → verify → check-deploy → deploy            │
│                   ↓         ↓       ↓           ↓                          │
│               artifacts   valid?  metrics?  budget ok?                     │
│                                                                            │
│ "Is this production-ready?" - answered BEFORE deployment                   │
└─────────────────────────────────────────────────────────────────────────────┘

# In your Tekton/GitHub Actions pipeline:
nthlayer apply service.yaml --lint    # Generate + validate PromQL syntax
nthlayer verify service.yaml          # Verify declared metrics exist
nthlayer check-deploy service.yaml    # Check error budget gate
# Only if all pass: deploy to production

Works with: Tekton, GitHub Actions, GitLab CI, ArgoCD, Mimir/Cortex

🚦 Shift Left Features

Command	What It Does	Pipeline Exit Code
`nthlayer verify`	Validates declared metrics exist in Prometheus	1 if missing metrics
`nthlayer check-deploy`	Checks error budget - blocks if exhausted	2 if budget exhausted
`nthlayer apply --lint`	Validates PromQL syntax with pint	1 if invalid queries

Deployment Gate Example
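The gate sequence can be sketched as a small Python wrapper around the three commands listed above. This is an illustration, not NthLayer's own implementation: the commands and the exit codes 1 and 2 come from the table above, while `GATE_STEPS`, `run_command`, `run_gate`, and the injectable `runner` parameter are hypothetical names. Treating any other non-zero exit code as a generic failure is an assumption.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Commands and their documented blocking exit codes, taken from the table above.
GATE_STEPS = [
    (["nthlayer", "apply", "service.yaml", "--lint"], {1: "invalid PromQL queries"}),
    (["nthlayer", "verify", "service.yaml"], {1: "declared metrics missing in Prometheus"}),
    (["nthlayer", "check-deploy", "service.yaml"], {2: "error budget exhausted"}),
]


def run_command(argv: Sequence[str]) -> int:
    """Run one CLI command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def run_gate(runner: Callable[[Sequence[str]], int] = run_command) -> Tuple[bool, str]:
    """Run each gate step in order and stop at the first failure.

    Returns (ok, message). Exit codes not listed in GATE_STEPS are reported
    as unexpected failures (an assumption; only 1 and 2 are documented above).
    """
    for argv, known_failures in GATE_STEPS:
        code = runner(argv)
        if code == 0:
            continue
        reason = known_failures.get(code, f"unexpected exit code {code}")
        return False, f"{' '.join(argv)}: {reason}"
    return True, "all gates passed, safe to deploy"
```

In a pipeline step, calling `run_gate()` and exiting non-zero when the first element is `False` would stop the deploy. Passing a fake `runner` makes the logic testable without NthLayer installed.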

⚡ Quick Start

pipx install nthlayer

nthlayer apply service.yaml

# Output: generated/payment-api/
#   ├── dashboard.json       → Grafana
#   ├── alerts.yaml          → Prometheus
#   ├── slos.yaml            → OpenSLO
#   └── recording-rules.yaml → Prometheus

What NthLayer Is

A reliability specification that defines production-readiness
A compiler from service intent to operational reality
A CI/CD-native way to standardize reliability across teams

NthLayer integrates with existing tools (Prometheus, Grafana, PagerDuty) but operates before them - deciding what is allowed to reach production.

What NthLayer Is Not

Not a service catalog
Not an observability platform
Not an incident management system
Not a runtime control plane

NthLayer complements these systems by ensuring services meet reliability expectations before they are deployed.

Why NthLayer?

With NthLayer	Without NthLayer
Platform teams encode reliability standards once	Standards recreated per service
Service teams inherit sane defaults automatically	Each team invents their own
"Is this production-ready?" = deterministic check	"Is this ready?" = negotiated opinion
Reliability is enforced by default	Reliability is reactive and inconsistent

📥 What You Put In

1. Service Spec (`service.yaml`)

# Minimal example (5 lines)
name: payment-api
tier: critical
type: api
dependencies:
  - postgresql

2. Environment Variables (optional)

# 📟 PagerDuty - auto-create team, escalation policy, service
export PAGERDUTY_API_KEY=...

# 📊 Grafana - auto-push dashboards
export NTHLAYER_GRAFANA_URL=...
export NTHLAYER_GRAFANA_API_KEY=...
export NTHLAYER_GRAFANA_ORG_ID=1              # Default: 1

# 🔍 Prometheus - metric discovery for intent resolution
export NTHLAYER_PROMETHEUS_URL=...
export NTHLAYER_METRICS_USER=...              # If auth required
export NTHLAYER_METRICS_PASSWORD=...
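
Because every variable is optional, it can help to check which integrations are fully configured before running a pipeline. The sketch below is a hypothetical pre-flight helper, not part of NthLayer: the variable names and the default org ID of 1 come from the list above, while `INTEGRATION_VARS`, `configured_integrations`, `grafana_org_id`, and the choice of which variables count as "required" are assumptions.

```python
import os
from typing import Dict, Mapping

# Variable names come from the list above; the grouping is illustrative.
INTEGRATION_VARS = {
    "pagerduty": ["PAGERDUTY_API_KEY"],
    "grafana": ["NTHLAYER_GRAFANA_URL", "NTHLAYER_GRAFANA_API_KEY"],
    "prometheus": ["NTHLAYER_PROMETHEUS_URL"],
}


def configured_integrations(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which integrations have all of their assumed-required variables set."""
    return {
        name: all(bool(env.get(var)) for var in required)
        for name, required in INTEGRATION_VARS.items()
    }


def grafana_org_id(env: Mapping[str, str] = os.environ) -> int:
    """Return the Grafana org ID, falling back to the documented default of 1."""
    return int(env.get("NTHLAYER_GRAFANA_ORG_ID", "1"))
```

Both functions accept any mapping, so they can be exercised with a plain dictionary instead of the real environment.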

📤 What You Get Out

Output	File	Deploy To
📊 Dashboard	`generated/<service>/dashboard.json`	Grafana
🚨 Alerts	`generated/<service>/alerts.yaml`	Prometheus
🎯 SLOs	`generated/<service>/slos.yaml`	OpenSLO-compatible
⚡ Recording Rules	`generated/<service>/recording-rules.yaml`	Prometheus
📟 PagerDuty	Created via API	Team, escalation policy, service

📊 SLO Portfolio

Track reliability across your entire organization:

nthlayer portfolio              # Org-wide reliability view
nthlayer portfolio --format json  # Machine-readable for dashboards
nthlayer slo collect service.yaml  # Query current budget from Prometheus

📝 Full Service Example

name: payment-api
tier: critical              # critical | standard | low
type: api                   # api | worker | stream
team: payments

slos:
  availability: 99.95       # Generates Prometheus alerts
  latency_p99_ms: 200       # Generates histogram queries

dependencies:
  - postgresql              # Adds PostgreSQL panels
  - redis                   # Adds Redis panels
  - kubernetes              # Adds K8s pod metrics

pagerduty:
  enabled: true
  support_model: self       # self | shared | sre | business_hours

💰 The Value

Generation: 20 hours → 5 minutes per service

Task	Manual Effort	With NthLayer
🎯 Define SLOs & error budgets	6 hours	Generated from tier
🚨 Research & configure alerts	4 hours	400+ battle-tested rules
📊 Build Grafana dashboards	5 hours	12-28 panels auto-generated
📟 PagerDuty escalation setup	2 hours	Tier-based defaults
📋 Write recording rules	3 hours	20+ pre-computed metrics

Validation: Catch issues before production

Problem	Without NthLayer	With NthLayer
Missing metrics	Discover after deploy	`nthlayer verify` blocks promotion
Invalid PromQL	Prometheus rejects rules	`--lint` catches in CI
Policy violations	Manual review	`nthlayer validate-spec` enforces
Exhausted budget	Deploy anyway, incident	`check-deploy` blocks risky deploys

At Scale

Scale	Generation Saved	Incidents Prevented*
🚀 50 services	996 hours ($100K)	~12/year
📈 200 services	3,983 hours ($400K)	~48/year
🏢 1,000 services	19,917 hours ($2M)	~240/year

*Estimated based on a 60% reduction in "missing monitoring" incidents. Value calculated at $100/hr engineering cost.
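
The figures in the table appear consistent with simple per-service arithmetic: 20 hours of manual work replaced by 5 minutes, valued at $100/hr. The sketch below reproduces that calculation. The constant and function names are hypothetical, and the per-service incident rate of 0.24 per year is inferred from the table (12 incidents across 50 services) rather than stated in it.

```python
MANUAL_HOURS = 20.0           # per service, from the generation table
GENERATED_HOURS = 5 / 60      # 5 minutes per service
HOURLY_COST_USD = 100         # from the footnote
INCIDENTS_PER_SERVICE = 0.24  # inferred: ~12 incidents/year at 50 services


def hours_saved(services: int) -> float:
    """Engineering hours saved when generating artifacts for `services` services."""
    return services * (MANUAL_HOURS - GENERATED_HOURS)


def value_usd(services: int) -> float:
    """Dollar value of the saved hours at the assumed hourly cost."""
    return hours_saved(services) * HOURLY_COST_USD


def incidents_prevented(services: int) -> float:
    """Estimated incidents prevented per year, using the inferred rate."""
    return services * INCIDENTS_PER_SERVICE


for n in (50, 200, 1000):
    print(n, round(hours_saved(n)), round(value_usd(n)), round(incidents_prevented(n)))
```

This yields 996, 3,983, and 19,917 hours, matching the table; the unrounded dollar values are about $99.6K, $398K, and $1.99M, which the table rounds to $100K, $400K, and $2M.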

🧠 How It Works

Generation

Step	What Happens
🎯 Intent Resolution	Maps "availability SLO" → best matching PromQL query
🔀 Type Routing	API services get HTTP metrics, workers get job metrics
⚡ Tier Defaults	Critical = 99.95% SLO + 5min escalation, Low = 99.5% + 60min
🏗️ Technology Templates	23 built-in: PostgreSQL, Redis, Kafka, MongoDB, etc.
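
The tier defaults step can be pictured as a lookup with optional overrides. The sketch below is a hypothetical model, not NthLayer's code: only the critical and low values are stated in the table above, so the standard tier is deliberately left out, and the rule that an explicit SLO in `service.yaml` overrides the tier default is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TierDefaults:
    availability_pct: float
    escalation_minutes: int


# Only the tiers whose defaults are stated above; "standard" is not documented
# here, so it is intentionally omitted rather than guessed.
TIER_DEFAULTS = {
    "critical": TierDefaults(availability_pct=99.95, escalation_minutes=5),
    "low": TierDefaults(availability_pct=99.5, escalation_minutes=60),
}


def resolve_defaults(tier: str, availability_pct: Optional[float] = None) -> TierDefaults:
    """Return the defaults for a tier, with an optional explicit SLO override.

    The override behaviour is an assumption made for illustration.
    """
    if tier not in TIER_DEFAULTS:
        raise ValueError(f"no documented defaults for tier {tier!r}")
    defaults = TIER_DEFAULTS[tier]
    if availability_pct is not None:
        defaults = replace(defaults, availability_pct=availability_pct)
    return defaults
```

Keeping the defaults in one frozen table mirrors the idea that platform teams encode standards once and service teams inherit them.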

CI/CD Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Generate  │───▶│   Validate  │───▶│   Protect   │───▶│   Deploy    │
│ nthlayer    │    │ --lint      │    │ check-deploy│    │ kubectl     │
│ apply       │    │ verify      │    │             │    │ argocd      │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
      │                  │                  │
      ▼                  ▼                  ▼
  artifacts         exit 1 if          exit 2 if
  to git            invalid            budget exhausted

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins

🛠️ CLI Commands

Generate

nthlayer init                   # Interactive service.yaml creation
nthlayer plan service.yaml      # Preview what will be generated
nthlayer apply service.yaml     # Generate all artifacts
nthlayer apply --push           # Also push dashboard to Grafana
nthlayer apply --push-ruler     # Push alerts to Mimir/Cortex Ruler API

Validate

nthlayer apply --lint           # Validate PromQL syntax (pint)
nthlayer validate-spec service.yaml  # Check against policies (OPA/Rego)
nthlayer verify service.yaml    # Verify metrics exist in Prometheus

Protect

nthlayer check-deploy service.yaml  # Check error budget gate (exit 2 = blocked)
nthlayer portfolio              # Org-wide SLO health
nthlayer slo collect service.yaml   # Query current budget from Prometheus

🔮 Coming Soon

Feature	Description	Status
💰 Error Budgets	Track budget consumption, correlate with deploys	✅ Done
📊 SLO Portfolio	Org-wide reliability view across all services	✅ Done
🚦 Deployment Gates	Block deploys when error budget exhausted	✅ Done
✅ Contract Verification	Verify declared metrics exist before promotion	✅ Done
📝 Loki Integration	Generate LogQL alert rules, technology-specific log patterns	🔨 Next
🤖 AI Generation	Conversational service.yaml creation via MCP	📋 Planned

📦 Installation

# Recommended
pipx install nthlayer

# Or with pip
pip install nthlayer

# Verify
nthlayer --version

🌐 Live Demo

See NthLayer in action with real Grafana dashboards and generated configs.

📚 Documentation

Full Documentation - Comprehensive guides and reference.

Quick Links
🚀 Quick Start	Get running in 5 minutes
🔧 Setup Wizard	Interactive configuration
📊 SLO Portfolio	Org-wide reliability view
🔌 18 Technologies	PostgreSQL, Redis, Kafka...
📖 CLI Reference	All commands
🤝 Contributing	How to contribute

Build docs locally

uv sync --extra docs
uv run mkdocs serve  # Serves the docs at http://localhost:8000

🤝 Contributing

# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests

See CONTRIBUTING.md for details.

📄 License

MIT - See LICENSE.txt

🙏 Acknowledgments

Core Dependencies

grafana-foundation-sdk - Dashboard generation SDK (Apache 2.0)
awesome-prometheus-alerts - 580+ battle-tested alert rules (CC BY 4.0)
pint - PromQL linting and validation (Apache 2.0)
conftest / OPA - Policy validation (Apache 2.0)
PagerDuty Python SDK - Incident management integration (MIT)

Architecture Inspiration

autograf - Dynamic Prometheus metric discovery
Sloth - SLO specification and burn rate calculations
OpenSLO - SLO specification standard

CLI & Documentation

Rich - Terminal formatting and styling (MIT)
Questionary - Interactive CLI prompts (MIT)
MkDocs Material - Documentation theme (MIT)
VHS - Terminal demo recordings (MIT)
Nord Theme - Color palette inspiration (MIT)

Tooling

Shields.io - Badges
Slidev - Presentation framework

This version

0.1.0a10 pre-release

Jan 8, 2026
