
NthLayer - The Missing Layer of Reliability


NthLayer

Generate your complete reliability stack from a single service spec.

Status: Alpha | PyPI | License: MIT


⚡ Quick Start

pipx install nthlayer

nthlayer apply service.yaml

# Output: generated/payment-api/
#   ├── dashboard.json       → Grafana
#   ├── alerts.yaml          → Prometheus
#   ├── slos.yaml            → OpenSLO
#   └── recording-rules.yaml → Prometheus

📥 What You Put In

1. Service Spec (service.yaml)

# Minimal example (5 lines)
name: payment-api
tier: critical
type: api
dependencies:
  - postgresql
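
A spec this small is easy to sanity-check before handing it to `nthlayer`. The sketch below is illustrative only: `validate_spec` is a hypothetical helper, not part of NthLayer, and treating `name`, `tier`, and `type` as required is an assumption. The allowed values are taken from the comments in the full service example later in this README.

```python
# Hypothetical pre-flight check for a service spec that has already been
# loaded into a dict (for example with a YAML parser). Not NthLayer's own code.
ALLOWED_TIERS = {"critical", "standard", "low"}
ALLOWED_TYPES = {"api", "worker", "stream"}


def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks sane."""
    problems = []
    # Assumption: these three fields are mandatory.
    for key in ("name", "tier", "type"):
        if not spec.get(key):
            problems.append(f"missing required field: {key}")
    if spec.get("tier") and spec["tier"] not in ALLOWED_TIERS:
        problems.append(f"unknown tier: {spec['tier']}")
    if spec.get("type") and spec["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {spec['type']}")
    if not isinstance(spec.get("dependencies", []), list):
        problems.append("dependencies must be a list")
    return problems


minimal = {
    "name": "payment-api",
    "tier": "critical",
    "type": "api",
    "dependencies": ["postgresql"],
}
print(validate_spec(minimal))  # prints []
```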

2. Environment Variables (optional)

# 📟 PagerDuty - auto-create team, escalation policy, service
export PAGERDUTY_API_KEY=...

# 📊 Grafana - auto-push dashboards
export NTHLAYER_GRAFANA_URL=...
export NTHLAYER_GRAFANA_API_KEY=...
export NTHLAYER_GRAFANA_ORG_ID=1              # Default: 1

# 🔍 Prometheus - metric discovery for intent resolution
export NTHLAYER_PROMETHEUS_URL=...
export NTHLAYER_METRICS_USER=...              # If auth required
export NTHLAYER_METRICS_PASSWORD=...
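
To make the "optional" part concrete, here is a minimal sketch of how a wrapper script might read the Grafana variables above. The variable names and the default org ID of 1 come from this README; `GrafanaSettings` and `grafana_settings` are hypothetical names, and how NthLayer itself parses its environment may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GrafanaSettings:
    url: str
    api_key: str
    org_id: int


def grafana_settings(env: Mapping[str, str] = os.environ) -> Optional[GrafanaSettings]:
    """Return Grafana settings, or None when the integration is not configured."""
    url = env.get("NTHLAYER_GRAFANA_URL")
    api_key = env.get("NTHLAYER_GRAFANA_API_KEY")
    if not url or not api_key:
        return None  # Both values are needed before a push could be attempted.
    # The org ID falls back to 1, matching the documented default.
    org_id = int(env.get("NTHLAYER_GRAFANA_ORG_ID", "1"))
    return GrafanaSettings(url=url.rstrip("/"), api_key=api_key, org_id=org_id)


# Example with an explicit mapping instead of the real process environment.
example = grafana_settings({
    "NTHLAYER_GRAFANA_URL": "https://grafana.example.com/",
    "NTHLAYER_GRAFANA_API_KEY": "placeholder",
})
print(example)
```

Passing the environment in as a mapping keeps the function easy to test without touching real credentials.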

📤 What You Get Out

| Output | File | Deploy To |
|---|---|---|
| 📊 Dashboard | generated/<service>/dashboard.json | Grafana |
| 🚨 Alerts | generated/<service>/alerts.yaml | Prometheus |
| 🎯 SLOs | generated/<service>/slos.yaml | OpenSLO-compatible |
| ⚡ Recording Rules | generated/<service>/recording-rules.yaml | Prometheus |
| 📟 PagerDuty | Created via API | Team, escalation policy, service |
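
The file layout above is regular enough to compute. As a rough sketch (the helper name `expected_artifacts` is hypothetical; the file names and the `generated/` root come from this README), a deploy script could derive the paths it should expect for a given service:

```python
from pathlib import PurePosixPath

# File name -> system that consumes it, as listed in this README.
ARTIFACTS = {
    "dashboard.json": "Grafana",
    "alerts.yaml": "Prometheus",
    "slos.yaml": "OpenSLO",
    "recording-rules.yaml": "Prometheus",
}


def expected_artifacts(service: str, root: str = "generated") -> dict[str, str]:
    """Map each expected output path for a service to its deploy target."""
    base = PurePosixPath(root) / service
    return {str(base / filename): target for filename, target in ARTIFACTS.items()}


for path, target in expected_artifacts("payment-api").items():
    print(f"{path} -> {target}")
```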

📊 SLO Portfolio

Track reliability across your entire organization:

$ nthlayer portfolio

======================================================================
  NthLayer Reliability Portfolio
======================================================================

Overall Health: 78% (14/18 SLOs meeting target)

By Tier:
  Critical: 5/6 healthy (83%)
  Standard: 6/8 healthy (75%)
  Low: 3/4 healthy (75%)

Top Budget Burners:
  payment-api/availability: 12.5h burned (156%)
  search-api/latency: 8.2h burned (95%)

Insights:
  ! payment-api needs reliability investment
  * user-api exceeds SLO - consider tier promotion

----------------------------------------------------------------------
Services: 12 | SLOs: 18

nthlayer slo list              # List all SLOs across services
nthlayer slo show payment-api  # Show SLO details for a service
nthlayer slo collect payment-api  # Query Prometheus for current budget
nthlayer portfolio             # Org-wide reliability view
nthlayer portfolio --details   # Full breakdown by service
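
The percentages in the sample portfolio output follow directly from the healthy/total counts. The sketch below reproduces them, assuming the figures are rounded to the nearest whole percent (the sample is consistent with that, but the rounding rule is an assumption, and `health_percent` is a hypothetical helper).

```python
def health_percent(healthy: int, total: int) -> int:
    """Share of SLOs meeting target, as a whole-number percentage."""
    if total == 0:
        return 0  # Avoid dividing by zero for an empty tier.
    return round(100 * healthy / total)


# (healthy, total) per tier, copied from the sample output.
by_tier = {"Critical": (5, 6), "Standard": (6, 8), "Low": (3, 4)}

healthy = sum(h for h, _ in by_tier.values())
total = sum(t for _, t in by_tier.values())

print(f"Overall Health: {health_percent(healthy, total)}% ({healthy}/{total} SLOs meeting target)")
for tier, (h, t) in by_tier.items():
    print(f"  {tier}: {h}/{t} healthy ({health_percent(h, t)}%)")
```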

📝 Full Service Example

name: payment-api
tier: critical              # critical | standard | low
type: api                   # api | worker | stream
team: payments

slos:
  availability: 99.95       # Generates Prometheus alerts
  latency_p99_ms: 200       # Generates histogram queries

dependencies:
  - postgresql              # Adds PostgreSQL panels
  - redis                   # Adds Redis panels
  - kubernetes              # Adds K8s pod metrics

pagerduty:
  enabled: true
  support_model: self       # self | shared | sre | business_hours
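
To give a feel for what `availability: 99.95` implies, the following sketch converts an availability target into an error budget. The 30-day window is an assumption made for illustration; this README does not state which window NthLayer uses, and `error_budget_minutes` is a hypothetical helper.

```python
def error_budget_minutes(availability_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a target over a rolling window."""
    window_minutes = window_days * 24 * 60
    allowed_failure_fraction = 1 - availability_percent / 100
    return allowed_failure_fraction * window_minutes


budget = error_budget_minutes(99.95)
print(f"99.95% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under this assumption a 99.95% target leaves roughly 21.6 minutes per 30 days, which is why critical-tier services need fast escalation.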

💰 The Value

⏱️ 20 hours → 5 minutes per service

What Gets Automated

| Task | Manual Effort* | With NthLayer |
|---|---|---|
| 🎯 Define SLOs & error budgets | 6 hours | Generated |
| 🚨 Research & configure alerts | 4 hours | 400+ battle-tested rules |
| 📊 Build Grafana dashboards | 5 hours | 12-28 panels auto-generated |
| 📟 PagerDuty escalation setup | 2 hours | Tier-based defaults |
| 📋 Write recording rules | 3 hours | 20+ pre-computed metrics |
| Total per service | 20 hours | 5 minutes |

*Hours are estimates based on typical SRE team experience with a production-grade setup. Actual times vary with team expertise and existing tooling.

At Scale

| Scale | Manual Hours | With NthLayer | Hours Saved | Value* |
|---|---|---|---|---|
| 🚀 50 services | 1,000 hrs | 4 hrs | 996 hrs | $100K |
| 📈 200 services | 4,000 hrs | 17 hrs | 3,983 hrs | $400K |
| 🏢 1,000 services | 20,000 hrs | 83 hrs | 19,917 hrs | $2M |

*Value calculated at $100/hr engineering cost. Your mileage may vary.
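
The scale figures can be reproduced from three inputs stated in this README: 20 manual hours per service, 5 minutes per service with NthLayer, and $100/hr. The sketch below recomputes each row; the function name `savings` is hypothetical, and the table's values appear to be these results rounded.

```python
MANUAL_HOURS_PER_SERVICE = 20
NTHLAYER_MINUTES_PER_SERVICE = 5
HOURLY_RATE_USD = 100


def savings(services: int) -> dict[str, float]:
    """Recompute one row of the scale table for a given number of services."""
    manual_hours = services * MANUAL_HOURS_PER_SERVICE
    nthlayer_hours = services * NTHLAYER_MINUTES_PER_SERVICE / 60
    hours_saved = manual_hours - nthlayer_hours
    return {
        "manual_hours": manual_hours,
        "nthlayer_hours": nthlayer_hours,
        "hours_saved": hours_saved,
        "value_usd": hours_saved * HOURLY_RATE_USD,
    }


for n in (50, 200, 1000):
    row = savings(n)
    print(
        f"{n} services: {row['manual_hours']:,} hrs manual, "
        f"{row['nthlayer_hours']:.0f} hrs with NthLayer, "
        f"{row['hours_saved']:,.0f} hrs saved, "
        f"${row['value_usd']:,.0f}"
    )
```

Swapping in your own per-service estimate or hourly rate gives a figure that reflects your team rather than the defaults.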


🧠 How It Works

| Step | What Happens |
|---|---|
| 🔍 Metric Discovery | Queries Prometheus to find what metrics actually exist |
| 🎯 Intent Resolution | Maps "availability SLO" → best matching PromQL query |
| 🔀 Type Routing | API services get HTTP metrics, workers get job metrics |
| Tier Defaults | Critical = 5/15/30 min escalation, Low = 60 min |
| 🏗️ Technology Templates | PostgreSQL, Redis, Kubernetes patterns built-in |
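
The discovery, intent resolution, and type routing steps can be pictured as a lookup against the set of metrics that actually exist. The following is a simplified sketch of that idea, not NthLayer's implementation: the candidate metric names, the `status` label, and both function names are assumptions chosen for illustration.

```python
from typing import Iterable, Optional

# Hypothetical candidates, in preference order, keyed by (service type, intent).
# Type routing shows up here: API services look at HTTP metrics, workers at job metrics.
CANDIDATES = {
    ("api", "availability"): ["http_requests_total", "http_server_requests_seconds_count"],
    ("worker", "availability"): ["jobs_processed_total", "job_runs_total"],
}


def resolve_intent(service_type: str, intent: str, discovered: Iterable[str]) -> Optional[str]:
    """Pick the first candidate metric that was actually discovered."""
    available = set(discovered)
    for metric in CANDIDATES.get((service_type, intent), []):
        if metric in available:
            return metric
    return None  # Nothing matched; a real tool would need a fallback here.


def availability_query(metric: str, window: str = "5m") -> str:
    """Build a ratio-of-rates PromQL expression (assumes a `status` label)."""
    good = f'sum(rate({metric}{{status!~"5.."}}[{window}]))'
    total = f"sum(rate({metric}[{window}]))"
    return f"{good} / {total}"


discovered = ["up", "http_requests_total", "process_cpu_seconds_total"]
metric = resolve_intent("api", "availability", discovered)
if metric is not None:
    print(availability_query(metric))
```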

🛠️ CLI Commands

nthlayer plan service.yaml      # 👀 Preview what will be generated
nthlayer apply service.yaml     # ✨ Generate all artifacts
nthlayer apply --push-grafana   # 📊 Also push dashboard to Grafana
nthlayer apply --lint           # ✅ Validate generated alerts with pint
nthlayer lint alerts.yaml       # 🔍 Lint existing Prometheus rules

🔮 Coming Soon

| Feature | Description | Status |
|---|---|---|
| 💰 Error Budgets | Track budget consumption, correlate with deploys | ✅ Done |
| 📊 SLO Portfolio | Org-wide reliability view across all services | ✅ Done |
| 📝 Loki Integration | Generate LogQL alert rules, technology-specific log patterns | 🔨 Next |
| 🚦 Deployment Gates | Block ArgoCD deploys when budget exhausted | 📋 Planned |
| 🤖 AI Generation | Conversational service.yaml creation via MCP | 📋 Planned |

📦 Installation

# Recommended
pipx install nthlayer

# Or with pip
pip install nthlayer

# Verify
nthlayer --version
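
If you installed with `pip` into the current environment, the same check can be done from Python using the standard library. This is a small sketch; `installed_version` is a hypothetical helper. With `pipx` the package lives in its own isolated environment, so this lookup would not find it from another interpreter and `nthlayer --version` remains the right check.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nthlayer") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "nthlayer is not installed in this environment")
```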

🌐 Live Demo

See NthLayer in action with real Grafana dashboards and generated configs:

Live Dashboards | Interactive Demo


📚 Documentation

Full Documentation - Comprehensive guides and reference.

Quick Links

  • 🚀 Quick Start - Get running in 5 minutes
  • 🔧 Setup Wizard - Interactive configuration
  • 📊 SLO Portfolio - Org-wide reliability view
  • 🔌 18 Technologies - PostgreSQL, Redis, Kafka...
  • 📖 CLI Reference - All commands
  • 🤝 Contributing - How to contribute

Build docs locally

pip install -e ".[docs]"
mkdocs serve  # Serves the docs at http://localhost:8000

🤝 Contributing

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests (84 should pass)

See CONTRIBUTING.md for details.


📄 License

MIT - See LICENSE.txt


🙏 Acknowledgments

Core Dependencies

Architecture Inspiration

  • autograf - Dynamic Prometheus metric discovery
  • Sloth - SLO specification and burn rate calculations
  • OpenSLO - SLO specification standard

Tooling
