Stateful monitoring: reconcile declared infrastructure state against reality and reason about the drift.

These details have not been verified by PyPI

Project description

steadystate.ai

The operational substrate for IT-Ops — whether a human or an agent drives it. steadystate watches your deployed infrastructure (live, and in CI), tells you whether it's actually working, carries your team's runbook, and closes the loop — but only ever within a bound you commit. A deterministic, stdlib-only core; an optional LLM that advises and proposes but never decides.

The job isn't "find drift." It's the two things an operator — or an agent acting as one — actually needs: grounded truth (what's declared, what's observed, is it working, what changed) and a governed way to act (a vetted catalog, an impact×reversibility bound, approval, an immutable audit). Rent your monitoring for the metrics; steadystate is the layer that knows your desired state and can safely return you to it.

Install

pipx install git+https://github.com/jedi12many/steadystate.ai   # today (pre-release)
pip install steadystate                                          # once published to PyPI

steadystate is a CLI you run from inside your IaC repo — its config, runbook, and state are read CWD-relative, so the repo never imports it as a library. --silo <name> chdirs into a registered deployment (like git -C); the container image lives under deploy/.

Two postures, one core

steadystate runs in two shapes that share the same deterministic engine and the same committed runbook:

	Live watcher	Repo-native (GitOps)
Runs	a long-running server/CLI next to a deployment	stateless, in CI, inside the IaC repo
Holds	creds, a kubeconfig, a state db	nothing but the repo + a token
Acts by	guardrailed remediation on live infra (when you grant it)	opening a PR / an issue — a human merges
Driven by	you, or an agent over MCP	one CI line: `steadystate ci`

Post-deploy and pre-deploy, the same tool. The PR-bot posture is the safest actuator there is — it has zero infra access; its only power is a proposal you review. See docs/repo-native-posture.md.

Is it working? — the verdict, function-first

Running ≠ working. steadystate leads with the question an operator actually asks — is it working? — and answers WORKING | DEGRADED | DOWN:

Smoke tests — the strongest signal is to exercise the service: an http check GETs an endpoint and asserts the response. A service that won't answer is the service being down.
Live malfunction (Symptom) vs drift — a CrashLoopBackOff is the app failing now; a diverged config is drift. steadystate keeps them distinct, and when a failing resource also drifted, folds them into one root-caused alert ("failing — likely cause: this drift").
Function-first triage — summary leads with what's impaired (a live malfunction worth attention), not a wall of findings, so neither you nor an agent chases a red herring. A serious config drift (an opened firewall) is still flagged for review — never buried, never called a malfunction.
Correlation + enrichment — scoped to a workload, it correlates the smoke result, the live symptoms, and the drift that likely caused them, and folds in live metrics from your monitoring (Prometheus) as context. It rents the metrics; it never reimplements monitoring.

Your runbook — author a fix once, then it's everywhere

The other half of grounded truth is what to do about it. A solution is an operator-vouched problem → fix — "for evicted pods, run this; for a hung gateway, reboot it" — committed to steadystate/solutions.json, the catalog you grow over time:

Authored or learned. Write one (add-solution), describe it in plain English (define-solution), or let learn notice a fix you keep applying by hand and hand you the exact command to capture it. Each is signed by an author — the audit anchor.
Matched + surfaced. When a finding matches (by category or a title regex), show names the documented fix and who vouched, a CI-opened issue carries it, and an agent over MCP sees the same — your runbook, right where the problem is.
Run through the gate. A matched fix becomes a one-approve remediation, run as an argv (no shell) and audited with author + approver. Opt in to auto-apply (STEADYSTATE_SOLUTION_AUTO) and a low-impact, reversible one runs unattended — anything bigger always waits for a human.

Act — within a bound you commit

Acting is gated identically everywhere — terminal, chat, agent, CI:

A vetted catalog — the only commands it can run are a fixed menu of safe shapes, re-validated at run time against an injection-proof allow-pattern; an authored solution is your extension of that catalog, vouched and audited.
The bound — every action carries an envelope (impact × reversibility). The bound is the one decision that should never be casual, so you commit it in steadystate/config.toml's [bound] table — reviewed in a PR, not a loose env var. A lossless, tenant-scoped fix runs in bound; scaling to zero or deleting a node escalates to a human.
Approval + an immutable audit — nothing effectful runs unseen; every approve / decline / auto / break-glass appends to history.
Autonomy is a switch, not a track record — off by default, granted by you (STEADYSTATE_*_AUTO), bounded by the envelope above.

The LLM proposes what; a deterministic gate decides whether. The full control model — and, just as honestly, where the guarantee ends (a shell-enabled agent's real limit is its RBAC, not us) — is LLM_SAFETY.md. Ask the tool itself with steadystate posture.

Drive it — terminal, chat, agents, CI

The same vetted command grammar, four ways in:

Terminal — steadystate health for the working/degraded/down verdict; summary for the one-glance rollup; findings / show to inspect; chat for a local REPL.
Chat (Slack / Teams / Discord) — signed webhooks; @steadystate probe <target>, approve from a button. With an LLM, plain English works ("why is web crashlooping?") — a read-only ask runs, an effectful one is echoed back to confirm. Chat is a trigger, never a bypass.
Agents over MCP — steadystate mcp runs as a Model Context Protocol server (stdio, stdlib-only) so Claude Code/Desktop or any agent drives the same verbs through the same guardrails. Three grant tiers: read-only (default) → --author (write checks + runbook solutions, not infra) → --write (remediate). Make it an agent's sole actuator — no shell, steadystate holds the creds — and the gate becomes a real fence (see the contained-agent example).
CI — steadystate ci: stateless, deterministic, no creds; scan the IaC, gate the merge (non-zero on a problem), and open a PR/issue that already says how to fix it.

Detect — the grounded truth it's built on

Everything here runs with no model: fully deterministic, fully testable. It rides each tool's own machine-readable output (terraform show -json, kubectl, helm, …), never raw-file parsing.

Sources (--source) — terraform · terraform-state · ansible · kubernetes · rancher · argocd · docker-compose · helm, plus live variants. terraform-state diffs config-vs-state with -refresh=false — no per-resource cloud refresh, so a CI gate needs only state-bucket read, not broad cloud creds.
Drift vs malfunction (--probe) — config diverged vs failing-right-now, folded when both.
Custom checks — declare what healthy means for your app (is postfix routing mail? is squid up?) as a vetted, read-only rule that emits a finding and never runs code; author by talking (define-check) or let an agent fill the schema (add-check).
Domain packs + live compliance — security (AWS · GCP · Azure → ATT&CK), Docker CIS, k8s Pod Security; a CIS/STIG live-posture scan, honest about what's checkable live.
Surfaces (--to) — console · slack · teams · discord · github · servicenow · pagerduty · prometheus · grafana · webhook. github opens an issue when sure (deduped by fingerprint, auto-closed when it clears, and carrying the matched runbook fix); an unconfigured surface says so and skips.
Silos — name your deployments (silo add, --silo <name> works like git -C) so a laptop drives deployment 1, 2, 3 — each its own db + targets + kubeconfig — without collision.
Extensible — sources, packs, surfaces, probes, metric adapters are entry-point plugins; a third-party package adds one without a fork. Self-describing: catalog / commands.

Config as code

The same convention all the way down: committed beside your IaC, reviewed in PRs.

your-iac-repo/
├── main.tf, ...
├── steadystate/                  # COMMITTED intent
│   ├── config.toml               # [defaults] source/path · [bound] the envelope · [ci] the gate
│   ├── solutions.json            # your runbook (problem → fix)
│   └── checks.json               # what "healthy" means for your app
└── .steadystate/                 # gitignored ephemeral state (state.db, patches)

Precedence is 12-factor and non-breaking: flag > env var > config > built-in default. Every variable is in CONFIG.md; steadystate doctor shows what's set and each dial's live value.

The optional LLM — advises, never decides

An LLM adds the plain-language "why this matters", groups events by root cause, answers questions in chat, drafts a check/solution from your words, and — where you grant it — proposes a remediation or drives the tool as an agent. But detection, scoring, correlation-fallback, and the apply decision stay deterministic, and a proposed action runs only if the gate authorizes it.

Providers — Anthropic (ANTHROPIC_API_KEY) or any OpenAI-compatible endpoint.
Kill switch + egress gate — --no-llm makes zero calls; --confirm-llm shows the exact prompt
- destination and asks before anything is sent (decline → degrades to deterministic).
Cost — every scan prints LLM: N calls · ~$X; steadystate cost rolls it up; surface steadystate_llm_cost_usd_total to Prometheus.

Honest about what it is

The pieces aren't all novel — CI drift detection exists (Spacelift, env0, Terraform Cloud, driftctl) and IaC PR-bots exist (Atlantis). What's different is the combination: a committed, matched runbook (your problem→fix knowledge, not just a diff); a function-first verdict with the bound (is it working, and is this safe to auto-fix?); one substrate across both postures sharing that runbook; and the PR-bot as a deliberately zero-access actuator. A deployment model and a coherence, more than a single unique feature — and the lowest-friction front door to all of it.

Pointers

CONFIG.md — every variable + the committed config.toml.
LLM_SAFETY.md — the control model, and where the guarantee ends.
docs/repo-native-posture.md — the GitOps posture, end to end.
ARCHITECTURE.md — the state model, the seams, build-vs-rent.
examples/ — worked scenarios: repo-native CI, custom checks, the runbook, a contained agent, brokered creds, fleet health, an MCP-driven wall.
SECURITY.md — scope + how to report a vulnerability.

Built with

Python, stdlib-only at the core (HTTP/LLM via urllib; typer + rich for the CLI). Apache-2.0 — see LICENSE.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Jun 7, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

steadystate-0.1.0.tar.gz (625.4 kB view details)

Uploaded Jun 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

steadystate-0.1.0-py3-none-any.whl (371.2 kB view details)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file steadystate-0.1.0.tar.gz.

File metadata

Download URL: steadystate-0.1.0.tar.gz
Upload date: Jun 7, 2026
Size: 625.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for steadystate-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`43aeb0289200fd0ac21f2f5c7e5fac9d28aa26c1abb30366e3f89365926346dc`
MD5	`10837d2405e67819b08ec24375ffa1a0`
BLAKE2b-256	`5a95ac7bfea8dcb06823e490db60c2f85965237d6930a261ab509b404fea6a87`

See more details on using hashes here.

Provenance

The following attestation bundles were made for steadystate-0.1.0.tar.gz:

Publisher: release.yml on jedi12many/steadystate.ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: steadystate-0.1.0.tar.gz
- Subject digest: 43aeb0289200fd0ac21f2f5c7e5fac9d28aa26c1abb30366e3f89365926346dc
- Sigstore transparency entry: 1752830076
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: jedi12many/steadystate.ai@897704f4b0b4a73b3227d895d3d65dc83af95614
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/jedi12many
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@897704f4b0b4a73b3227d895d3d65dc83af95614
- Trigger Event: push

File details

Details for the file steadystate-0.1.0-py3-none-any.whl.

File metadata

Download URL: steadystate-0.1.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 371.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for steadystate-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b2ce9baeaf4b0793ec916c25a1a4619eb13d2ea68cb9503358a72f93ad753749`
MD5	`737d024b37445e4bee4ccd1f90e049aa`
BLAKE2b-256	`6f00637b006a3f9708e7c7c6802ac8043638b105e2bacff38a7820558fa83d9b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for steadystate-0.1.0-py3-none-any.whl:

Publisher: release.yml on jedi12many/steadystate.ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: steadystate-0.1.0-py3-none-any.whl
- Subject digest: b2ce9baeaf4b0793ec916c25a1a4619eb13d2ea68cb9503358a72f93ad753749
- Sigstore transparency entry: 1752830222
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: jedi12many/steadystate.ai@897704f4b0b4a73b3227d895d3d65dc83af95614
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/jedi12many
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@897704f4b0b4a73b3227d895d3d65dc83af95614
- Trigger Event: push

steadystate 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

steadystate.ai

Install

Two postures, one core

Is it working? — the verdict, function-first

Your runbook — author a fix once, then it's everywhere

Act — within a bound you commit

Drive it — terminal, chat, agents, CI

Detect — the grounded truth it's built on

Config as code

The optional LLM — advises, never decides

Honest about what it is

Pointers

Built with

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance