Self-hosted SRE investigation copilot with YAML tools, SSH execution, SSE streaming, and secret redaction.
ops-copilot
Self-hosted SRE investigation copilot for production systems.
ops-copilot lets an LLM call tools defined in YAML, execute safe remote commands over SSH, redact secrets from outputs, and stream investigation events through LangGraph or an optional FastAPI SSE server.
Who this is for
- SREs and platform engineers running self-hosted infrastructure.
- Open source maintainers operating docs, bots, CI runners, demos, or package services.
- Teams that want reviewed operational tools instead of free-form shell access.
- Developers building incident-investigation UIs around LangGraph or LangChain.
Maintenance workflows
This repository is maintained with CI, build checks, smoke tests, release workflows, Dependabot, issue templates, PR checklists, a security model, and PyPI releases.
Typical maintainer tasks include reviewing YAML tools, triaging operational edge cases, adding tests for sanitizer and command-rendering behavior, and preparing safe releases.
Architecture
User question -> InvestigationGraph -> LLM -> YAML tools -> SSH host
<- redacted tool output <- command result
The package is intentionally generic. You can start with shell tools from YAML, then inject custom Python RemoteTool classes for richer workflows.
Install
uv add ops-copilot
Optional extras:
uv add 'ops-copilot[server]'
uv add 'ops-copilot[openai]'
uv add 'ops-copilot[ollama]'
YAML tools
tools:
  - name: disk_usage
    type: shell
    description: Show filesystem usage.
    command: df -h
  - name: journalctl_service
    type: shell
    description: Show recent logs for a systemd service.
    command: journalctl -u {service} --since '{since}' --no-pager
    parameters:
      service:
        type: string
      since:
        type: string
        required: false
        default: "30 minutes ago"
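To make the parameter semantics above concrete, here is a minimal sketch of how a command template could be rendered from tool arguments. It is not the ops-copilot implementation: `TOOL`, `SAFE_VALUE`, and `render_command` are hypothetical names, and the allowlist regex is an assumption chosen to keep the example conservative.

```python
import re

# Hypothetical in-memory form of the journalctl_service tool shown above.
TOOL = {
    "name": "journalctl_service",
    "command": "journalctl -u {service} --since '{since}' --no-pager",
    "parameters": {
        "service": {"type": "string"},
        "since": {"type": "string", "required": False, "default": "30 minutes ago"},
    },
}

# Assumed allowlist: letters, digits, spaces, and a few punctuation characters.
SAFE_VALUE = re.compile(r"[A-Za-z0-9_.@:\- ]+")


def render_command(tool: dict, args: dict) -> str:
    """Fill a command template after applying defaults and validating values."""
    params = tool.get("parameters", {})
    unknown = set(args) - set(params)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for name, spec in params.items():
        if name in args:
            value = str(args[name])
        elif "default" in spec:
            value = str(spec["default"])
        elif spec.get("required", True):
            raise ValueError(f"missing required parameter: {name}")
        else:
            value = ""
        # Reject shell metacharacters and values that look like options.
        if value and (not SAFE_VALUE.fullmatch(value) or value.startswith("-")):
            raise ValueError(f"unsafe value for {name}: {value!r}")
        values[name] = value
    return tool["command"].format(**values)


if __name__ == "__main__":
    print(render_command(TOOL, {"service": "nginx"}))
```

Validating against an allowlist, rather than quoting each value, keeps the quoting already present in the template intact. It also rejects some legitimate inputs, which is the trade-off the "keep parameterized commands narrow" advice points at.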
Minimal usage
import asyncio

from ops_copilot import InvestigationGraph, SSHClient, ToolRegistry

ssh = SSHClient(host="server.example.com", user="deploy", key_path="~/.ssh/id_ed25519")
tools = ToolRegistry(ssh, config_path="tools.yaml").load()
graph = InvestigationGraph(
    llm=your_langchain_chat_model,  # replace with your LangChain chat model instance
    tools=tools,
    system_prompt="You are an SRE copilot. Investigate safely and report evidence.",
)


async def main() -> None:
    # stream() is consumed with "async for", so it must run inside a coroutine.
    async for event in graph.stream("The API is slow. What should I check?"):
        print(event)


asyncio.run(main())
Streaming events
InvestigationGraph.stream() yields dictionaries with these event names:
| Event | Meaning |
|---|---|
| token | streamed model text |
| tool_start | tool call started with input and step id |
| tool_end | tool call finished with redacted output |
| error | graph or stream error |
| done | investigation complete |
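A consumer typically dispatches on the event name. The sketch below shows one way to fold a stream into a summary. The exact dictionary layout is not specified here, so the `"event"` and `"data"` keys are assumptions, and `fake_stream` is a hypothetical stand-in for `graph.stream(...)` so the example runs without an LLM or SSH host.

```python
import asyncio
from typing import Any, AsyncIterator


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    """Stand-in for graph.stream(); the "event"/"data" keys are assumed."""
    yield {"event": "token", "data": "Checking disk usage. "}
    yield {"event": "tool_start", "data": {"tool": "disk_usage", "step": 1}}
    yield {"event": "tool_end", "data": {"tool": "disk_usage", "output": "/dev/sda1 91%"}}
    yield {"event": "token", "data": "Root volume is at 91%."}
    yield {"event": "done", "data": None}


async def collect(events: AsyncIterator[dict[str, Any]]) -> dict[str, Any]:
    """Accumulate model text, finished tool calls, and errors until done."""
    summary: dict[str, Any] = {"text": "", "tool_calls": [], "errors": [], "done": False}
    async for item in events:
        kind, data = item.get("event"), item.get("data")
        if kind == "token":
            summary["text"] += data
        elif kind == "tool_end":
            summary["tool_calls"].append(data)
        elif kind == "error":
            summary["errors"].append(data)
        elif kind == "done":
            summary["done"] = True
            break
        # tool_start and unrecognized events are ignored in this sketch.
    return summary


if __name__ == "__main__":
    print(asyncio.run(collect(fake_stream())))
```

A UI would more likely render each event as it arrives instead of collecting them, but the branching on event names would look the same.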
Optional FastAPI server
The ops_copilot.server.create_app() helper exposes:
- POST /investigate
- POST /investigate/stream
If OPS_COPILOT_API_KEY is set, clients must send an X-API-Key header with each request.
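For readers unfamiliar with Server-Sent Events, the following sketch shows the standard `event:` / `data:` text framing and a small parser for it. How the ops-copilot server actually frames its payloads is not documented here, so the JSON payload shape and the helper names `format_sse` and `parse_sse` are assumptions for illustration only.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Encode one event as an SSE frame; a blank line terminates the frame."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Decode a string of SSE frames into (event name, JSON payload) pairs."""
    events = []
    for frame in stream.split("\n\n"):
        name, data_lines = "message", []  # "message" is the SSE default name
        for line in frame.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].lstrip(" "))
        if data_lines:
            events.append((name, json.loads("\n".join(data_lines))))
    return events


if __name__ == "__main__":
    wire = format_sse("tool_start", {"tool": "disk_usage", "step": 1}) + format_sse("done", {})
    print(parse_sse(wire))
```

A real client would read the response body incrementally and feed complete frames to the parser as they arrive, instead of parsing one finished string.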
Security notes
This project executes commands on servers you control. Treat tools.yaml as privileged code.
Recommendations:
- Use SSH key auth with least-privilege users.
- Review every command template before exposing it to an LLM.
- Avoid destructive commands in YAML.
- Keep parameterized commands narrow.
- Do not store secrets in YAML or prompts.
- Rely on built-in redaction as a safety net, not as your only control.
Built-in redaction covers env-style secret lines, Bearer tokens, OpenAI-style keys, JWTs, long hex runs, and inline image data URLs.
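As a rough illustration of what pattern-based redaction of this kind can look like, here is a sketch covering most of the categories listed above (inline image data URLs are left out for brevity). The regular expressions and the `redact` helper are assumptions written for this example, not the patterns ops-copilot ships, and they will miss secrets that do not match these shapes.

```python
import re

PLACEHOLDER = "[REDACTED]"

# Illustrative patterns only; each pair is (compiled regex, replacement).
PATTERNS = [
    # env-style lines such as DB_PASSWORD=... or API_KEY = ...
    (re.compile(r"(?im)^([ \t]*[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|KEY)[A-Z0-9_]*[ \t]*=[ \t]*).+$"),
     r"\g<1>" + PLACEHOLDER),
    # Authorization-style bearer tokens
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer " + PLACEHOLDER),
    # JWTs: three base64url segments, the first starting with eyJ
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), PLACEHOLDER),
    # OpenAI-style keys
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), PLACEHOLDER),
    # long hex runs (32 or more hex digits)
    (re.compile(r"\b[0-9a-fA-F]{32,}\b"), PLACEHOLDER),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the scrubbed text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("DB_PASSWORD=hunter2\nAuthorization: Bearer abc.def-123"))
```

Regex redaction is best-effort by nature, which is why the recommendation above treats it as a safety net and not the primary control.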
Shell tools also apply a conservative command policy. Obvious destructive commands such as rm, dd, mkfs, shutdown, docker rm, docker prune, and systemctl restart are blocked unless the YAML tool explicitly opts in with policy.allow_destructive: true. Use dry_run: true to review rendered commands without executing them.
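The opt-in behavior described above can be sketched as a token-level deny list. This is a simplified illustration, not the policy ops-copilot implements: `BLOCKED_WORDS`, `BLOCKED_PAIRS`, `is_destructive`, and `check_policy` are hypothetical names, and the matching is deliberately conservative, so it also flags harmless commands that merely mention a blocked word.

```python
import shlex

# Commands named in the policy description above.
BLOCKED_WORDS = {"rm", "dd", "shutdown"}
BLOCKED_PAIRS = {("docker", "rm"), ("docker", "prune"), ("systemctl", "restart")}


def is_destructive(command: str) -> bool:
    """Return True if any token looks like one of the blocked commands."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse instead of guessing
    for i, token in enumerate(tokens):
        name = token.rsplit("/", 1)[-1]  # treat /bin/rm like rm
        if name in BLOCKED_WORDS or name.startswith("mkfs"):
            return True
        if i + 1 < len(tokens) and (name, tokens[i + 1]) in BLOCKED_PAIRS:
            return True
    return False


def check_policy(command: str, allow_destructive: bool = False) -> str:
    """Raise unless the command is safe or the tool explicitly opted in."""
    if is_destructive(command) and not allow_destructive:
        raise PermissionError(f"blocked by command policy: {command!r}")
    return command


if __name__ == "__main__":
    print(check_policy("df -h"))
    print(is_destructive("systemctl restart nginx"))
```

Scanning every token, not just the first, also catches a blocked command chained after `;` or `&&`, at the cost of false positives such as `grep rm notes.txt`.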
Audit logs
Use JsonlAuditLog to append redacted investigation events for incident review:
from ops_copilot import InvestigationGraph, JsonlAuditLog

graph = InvestigationGraph(
    llm=your_langchain_chat_model,
    tools=tools,
    system_prompt="Investigate safely and cite evidence.",
    audit_log=JsonlAuditLog("audit/investigation.jsonl"),
)
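Because the log is JSON Lines, it can be reviewed with a few lines of standard-library code. The sketch below counts records per event type. It assumes each line is a JSON object with an `"event"` field, which is a guess about the record layout and not something confirmed here; `count_events` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def count_events(path: str) -> Counter:
    """Count audit records per event name in a JSON Lines file."""
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # The "event" key is an assumed field name.
            counts[record.get("event", "unknown")] += 1
    return counts


if __name__ == "__main__":
    print(count_events("audit/investigation.jsonl"))
```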
Documentation and examples
- docs/security-model.md documents threat boundaries and deployment controls.
- docs/why-ops-copilot.md explains the project scope and ecosystem need.
- docs/demo.md shows a local demo that runs without real SSH credentials.
- docs/writing-tools.md explains YAML and custom Python tools.
- docs/server.md covers the optional FastAPI/SSE integration.
- docs/maintenance-workflows.md describes maintainer workflows and review checklists.
- docs/toolpacks.md documents reviewed example toolpacks.
- docs/incident-fixtures.md documents fake incidents for demos and regression tests.
- examples/local_demo.py runs without a real SSH host using fake outputs.
- examples/custom_tool.py shows how to inject a custom RemoteTool class.
Roadmap
- Persistent investigation sessions.
- More incident fixture coverage for regression tests.
Development
uv sync --dev
uv run ruff check .
uv run pytest
uv run python scripts/smoke.py
uv build
License
MIT