A Site Reliability Engineer AI agent that can monitor application and infrastructure logs, diagnose issues, and report on diagnostics.

🚀 Site Reliability Engineer (SRE) Agent 🕵️‍♀️

Welcome to the SRE Agent project. This open-source AI agent helps you monitor logs, diagnose production issues, suggest fixes, and post findings to your team so you can move faster when things go wrong.

🏃 Quick Start

Prerequisites

Python 3.13+
Docker (required for local mode)

1️⃣ Install the SRE Agent

pip install sre-agent

2️⃣ Start the CLI

sre-agent

On first run, the setup wizard will guide you through configuration.

3️⃣ Provide the required setup values

The wizard currently asks for:

ANTHROPIC_API_KEY
GITHUB_PERSONAL_ACCESS_TOKEN
GITHUB_OWNER, GITHUB_REPO, GITHUB_REF
SLACK_BOT_TOKEN, SLACK_CHANNEL_ID
AWS credentials (AWS_PROFILE or access keys) and AWS_REGION
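Before starting the wizard, it can help to see which of these values are already present in your shell. The following is a minimal sketch, not part of the agent: it assumes the values are exposed as environment variables under the names listed above, and that "access keys" means the standard AWS SDK pair `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The helper name `missing_settings` is hypothetical.

```python
import os
from collections.abc import Mapping

# Names taken from the wizard list above.
REQUIRED = (
    "ANTHROPIC_API_KEY",
    "GITHUB_PERSONAL_ACCESS_TOKEN",
    "GITHUB_OWNER",
    "GITHUB_REPO",
    "GITHUB_REF",
    "SLACK_BOT_TOKEN",
    "SLACK_CHANNEL_ID",
    "AWS_REGION",
)

# Assumption: "access keys" refers to the standard AWS SDK variable pair.
AWS_KEY_PAIR = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of required settings that are absent or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    # AWS credentials are satisfied by either a profile or a full key pair.
    has_profile = bool(env.get("AWS_PROFILE"))
    has_keys = all(env.get(name) for name in AWS_KEY_PAIR)
    if not (has_profile or has_keys):
        missing.append("AWS_PROFILE or " + " + ".join(AWS_KEY_PAIR))
    return missing


if __name__ == "__main__":
    for name in missing_settings(os.environ):
        print(f"missing: {name}")
```

Running the script prints one line per missing value and nothing when everything is set.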

By default the agent uses claude-sonnet-4-5-20250929. You can override this by setting the MODEL environment variable.
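The override behaves like an ordinary "environment variable with a fallback" lookup. The sketch below illustrates that pattern only; the function name `resolve_model` is hypothetical, and treating a blank `MODEL` value as unset is an assumption rather than documented agent behaviour.

```python
import os
from collections.abc import Mapping

# Default model name as stated above.
DEFAULT_MODEL = "claude-sonnet-4-5-20250929"


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the MODEL override if set and non-blank, else the default."""
    # Assumption: a blank or whitespace-only value counts as "not set".
    return env.get("MODEL", "").strip() or DEFAULT_MODEL
```

In a shell, the equivalent is exporting `MODEL` before launching `sre-agent`.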

4️⃣ Pick a running mode

After setup, the CLI gives you two modes:

Local: run diagnoses from your machine against a CloudWatch log group.
Remote Deployment: deploy and run the agent on AWS ECS.

Remote mode currently supports only AWS ECS for deploying the agent runtime.

🌟 What Does It Do?

Think about a microservice app where any service can fail at any time. The agent watches error logs, identifies which service is affected, checks the configured GitHub repository, diagnoses likely root causes, suggests fixes, and reports back to Slack.

In short, it handles the heavy lifting so your team can focus on fixing the issue quickly.

Your application can run on Kubernetes, ECS, VMs, or elsewhere. The key requirement is that logs are available in CloudWatch.

🗺️ Integration Roadmap

🧠 Model provider

Anthropic
vLLM
OpenAI

🪵 Logging platform

AWS CloudWatch
Google Cloud Observability
Azure Monitor

🏢 Remote code repository

GitHub
GitLab
Bitbucket

🔔 Notification channel

Slack
Microsoft Teams

🕶️ Remote deployment mode

AWS ECS

Tip: Looking for a feature or integration that is not listed yet? Open a Feature / Integration request 🚀

🏛️ Architecture

The architecture diagram shows the boundary between your application environment and the agent's responsibilities.

You are responsible for getting logs into your logging platform and setting up how the agent is triggered (for example, CloudWatch metric filters and alarms). Once triggered, the agent handles diagnosis and reporting.
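As a rough illustration of the trigger side, the sketch below builds the parameters for a CloudWatch metric filter that counts matching log lines, and for an alarm on the resulting metric. The parameter keys match the boto3 `put_metric_filter` and `put_metric_alarm` calls. Everything else is an assumption made for the example: the function names, the `-filter` and `-alarm` naming, the thresholds, and the use of an action ARN. How the alarm action reaches the agent is not specified here.

```python
from typing import Any


def metric_filter_params(
    log_group: str, metric_name: str, namespace: str, pattern: str = "ERROR"
) -> dict[str, Any]:
    """Build keyword arguments for logs.put_metric_filter."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                # Each matching log event adds 1 to the metric.
                "metricValue": "1",
            }
        ],
    }


def alarm_params(metric_name: str, namespace: str, action_arn: str) -> dict[str, Any]:
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"{metric_name}-alarm",
        "MetricName": metric_name,
        "Namespace": namespace,
        "Statistic": "Sum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 1,
        # Fire as soon as one matching event is seen within a period.
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Quiet periods produce no datapoints; do not treat that as an alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [action_arn],
    }
```

With boto3, these would be applied as `boto3.client("logs").put_metric_filter(**metric_filter_params(...))` and `boto3.client("cloudwatch").put_metric_alarm(**alarm_params(...))`.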

The monitored application is not limited to AWS ECS. It can be deployed anywhere, as long as it sends relevant logs to CloudWatch.

When running with the current stack, the flow is:

Read error logs from CloudWatch.
Inspect source code via the configured GitHub MCP integration.
Produce diagnosis and fix suggestions.
Send results to Slack.

🧪 Evaluation

We built an evaluation suite to test both tool-use behaviour and diagnosis quality.

Run the suites with:

uv run sre-agent-run-tool-call-eval
uv run sre-agent-run-diagnosis-quality-eval

🤔 Why We Built This

We wanted to learn practical best practices for running AI agents in production: cost, safety, observability, and evaluation. We are sharing the journey in the open and publishing what we learn as we go.

We also write about this work on the Fuzzy Labs blog.

Contributions welcome. Join us and help shape the future of AI-powered SRE.

🔧 For Developers

See DEVELOPMENT.md for the full local setup guide.

Install dependencies:

uv sync --dev

Run the interactive CLI locally:

uv run sre-agent

To run a direct diagnosis without the CLI:

docker compose up -d slack
uv run python -m sre_agent.run /aws/containerinsights/no-loafers-for-you/application currencyservice 10

Project details

Release history

0.2.1: Mar 12, 2026 (this version)
0.2.0: Mar 12, 2026
0.1.0: Nov 21, 2025

