AI-Powered Infrastructure Copilot: The Self-Healing SRE.
ResponseIQ
Your 3am alert fixed itself and opened a PR before you woke up.
ResponseIQ is a self-healing infrastructure copilot. It reads your crash logs, loads the actual source code from your repositories using Tree-sitter AST parsing, and generates surgical fixes — complete with a rollback script, a Trust Gate audit trail, and a GitHub PR — all without requiring human intervention at 3am.
Zero config. Try it now:
pip install responseiq && responseiq demo
How it works
# Pipe Kubernetes logs straight in — no plugin, no config
kubectl logs payment-svc --since=1h | responseiq --mode fix --target - --explain
# Or Docker
docker logs --tail 200 my-api | responseiq --mode fix --target - --explain
ResponseIQ will:
- Filter out noise lines automatically
- Resolve container paths like /app/services/auth.py to the real source files in your repos
- Load the actual crashing functions via Tree-sitter AST into the AI context
- Generate a fix with a full explanation of why it chose that approach
- Run it through 7 safety guardrails before doing anything
- Open a GitHub PR (or print the patch in dry-run mode if no token is set)
- Write a REASONING.md audit log you can paste straight into your post-mortem
What makes it different
Most log analysis tools pattern-match on error strings. ResponseIQ does something different: it reads your code.
When your auth service crashes at /app/services/auth_service.py:120, ResponseIQ resolves that container path to the actual file, loads the function with its surrounding context via Tree-sitter, and gives the AI exactly what it needs to understand the bug — not just the error message. That is why it generates a fix, not just a description.
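The two steps described above (mapping a container path to a local file, then loading the function around the crashing line) can be sketched as follows. This is a minimal illustration and not ResponseIQ's code: `resolve_container_path` and `load_enclosing_function` are hypothetical names, the `/app` container root is an assumption, and Python's standard `ast` module stands in for Tree-sitter, so it only handles Python sources.

```python
import ast
from pathlib import Path, PurePosixPath
from typing import Optional, Sequence


def resolve_container_path(
    container_path: str, container_root: str, repo_roots: Sequence[Path]
) -> Optional[Path]:
    """Map a path seen inside a container to a file in a local checkout."""
    try:
        relative = PurePosixPath(container_path).relative_to(container_root)
    except ValueError:
        return None  # the path is not under the assumed container root
    for root in repo_roots:
        candidate = root.joinpath(*relative.parts)
        if candidate.is_file():
            return candidate
    return None


def load_enclosing_function(source_file: Path, line: int) -> Optional[str]:
    """Return the source of the innermost function that contains `line`."""
    source = source_file.read_text(encoding="utf-8")
    best = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= line <= node.end_lineno:
                # Prefer the innermost (latest-starting) match.
                if best is None or node.lineno > best.lineno:
                    best = node
    return ast.get_source_segment(source, best) if best else None
```

Given a stack frame such as /app/services/auth_service.py:120, the first function would locate the file in a checkout and the second would return the text of the function that spans line 120, which is the kind of context an LLM prompt needs.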
Benchmark
Evaluated against SWE-bench Verified — the same dataset used to rank SWE-agent, Devin, and OpenHands.
| Model | Samples | Pass@1 | Latency p50 | API key needed |
|---|---|---|---|---|
| llama3.2 (local Ollama) | 20 | 20% | 29s | No |
In this run, about 1 in 5 of the 20 sampled tasks received a correct, Trust-Gate-approved patch, with a median latency of roughly 30 seconds on a free local model. See reports/swe_bench_eval.md for the per-repo breakdown.
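For readers unfamiliar with the columns in the table above: with a single attempt per task, Pass@1 is simply the fraction of tasks whose patch passed, and p50 is the median latency. A small sketch of that arithmetic, where `summarize` and the input shape are assumptions made for illustration rather than the project's evaluation script:

```python
from statistics import median


def summarize(results):
    """Summarize (passed, latency_seconds) pairs, one attempt per task."""
    passed = sum(1 for ok, _ in results if ok)
    return {
        "samples": len(results),
        "pass_at_1": passed / len(results),
        "latency_p50": median(latency for _, latency in results),
    }
```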
Architecture
flowchart TD
A([Log Input]) --> B[Noise Filter]
B --> C[Concurrent Scan — asyncio.gather]
C --> D{AI Classifier}
D -->|HIGH / CRITICAL| E[Tree-sitter AST Source Loader]
D -->|LOW| Z([Skip])
E --> F[LLM Reasoning — Ollama / OpenAI]
F --> G{Trust Gate — 7 guardrails}
G -->|Approved| H[ProofBundle — SHA-256 sealed]
G -->|Blocked| I([Human Review])
H --> J[GitHub PR — githubkit]
J --> K[PR Bot — /responseiq approve]
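The first three stages of the flowchart (noise filter, concurrent scan, severity gate) could look roughly like the sketch below. Everything here is illustrative: the noise markers, the `Finding` type, and the keyword rule inside `classify` are stand-ins chosen for the example, and the real classifier is a model call, not a string match.

```python
import asyncio
from dataclasses import dataclass

NOISE_MARKERS = ("DEBUG", "healthcheck")  # hypothetical noise patterns


@dataclass
class Finding:
    message: str
    severity: str


def filter_noise(lines):
    """Drop blank lines and lines containing a known noise marker."""
    return [
        line for line in lines
        if line.strip() and not any(marker in line for marker in NOISE_MARKERS)
    ]


async def classify(message: str) -> Finding:
    """Stand-in for the AI classifier: a keyword rule instead of a model call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    severity = "HIGH" if ("Error" in message or "panic" in message) else "LOW"
    return Finding(message, severity)


async def scan(lines):
    """Classify all surviving lines concurrently and keep the actionable ones."""
    findings = await asyncio.gather(*(classify(line) for line in filter_noise(lines)))
    return [f for f in findings if f.severity in ("HIGH", "CRITICAL")]
```

`asyncio.gather` returns results in the order the coroutines were passed in, so findings stay aligned with the original log order even though classification runs concurrently.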
Features
Core pipeline
| What | How |
|---|---|
| Reads your actual source code | Tree-sitter AST loads the exact crashing functions, not just file paths |
| Multi-repo path resolution | Maps container paths like /app/services/auth.py to your real source — local or remote |
| Trust Gate — 7 guardrails | No bare except, no secrets, syntax valid, blast radius assessed, and more — every patch must pass all 7 |
| SHA-256 proof chain | Every decision is sealed into a ProofBundle — SOC2-ready audit trail |
| GitHub PR bot | Opens a draft PR and responds to /responseiq approve, /responseiq reject, /responseiq explain |
| Rollback script | Generates an executable rollback_<id>.py alongside every patch |
| Works without any API key | Full rule-engine fallback is always available |
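To make the Trust Gate and proof-chain rows more concrete, here is a minimal sketch of three of the guardrails named in the table (valid syntax, no bare except, no obvious secrets) plus a SHA-256 chain over the decisions. The function names, the secret regex, and the chaining scheme are assumptions for illustration; the actual seven guardrails and the ProofBundle format are not documented here.

```python
import ast
import hashlib
import json
import re

# Hypothetical pattern: flags string literals assigned to secret-looking names.
SECRET_PATTERN = re.compile(
    r"(?i)(api[_-]?key|password|secret|token)\s*=\s*['\"][^'\"]+['\"]"
)


def check_syntax(code: str) -> bool:
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def check_no_bare_except(code: str) -> bool:
    tree = ast.parse(code)
    return not any(
        isinstance(node, ast.ExceptHandler) and node.type is None
        for node in ast.walk(tree)
    )


def check_no_secrets(code: str) -> bool:
    return SECRET_PATTERN.search(code) is None


def trust_gate(code: str) -> dict:
    """Run every check; the patch is approved only if all of them pass."""
    if not check_syntax(code):
        return {"approved": False, "failed": ["syntax"]}
    checks = {"no_bare_except": check_no_bare_except, "no_secrets": check_no_secrets}
    failed = [name for name, check in checks.items() if not check(code)]
    return {"approved": not failed, "failed": failed}


def seal(decision: dict, previous_hash: str) -> str:
    """Chain this decision to the previous one with SHA-256."""
    payload = json.dumps(decision, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because each hash covers the previous hash, altering any earlier decision changes every later digest, which is what makes a chain like this useful as an audit trail.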
New in v2.24 — Modern AI stack
| Feature | What it does |
|---|---|
| --mode watch daemon | Continuous async tail of any log file; bursts are debounced and sent for AI analysis automatically |
| SSE streaming endpoint | POST /api/v1/incidents/analyze/stream emits live events (started → scrubbing → analyzing → critic → trust_gate → complete) so UIs update in real time |
| Critic / reviewer agent | A second fast-model pass reviews every proposed fix for logic errors and hidden regressions before the Trust Gate runs |
| LiteLLM multi-provider | Set RESPONSEIQ_USE_LITELLM=true to route calls through LiteLLM — supports 100+ providers with a single env-var switch |
| OTel GenAI conventions | All LLM calls emit gen_ai.* OpenTelemetry spans — plug any OTLP-compatible backend for cost and latency tracking |
| DSPy prompt optimiser | Scaffold for automatic prompt optimisation via DSPy; activate with responseiq[dspy] + RESPONSEIQ_DSPY_ENABLED=true |
| MCP server | responseiq-mcp exposes four tools (analyze_incident, get_remediation, run_trust_gate, open_pr) via the Model Context Protocol for agent integrations |
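The debouncing mentioned for watch mode can be understood as grouping log lines into bursts separated by a quiet period, so one crash produces one analysis request instead of one per line. The sketch below shows that idea on already-timestamped lines; `group_bursts` and the 2-second default gap are hypothetical and not taken from the ResponseIQ source.

```python
from typing import Iterable, List, Tuple


def group_bursts(
    events: Iterable[Tuple[float, str]], quiet_gap: float = 2.0
) -> List[List[str]]:
    """Group (timestamp, line) pairs into bursts separated by a quiet gap."""
    bursts: List[List[str]] = []
    last_seen = None
    for timestamp, line in events:
        if last_seen is None or timestamp - last_seen > quiet_gap:
            bursts.append([])  # quiet period elapsed: start a new burst
        bursts[-1].append(line)
        last_seen = timestamp
    return bursts
```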
Language-specific parsers
Dedicated parsers extract rich structured context — goroutine IDs, stack frames, exception chains, and framework signals — before the AI even sees the log.
| Language / Framework | What it extracts |
|---|---|
| Go | panic message, goroutine states, stack frames with file + line, signal crashes (SIGSEGV, SIGABRT) |
| Node.js | TypeError/ReferenceError, V8 stack frames with column numbers, unhandled promise rejections |
| Spring Boot | Exception chain, root cause, Log4j/Logback lines with thread and logger |
| Django | Exception type, traceback frames, request path |
| FastAPI | Uvicorn/Starlette exception context |
| Python (built-in) | All Python tracebacks via KeywordParser |
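As an illustration of what such a parser extracts, the sketch below pulls the panic message, goroutine states, and file/line frames out of a Go panic dump with regular expressions. It is a simplified stand-in written for this example (`parse_go_panic` is a hypothetical name), not the project's go_parser.py, and it ignores signal crashes.

```python
import re

PANIC_RE = re.compile(r"^panic: (?P<message>.+)$", re.MULTILINE)
GOROUTINE_RE = re.compile(
    r"^goroutine (?P<id>\d+) \[(?P<state>[^\]]+)\]:$", re.MULTILINE
)
# Frame locations are indented lines such as "\t/app/main.go:42 +0x1d".
FRAME_RE = re.compile(r"^[ \t]+(?P<file>\S+\.go):(?P<line>\d+)", re.MULTILINE)


def parse_go_panic(log_text: str) -> dict:
    """Extract structured fields from the text of a Go panic."""
    panic = PANIC_RE.search(log_text)
    return {
        "panic": panic.group("message") if panic else None,
        "goroutines": [
            (int(m["id"]), m["state"]) for m in GOROUTINE_RE.finditer(log_text)
        ],
        "frames": [(m["file"], int(m["line"])) for m in FRAME_RE.finditer(log_text)],
    }
```

Structured output like this lets later stages look up each frame's file and line directly instead of asking the model to re-parse raw text.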
Infrastructure features
| Feature | What it does |
|---|---|
| Post-apply watchdog | After applying a patch, monitors your error-rate metric (Datadog, Prometheus, or /health probe). Automatically executes the rollback script if the rate spikes above the configured threshold. |
| K8s YAML patcher | Edits Kubernetes Deployment manifests using ruamel.yaml — comments, quotes, and indentation survive the diff. |
| Stateful conversations | Each incident gets a persistent multi-turn AI session. Redis-backed in production; transparent in-memory fallback in dev/test. |
| PII scrubbing | Regex email redaction is always on. Set RESPONSEIQ_NER_SCRUB=true and install spaCy to add NER-level PERSON/ORG/location scrubbing. |
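The watchdog row describes a simple control loop: sample the error rate after a patch and roll back on the first breach of the threshold. A minimal sketch of that decision logic, assuming the samples arrive as an iterable of rates; `watch_after_patch` and its signature are hypothetical, and fetching metrics from Datadog, Prometheus, or a /health probe is left out.

```python
from typing import Callable, Iterable


def watch_after_patch(
    error_rates: Iterable[float],
    threshold: float,
    rollback: Callable[[], None],
) -> bool:
    """Return True if the patch held, False if a rollback was triggered."""
    for rate in error_rates:
        if rate > threshold:
            rollback()  # stop watching after the first breach
            return False
    return True
```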
Quick start
CLI — local dev or CI
# 1. Install
pip install responseiq
# 2. Configure — 3 questions, writes .env, done
responseiq init
# 3. Scan
responseiq --mode scan --target ./logs/error.log
# 4. Fix with full explanation
responseiq --mode fix --target ./logs/error.log --explain
# 5. Shadow mode — read-only triage, nothing is changed
responseiq --mode shadow --target ./logs/ --shadow-report
# 6. Watch mode — continuous tail daemon (new in v2.24)
responseiq --mode watch --target ./logs/app.log
No LLM key? The rule-engine fallback activates automatically. responseiq init is optional.
LLM options:
# Local Ollama (free, no data leaves your machine — recommended)
LLM_BASE_URL=http://localhost:11434/v1 LLM_ANALYSIS_MODEL=llama3.2 responseiq --mode scan --target ./logs/
# OpenAI
OPENAI_API_KEY=sk-... responseiq --mode scan --target ./logs/
Pipe from anywhere:
kubectl logs -l app=api --since=1h | responseiq --mode fix --target - --explain
docker logs --tail 200 my-container | responseiq --mode scan --target -
cat ./logs/app.log | responseiq --mode scan --target -
# Structured JSON / NDJSON
echo '{"level":"ERROR","message":"KeyError: email","service":"api"}' \
| responseiq --mode scan --target -
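Accepting both plain text and NDJSON on the same stream usually means trying to decode each line as JSON and falling back to raw text. The following sketch shows one way to do that; `normalize_lines` and the fallback record shape are assumptions for illustration, not the tool's internal format.

```python
import json
from typing import Iterable, Iterator


def normalize_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line, accepting JSON objects or plain text."""
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            record = None
        if isinstance(record, dict):
            yield record
        else:
            # Plain text (or a JSON scalar/array): wrap it in a minimal record.
            yield {"level": None, "message": text}
```

In a script this would typically be driven by `normalize_lines(sys.stdin)`.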
Self-hosted API server
# Start server + Postgres
docker-compose up -d
curl http://localhost:8000/health
# Local dev with hot-reload
uv sync
uv run uvicorn responseiq.app:app --reload
Point your existing alert tools at POST /api/v1/incidents/ingest:
| Platform | Integration |
|---|---|
| Datadog | Webhook → POST /api/v1/incidents/ingest |
| PagerDuty | Event Orchestration webhook |
| Sentry | Internal Integrations → Webhook URL |
| Alertmanager | Webhook receiver in alertmanager.yml |
| GitHub Actions | curl step in your workflow |
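For tools without a built-in webhook, a few lines of Python can forward an alert to the ingest endpoint. The sketch below only builds the request. The payload schema for /api/v1/incidents/ingest is not documented in this README, so whatever dictionary you pass is your own assumption and should be checked against the server's API docs.

```python
import json
import urllib.request


def build_ingest_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the ingest endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/incidents/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of calling `urllib.request.urlopen(request, timeout=10)` against a running server.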
SSE streaming — get live progress events while an incident is being analysed:
curl -N -X POST http://localhost:8000/api/v1/incidents/analyze/stream \
-H 'Content-Type: application/json' \
-d '{"log_text": "ERROR: NullPointerException at UserService.java:42"}'
# event: started
# event: scrubbing
# event: analyzing
# event: critic
# event: trust_gate
# event: complete
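A client that is not curl needs to split the stream into events itself. The sketch below parses the `event:` and `data:` fields of a text/event-stream body from an iterable of lines. `parse_sse` is a hypothetical helper, and unlike a browser EventSource it also yields named events that carry no data line, since the payload of each ResponseIQ event is not shown above.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the pending event
            if data or event != "message":
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Any HTTP client that can iterate over response lines can feed this generator, which makes it easy to update a UI as each stage is reported.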
MCP server — expose ResponseIQ as an AI agent tool:
pip install 'responseiq[mcp]'
responseiq-mcp # starts stdio MCP server
# Tools: analyze_incident · get_remediation · run_trust_gate · open_pr
Try the built-in demo
git clone https://github.com/infoyouth/responseiq.git && cd responseiq
pip install responseiq
responseiq --mode scan --target ./samples/crash.log
responseiq --mode fix --target ./samples/crash.log --explain
responseiq --mode shadow --target ./samples/ --shadow-report
samples/ contains three real injected bugs with a pre-recorded crash log. No API key, no database, nothing to set up.
Expected scan output:
------------------------------------------------------------
ResponseIQ Scan Report
Target : samples/crash.log
Status : SUCCESS
------------------------------------------------------------
Scanned : 3 message(s)
Incidents: 3 found
------------------------------------------------------------
1. [HIGH] KeyError: 'email' in process_user_request
2. [CRITICAL] Memory leak — _request_log unbounded growth
3. [HIGH] ZeroDivisionError: division by zero (reset race)
------------------------------------------------------------
Tip: run with --mode fix to apply safe remediations.
------------------------------------------------------------
SWE-bench
# Smoke run — 5 samples, no LLM key
uv run python scripts/swe_bench_eval.py --samples 5 --dry-run
# Full run (500 samples)
uv run python scripts/swe_bench_eval.py --samples 500
# Filter by repo
uv run python scripts/swe_bench_eval.py --repo sympy/sympy --samples 50
Outputs reports/swe_bench_eval.md and reports/predictions.jsonl (compatible with the official swebench Docker harness).
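Before handing predictions to the harness, it can be useful to sanity-check the file. The sketch below assumes reports/predictions.jsonl follows the usual SWE-bench predictions layout, one JSON object per line with a `model_patch` field; that field name is an assumption about this project's output and should be confirmed against a generated file.

```python
import json
from pathlib import Path


def load_predictions(path: Path) -> list:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def count_non_empty_patches(predictions: list) -> int:
    """Count predictions that actually contain a patch to evaluate."""
    return sum(1 for prediction in predictions if prediction.get("model_patch"))
```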
Development
uv sync
make lint # ruff format --check + ruff check + mypy
make test # pytest -n auto --dist=loadscope
make all # lint + type-check + test + build + security audit
Project layout
src/responseiq/
    cli.py                        # CLI entry point (--mode scan|fix|shadow|watch)
    app.py                        # FastAPI server (webhooks + SSE streaming)
    mcp_server.py                 # MCP server: 4 agent tools (responseiq-mcp)
    ai/
        llm_service.py            # LLM calls with OTel GenAI spans + LiteLLM support
        dspy_optimizer.py         # DSPy prompt optimisation scaffold (opt-in)
    services/
        remediation_service.py    # Core LLM reasoning brain
        critic_service.py         # Second-pass critic/reviewer agent
        github_pr_service.py      # GitHub PR bot (githubkit)
        watchdog_service.py       # Post-apply error-rate monitor + auto-rollback
        conversation_service.py   # Stateful AI conversations (Redis + in-memory fallback)
    routers/
        streaming.py              # SSE streaming endpoint
    plugins/
        scan.py / fix.py / shadow.py / watch.py
        go_parser.py / nodejs_parser.py / spring_parser.py / django_parser.py / fastapi_parser.py
    utils/
        context_extractor.py      # Tree-sitter AST source loader
        multi_repo_resolver.py    # Maps container paths to local source files
        k8s_patcher.py            # Kubernetes YAML patcher (comment-preserving)
        log_scrubber.py           # PII redaction (email regex + optional spaCy NER)
Disclaimer
ResponseIQ uses generative AI to suggest infrastructure and code fixes. AI can hallucinate — syntactically correct but functionally wrong patches are possible. Review every PR or patch before merging. The Trust Gate reduces risk but is not a substitute for human review. See the MIT License — no warranty implied.
For security issues see docs/SECURITY.md.