AI-powered self-healing CI/CD framework that automatically detects, diagnoses, and repairs failing workflows
Self-Healing CI/CD
A multi-agent Python framework that detects GitHub Actions failures, diagnoses them with an LLM, generates patches, validates fixes in Docker, and optionally opens a pull request.
Quick start
# Clone and install
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Configure
cp .env.example .env
# Edit .env — set GITHUB_* and OPENAI_API_KEY
# Safe trial (no file writes, no Docker)
DRY_RUN=true python main.py
# Full repair (requires Docker)
python main.py
# Pre-flight check (recommended before live runs)
python main.py check
# Run unit tests
pytest tests/
Production deployment
Full-product flow for teams using GitHub Actions end-to-end.
1. One-time setup
cp .env.example .env
# Set GITHUB_* , OPENAI_API_KEY
# Verify environment
python main.py check
Add repository secrets on GitHub (Settings → Secrets → Actions):
| Secret | Required |
|---|---|
| `OPENAI_API_KEY` | Yes |
| `GITHUB_PR_TOKEN` | No. Use a PAT with repo scope if PR creation returns 403 |
Also enable: Settings → Actions → General → Workflow permissions → check Allow GitHub Actions to create and approve pull requests (required for auto-PR with GITHUB_TOKEN).
2. Local operator (human approves each patch)
REQUIRE_APPROVAL=true
AUTO_APPROVE_PATCHES=false
GIT_ENABLED=false
python main.py
You will see a unified diff and [y/N] prompt before any file is modified.
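The approval step can be pictured with the standard library alone. The following is a minimal sketch, not the framework's actual approval code: `render_diff` and `is_approved` are hypothetical names, and the assumption is that an empty reply falls back to the `N` default.

```python
import difflib


def render_diff(original: str, patched: str, path: str) -> str:
    """Return a unified diff between the current and the proposed file content."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        patched.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def is_approved(answer: str) -> bool:
    """Interpret a [y/N] reply: only y/yes approves, everything else rejects."""
    return answer.strip().lower() in {"y", "yes"}


before = "def add(a, b):\n    return a - b\n"
after = "def add(a, b):\n    return a + b\n"
print(render_diff(before, after, "app/calculator.py"))
```

In an interactive run the reply would come from something like `input("Apply patch? [y/N] ")`, and the file would only be written when `is_approved` returns `True`.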
3. Automated CI self-heal (opens PR)
Already configured in .github/workflows/self-heal.yml:
| Setting | CI value | Purpose |
|---|---|---|
| `AUTO_APPROVE_PATCHES` | `true` | No stdin in Actions |
| `GIT_ENABLED` | `true` | Branch + PR |
| `EXCLUDED_WORKFLOW_NAMES` | Self-heal workflows | Avoid repair loops |
Push to main → Test Pipeline fails → Self-Heal on Failure runs → review PR → merge.
Note: Test Pipeline runs pytest tests/ sample_projects/. If CI is green, self-heal will not auto-start (nothing to fix). Use Actions → Self-Heal on Failure → Run workflow to test manually, or push a failing sample test.
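The loop guard behind `EXCLUDED_WORKFLOW_NAMES` could look roughly like the sketch below. It assumes the variable holds a comma-separated list of workflow names and that each run is a dict carrying the `name` and `conclusion` fields of a GitHub Actions workflow run; `parse_names` and `repairable_runs` are hypothetical helpers, not functions from this repository.

```python
def parse_names(raw: str) -> set[str]:
    """Split a comma-separated setting into a set of workflow names (assumed format)."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def repairable_runs(runs: list[dict], excluded_names: set[str]) -> list[dict]:
    """Keep failed runs whose workflow is not on the exclusion list."""
    return [
        run
        for run in runs
        if run.get("conclusion") == "failure" and run.get("name") not in excluded_names
    ]


runs = [
    {"name": "Test Pipeline", "conclusion": "failure"},
    {"name": "Self-Heal on Failure", "conclusion": "failure"},
    {"name": "Test Pipeline", "conclusion": "success"},
]
# Only the failed Test Pipeline run is left to repair.
print(repairable_runs(runs, parse_names("Self-Heal on Failure")))
```

Excluding the self-heal workflow by name keeps a failed repair run from triggering another repair of itself.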
4. Offline repair (cached logs, no GitHub API)
# After a prior run downloaded logs to logs/extracted/{run_id}/
OFFLINE_MODE=true python main.py
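As an illustration of what offline mode relies on, this sketch lists cached log files for one run. The `logs/extracted/{run_id}/` layout comes from the comment above; the `*.txt` pattern and the function name `cached_log_files` are assumptions.

```python
from pathlib import Path


def cached_log_files(run_id: int, root: str = "logs/extracted") -> list[Path]:
    """Return cached log files for a run, or an empty list if nothing was downloaded."""
    run_dir = Path(root) / str(run_id)
    if not run_dir.is_dir():
        return []
    # Assumption: extracted workflow logs are plain-text .txt files.
    return sorted(p for p in run_dir.rglob("*.txt") if p.is_file())
```

An empty result means no earlier live run has cached logs for that run ID, so there is nothing to repair offline.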
5. Path policy
Only files under ALLOWED_PATH_PREFIXES can be patched. Default:
ALLOWED_PATH_PREFIXES=sample_projects/,app/,src/,lib/,tests/
Example real app code lives under app/ (app/calculator.py, app/tests/).
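A path allowlist of this kind can be checked in a few lines. The sketch below is an assumption about how such a check might work, not the framework's implementation: it normalizes the path first so that `..` segments cannot escape an allowed prefix. `parse_prefixes` and `is_path_allowed` are hypothetical names.

```python
import posixpath

DEFAULT_PREFIXES = "sample_projects/,app/,src/,lib/,tests/"


def parse_prefixes(raw: str) -> list[str]:
    """Split the comma-separated setting and force a trailing slash on each prefix."""
    return [p.strip().rstrip("/") + "/" for p in raw.split(",") if p.strip()]


def is_path_allowed(path: str, prefixes: list[str]) -> bool:
    """True if the normalized, repo-relative path sits under an allowed prefix."""
    normalized = posixpath.normpath(path.replace("\\", "/"))
    # Reject absolute paths and anything that climbs out of the repository.
    if normalized.startswith("/") or normalized.startswith(".."):
        return False
    return any(normalized.startswith(prefix) for prefix in prefixes)


prefixes = parse_prefixes(DEFAULT_PREFIXES)
print(is_path_allowed("app/calculator.py", prefixes))  # True
print(is_path_allowed("app/../main.py", prefixes))     # False
```

Forcing the trailing slash means a prefix written as `app` does not accidentally match a sibling directory such as `apple/`.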
6. Manual dry-run on GitHub (no CI failure needed)
Actions → Self-Heal on Failure → Run workflow
| Input | Recommended for test |
|---|---|
| `dry_run` | `true` (default) |
| `offline_mode` | `false` |
| `git_enabled` | `false` |
Uses OpenAI + GitHub API but does not write files or run Docker.
7. Web UI patch approval (local)
WEB_APPROVAL_ENABLED=true
REQUIRE_APPROVAL=true
AUTO_APPROVE_PATCHES=false
python main.py
# Browser opens http://127.0.0.1:8765 — Approve or Reject
# Or run UI only:
python main.py approve-ui
8. Multi-language log parsers
Auto-detects Python, Java (Maven/Gradle), and Go from CI logs. Force one:
LOG_PARSER_LANGUAGE=java # python | java | go
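One way to picture auto-detection with a forced override is a marker lookup. The marker strings below are typical of pytest, Maven/Gradle, and `go test` output, but which markers the real `parsers/` package uses is not documented here, so treat `MARKERS` and `detect_language` as illustrative assumptions.

```python
from typing import Optional

# Illustrative markers only; the real parsers/ package may look for different text.
MARKERS = {
    "python": ("Traceback (most recent call last)", "short test summary info"),
    "java": ("BUILD FAILURE", "FAILURE: Build failed with an exception"),
    "go": ("--- FAIL:", "panic:"),
}


def detect_language(log_text: str, forced: Optional[str] = None) -> Optional[str]:
    """Return the forced language if given, else the first language whose marker appears."""
    if forced:
        language = forced.strip().lower()
        if language not in MARKERS:
            raise ValueError(f"unsupported LOG_PARSER_LANGUAGE: {forced!r}")
        return language
    for language, needles in MARKERS.items():
        if any(needle in log_text for needle in needles):
            return language
    return None


print(detect_language("--- FAIL: TestAdd (0.00s)"))       # go
print(detect_language("any log text", forced="java"))     # java
```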
CLI commands
| Command | Description |
|---|---|
| `python main.py` | Run full orchestrator |
| `python main.py check` | Pre-flight health check |
| `python -m config.check` | Same as check |
Architecture
flowchart TB
subgraph entry [Entry]
MAIN[main.py]
CFG[config/validation]
end
subgraph orch [orchestrator]
WO[WorkflowOrchestrator]
RETRY[Retry loop]
MEM[(failure_memory.json)]
end
subgraph agents [Agents]
MON[MonitoringAgent<br/>GitHub Actions API]
ANA[AnalysisAgent<br/>parsers/]
REA[ReasoningAgent<br/>LLM diagnosis]
PAT[PatchAgent<br/>LLM patch]
VAL[ValidationAgent<br/>Docker pytest]
end
subgraph support [utils]
LOG[logs/ ZIP extract]
BAK[file backup]
GIT[git branch + PR]
RES[results/ metrics]
end
MAIN --> CFG --> WO
WO --> MON
MON -->|failed runs + logs| LOG
LOG --> ANA
ANA --> REA
REA --> PAT
PAT -->|apply patch| BAK
PAT --> VAL
VAL -->|pass/fail| RETRY
RETRY --> REA
RETRY --> MEM
RETRY --> RES
VAL -->|success + GIT_ENABLED| GIT
Control flow (one failure):
- Detect: list failed workflow runs; download log ZIP
- Analyze: extract errors and target file from logs
- Diagnose: LLM explains root cause (prompt template)
- Patch: LLM rewrites target file using diagnosis
- Validate: Docker build + scoped `pytest`
- Retry: enrich context and repeat up to `MAX_RETRY_ATTEMPTS`
- Publish: optional git branch, commit, pull request
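The diagnose, patch, validate, retry portion of this flow can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the `WorkflowOrchestrator` itself: the three callables stand in for the reasoning, patch, and validation agents, and "enrich context" is modeled as appending each failed attempt to a hypothetical `previous_attempts` list.

```python
from typing import Callable, Optional


def repair(
    failure: dict,
    diagnose: Callable[[dict], str],
    patch: Callable[[dict, str], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first patch that validates, or None once attempts run out."""
    context = dict(failure)
    for attempt in range(1, max_attempts + 1):
        diagnosis = diagnose(context)
        candidate = patch(context, diagnosis)
        if validate(candidate):
            return candidate
        # Enrich the context so the next diagnosis can see what already failed.
        context.setdefault("previous_attempts", []).append(
            {"attempt": attempt, "diagnosis": diagnosis, "patch": candidate}
        )
    return None
```

Passing the agents in as callables keeps the loop testable without an LLM or Docker: stubs can stand in for all three.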
| Package | Role |
|---|---|
| `orchestrator/` | Agent coordination, retries, batch results |
| `agents/` | Monitoring, analysis, reasoning, patch, validation |
| `config/` | Settings, prompt templates (config/prompts/), startup checks |
| `parsers/` | Pluggable log parsers (Python, Java, Go) |
| `utils/` | Logging, backups, git, secrets masking, LLM retries |
| `tests/` | Framework unit tests (pytest tests/, 45 tests) |
| `results/` | Runtime JSON metrics and repair history (gitignored) |
| `logs/` | Downloaded workflow ZIPs and extracted logs (gitignored) |
See UPDATES.md for the full changelog.
Project layout
self-healing-cicd/
├── main.py # CLI entry (run, check, approve-ui)
├── agents/ # Five agents (monitoring → validation)
├── orchestrator/ # WorkflowOrchestrator + retry loop
├── config/
│ ├── settings.py
│ ├── validation.py
│ └── prompts/ # diagnosis.txt, patch.txt (not root prompts/)
├── parsers/ # python_parser, java_parser, go_parser
├── utils/ # git, approval, offline logs, Docker, etc.
├── tests/ # Unit tests (45)
├── app/ # Example application under repair
├── sample_projects/ # Intentionally failing demo targets
├── .github/workflows/ # test.yml, self-heal.yml (not root workflows/)
├── logs/ # Runtime — created on first log fetch
├── results/ # Runtime — JSON + backups (results/.gitkeep only in git)
├── scripts/ # go-live.sh, trigger-ci-failure.sh
└── Dockerfile # Validation image for ValidationAgent
Runtime directories (logs/, results/) start empty except results/.gitkeep. The framework creates JSON, backups, and extracted logs during runs. Those artifacts are gitignored.
Not used: Empty root folders named prompts/, workflows/, or sandbox/ are leftovers from an early scaffold. Prompts live under config/prompts/; CI workflows live under .github/workflows/. Safe to delete locally.
Adoption (today vs planned)
| Model | Status | What adopters do |
|---|---|---|
| Reference repo (today) | Current | Clone this repo (or copy framework tree), configure .env, add secrets, run locally or via included workflows |
| pip package | Planned | pip install self-healing-cicd + self-heal run without vendoring source |
| GitHub Action | Planned | uses: org/self-healing-cicd@v1 + OPENAI_API_KEY only |
For a thesis or demo, the reference-repo model is enough. For product adoption, the target is install-or-Action, not copying agents/ and orchestrator/ into every consumer repo.
How people use this framework
The framework supports three usage modes. Pick one based on how much automation you want.
Mode 1 — Research / thesis (local, safe)
Who: Students, evaluators, or developers exploring the pipeline.
How:
- Configure `.env` with GitHub + OpenAI credentials.
- Run `DRY_RUN=true python main.py` to see diagnosis and generated patches without changing files or running Docker.
- Inspect `results/` and console logs for metrics and failure memory.
- Run `pytest tests/` to verify framework behavior without external services.
Outcome: Demonstrates multi-agent coordination and persistence; no risk to the repository.
Mode 2 — Semi-automatic repair (local operator)
Who: A developer reacting to a failed CI run on their machine.
How:
- Ensure Docker is running.
- Set `DRY_RUN=false`, `GIT_ENABLED=false` (or `true` for PR flow).
- Run `python main.py` after a GitHub Actions failure.
- Review patched files locally; run `pytest` manually if desired.
- Commit or discard changes yourself.
Outcome: Faster than manual debugging; human stays in the loop for merge decisions.
Mode 3 — CI-attached self-healing (hands-off)
Who: A team that wants the repo to react when Test Pipeline fails.
How:
- Add repository secret `OPENAI_API_KEY`.
- Keep .github/workflows/self-heal.yml enabled (triggers on failed Test Pipeline).
- Set `GIT_ENABLED=true` in the workflow (already configured there).
- On failure: Actions runs `python main.py` → validate → push branch → open PR.
- A human reviews and merges the PR.
Outcome: Closest to “production”; still requires human PR review before main changes.
Completing the project beyond a thesis demo
| Step | Action |
|---|---|
| 1 | Document one real failed run in your write-up (before/after logs, results/run_*.json) |
| 2 | Run Mode 1 locally and capture screenshots or metrics |
| 3 | Run Mode 3 once on GitHub with OPENAI_API_KEY secret and a deliberate test failure |
| 4 | State limitations honestly (see below) — reviewers expect this |
Ten demos live under sample_projects/ (assertion, import, syntax, logic, module, attribute, name, index, type, zero-division). By default they pass; break one with ./scripts/break-sample.sh N before pushing to test self-heal. See sample_projects/README.md.
Environment variables
Copy .env.example. Key settings:
| Variable | Required | Description |
|---|---|---|
| `GITHUB_TOKEN` | Live mode | Repo access + Actions logs |
| `GITHUB_OWNER` | Live mode | Repository owner |
| `GITHUB_REPO` | Live mode | Repository name |
| `OPENAI_API_KEY` | Always | LLM diagnosis and patching |
| `DRY_RUN` | No | true = no writes, no Docker |
| `GIT_ENABLED` | No | true = branch, commit, push, PR |
| `REQUIRE_APPROVAL` | No | true = prompt before apply (local) |
| `AUTO_APPROVE_PATCHES` | No | true = skip prompt (CI default) |
| `OFFLINE_MODE` | No | true = use logs/extracted/ only |
| `ALLOWED_PATH_PREFIXES` | No | Comma-separated path allowlist |
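Boolean and list settings like these are usually read with a pair of small helpers. The sketch below is an assumption about how such parsing could work, not a copy of `config/settings.py`. `env_flag`, `env_list`, and the accepted truthy spellings are hypothetical.

```python
from typing import Mapping

# Assumption: which spellings count as "true" is a design choice, not taken from the source.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read a boolean setting; unset or blank values fall back to the default."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in TRUTHY


def env_list(env: Mapping[str, str], name: str, default: str = "") -> list[str]:
    """Read a comma-separated setting as a list of non-empty, trimmed items."""
    return [item.strip() for item in env.get(name, default).split(",") if item.strip()]


settings = {"DRY_RUN": "true", "ALLOWED_PATH_PREFIXES": "app/, src/"}
print(env_flag(settings, "DRY_RUN"))                # True
print(env_list(settings, "ALLOWED_PATH_PREFIXES"))  # ['app/', 'src/']
```

In real use the mapping would be `os.environ`; taking it as a parameter keeps the helpers easy to test.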
Git integration
When GIT_ENABLED=true and a repair validates successfully:
- Creates branch `self-heal/run-{id}-{timestamp}`
- Commits repaired files
- Pushes to GitHub
- Opens a PR (if `GIT_CREATE_PR=true`)
Requires a git repository with GITHUB_TOKEN push permission.
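For illustration, a branch name following the `self-heal/run-{id}-{timestamp}` pattern could be built as below. The timestamp format (`YYYYMMDD-HHMMSS`, UTC) is an assumption, since the pattern does not specify one, and `branch_name` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def branch_name(run_id: int, now: Optional[datetime] = None) -> str:
    """Build a self-heal branch name; the timestamp format is an assumed convention."""
    moment = now or datetime.now(timezone.utc)
    return f"self-heal/run-{run_id}-{moment.strftime('%Y%m%d-%H%M%S')}"


print(branch_name(123, datetime(2024, 1, 2, 3, 4, 5)))
# self-heal/run-123-20240102-030405
```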
DCO (Developer Certificate of Origin): If your repo enforces DCO on PRs, keep `GIT_SIGN_OFF=true` (default). Self-heal commits then include a Signed-off-by: … trailer in the message. For an existing PR that failed DCO, either use "Set DCO to pass" on GitHub, or close it and let the next self-heal run open a new PR after you merge this fix.
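The sign-off itself is just a trailer line at the end of the commit message, which `git commit --signoff` adds natively. As a sketch of the same idea in Python, with `with_sign_off` as a hypothetical helper and a placeholder identity:

```python
def with_sign_off(message: str, name: str, email: str) -> str:
    """Append a Signed-off-by trailer unless the message already carries it."""
    trailer = f"Signed-off-by: {name} <{email}>"
    body = message.rstrip()
    if trailer in body.splitlines():
        return body + "\n"
    # Simplification: always start a new trailer block after a blank line.
    return f"{body}\n\n{trailer}\n"


print(with_sign_off("Fix calculator add()", "Self Heal Bot", "bot@example.com"))
```

The helper is idempotent, so running it on an already signed message leaves the message unchanged.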
CI integration
- Unit tests: .github/workflows/test.yml runs `pytest tests/`
- Self-heal on failure: .github/workflows/self-heal.yml runs the orchestrator when Test Pipeline fails
Outputs
| Path | Content |
|---|---|
| `results/failure_memory.json` | Repair history |
| `results/run_*.json` | Per-run outcomes |
| `results/metrics_summary.json` | Aggregate metrics |
| `logs/` | Downloaded workflow logs |
Limitations
This section summarizes what the framework does not guarantee. Useful for thesis evaluation and production planning.
Scope and correctness
- Python-centric validation: Log parsers cover Python, Java, and Go, but Docker validation still runs `pytest`. JVM/Go repos may need custom validation beyond this framework.
- LLM unpredictability: Patches can be wrong, incomplete, or stylistically odd even when validation passes (tests may not cover the real failure).
- Single-repo, single-provider: GitHub Actions only; no GitLab, Jenkins, or CircleCI.
- No semantic code understanding: Repairs are text-based (LLM + file replace), not AST-aware refactors.
Operations
- Docker required for live validation — Not optional in non-dry-run mode.
- API costs — Every diagnosis and patch calls OpenAI; retries multiply usage.
- No guaranteed PR merge — Opens a PR; humans must review. No auto-merge.
- Git state assumptions — Git integration expects a clean enough repo; complex multi-branch workflows may need manual conflict resolution.
Security and safety
- Broad file write: A bad patch overwrites the target file; backup/rollback mitigates but does not eliminate risk.
- Token scope: `GITHUB_TOKEN` needs Actions read and (for git mode) contents write. Leaked tokens expose the repo.
- Secrets in logs: Masking reduces risk; DEBUG logging can still expose sensitive context if enabled carelessly.
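Value-based masking, the simplest form of the mitigation mentioned above, can be sketched as follows. This is an assumption about the approach, not the masking code in `utils/`. `mask_secrets` is a hypothetical name and the token strings are made-up placeholders.

```python
def mask_secrets(text: str, secrets: list[str], placeholder: str = "***") -> str:
    """Replace every known secret value in the text with a placeholder."""
    # Longest first, so a secret that contains a shorter one is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text


line = "Authorization: token ghp_example123 key=sk-example"
print(mask_secrets(line, ["ghp_example123", "sk-example"]))
# Authorization: token *** key=***
```

This only hides values the process already knows about, which is why verbose logging of third-party output can still leak context.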
CI behavior
- Self-heal trigger: Only reacts to failures of the workflow named Test Pipeline; rename requires updating `self-heal.yml`.
- No infinite-loop protection beyond skipping PR events: Repeated failures could open multiple PRs if not configured (`STOP_ON_FIRST_SUCCESS`, run limits).
- First failures only by default: `MAX_FAILED_RUNS` and `MAX_FAILURES_PER_RUN` cap work; very noisy pipelines may need tuning.
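The effect of the two caps can be illustrated with a short sketch. It assumes failures are grouped per run ID and that "first" means insertion order; `cap_work` is a hypothetical helper, not the orchestrator's code.

```python
def cap_work(
    failures_by_run: dict[int, list[str]],
    max_failed_runs: int,
    max_failures_per_run: int,
) -> dict[int, list[str]]:
    """Keep the first N runs and, within each, the first M failures."""
    capped = {}
    for run_id in list(failures_by_run)[:max_failed_runs]:
        capped[run_id] = failures_by_run[run_id][:max_failures_per_run]
    return capped


work = {101: ["e1", "e2", "e3"], 102: ["e4"], 103: ["e5"]}
print(cap_work(work, max_failed_runs=2, max_failures_per_run=1))
# {101: ['e1'], 102: ['e4']}
```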
Implemented product safeguards
- Human approval before apply (`REQUIRE_APPROVAL` / diff prompt)
- Path allowlist (`ALLOWED_PATH_PREFIXES`)
- Self-heal workflow excluded from triggers (loop guard)
- GitHub API retry on rate limits
- Pre-flight check (`python main.py check`)
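Retrying on rate limits typically means exponential backoff around the API call. The sketch below shows the general pattern under stated assumptions: `RateLimited` is a hypothetical exception standing in for whatever the HTTP layer raises, and the delay schedule is illustrative rather than the framework's.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's HTTP layer on a rate-limit response."""


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each rate-limit error, then give up."""
    for attempt in range(attempts):
        try:
            return func()
        except RateLimited:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` lets tests record the delays instead of actually waiting.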
Remaining gaps for enterprise adoption
- Distribution: No published pip package or marketplace GitHub Action yet; adopters vendor this repo today (see Adoption)
- Validation stack: Docker + `pytest` only; Java/Go parsers help find targets but validation is still Python-centric
- Staging / E2E: No automated integration suite against live GitHub + Docker in CI
- Auto-merge: PRs are opened for human review; no optional auto-merge policy
- Multi-CI: GitHub Actions only (no GitLab, Jenkins, CircleCI)
Already implemented (not gaps)
- Pluggable log parsers: `parsers/` (Python, Java, Go), with `LOG_PARSER_LANGUAGE` to force one
- Web approval UI: `WEB_APPROVAL_ENABLED`, `python main.py approve-ui`
- Terminal approval, path allowlist, offline mode, git branch + PR, pre-flight `check`
License
See repository license file if present.