AI-powered self-healing CI/CD framework that automatically detects, diagnoses, and repairs failing workflows

Self-Healing CI/CD

A multi-agent Python framework that detects GitHub Actions failures, diagnoses them with an LLM, generates patches, validates fixes in Docker, and optionally opens a pull request.

Quick start

# Clone and install
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env — set GITHUB_* and OPENAI_API_KEY

# Safe trial (no file writes, no Docker)
DRY_RUN=true python main.py

# Full repair (requires Docker)
python main.py

# Pre-flight check (recommended before live runs)
python main.py check

# Run unit tests
pytest tests/

Production deployment

Full-product flow for teams using GitHub Actions end-to-end.

1. One-time setup

cp .env.example .env
# Set GITHUB_* , OPENAI_API_KEY

# Verify environment
python main.py check

Add repository secrets on GitHub (Settings → Secrets → Actions):

Secret	Required
`OPENAI_API_KEY`	Yes
`GITHUB_PR_TOKEN`	No — use a PAT with `repo` scope if PR creation returns 403

Also enable: Settings → Actions → General → Workflow permissions, then check "Allow GitHub Actions to create and approve pull requests" (required for auto-PR with GITHUB_TOKEN).

2. Local operator (human approves each patch)

# In .env
REQUIRE_APPROVAL=true
AUTO_APPROVE_PATCHES=false
GIT_ENABLED=false

# Then, from the shell
python main.py

You will see a unified diff and a [y/N] prompt before any file is modified.
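The approval step can be pictured with the standard library's difflib. This is a minimal sketch, not the framework's actual code: `render_diff` and `is_approved` are hypothetical names, and treating an empty answer as a rejection is an assumption drawn from the capital N in [y/N].

```python
import difflib


def render_diff(original: str, patched: str, path: str) -> str:
    """Return a unified diff between the original and patched file contents."""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        patched.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def is_approved(answer: str) -> bool:
    """Interpret a [y/N] answer; anything other than y/yes counts as a rejection."""
    return answer.strip().lower() in {"y", "yes"}
```

A caller would print `render_diff(...)`, read one line from stdin, and write the file only when `is_approved` returns True.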

3. Automated CI self-heal (opens PR)

Already configured in .github/workflows/self-heal.yml:

Setting	CI value	Purpose
`AUTO_APPROVE_PATCHES`	`true`	No stdin in Actions
`GIT_ENABLED`	`true`	Branch + PR
`EXCLUDED_WORKFLOW_NAMES`	Self-heal workflows	Avoid repair loops

Push to main → Test Pipeline fails → Self-Heal on Failure runs → review PR → merge.

Note: Test Pipeline runs pytest tests/ sample_projects/. If CI is green, self-heal will not auto-start (nothing to fix). Use Actions → Self-Heal on Failure → Run workflow to test manually, or push a failing sample test.

4. Offline repair (cached logs, no GitHub API)

# After a prior run downloaded logs to logs/extracted/{run_id}/
OFFLINE_MODE=true python main.py
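As a rough illustration of what offline mode has to do, the sketch below looks up cached log files for one run under logs/extracted/{run_id}/. The function name `find_cached_logs` is hypothetical, and the `.txt` extension is an assumption about how the extracted logs are named.

```python
from pathlib import Path


def find_cached_logs(run_id: int, root: Path = Path("logs/extracted")) -> list[Path]:
    """Return extracted log files for one run, sorted, or [] if nothing is cached."""
    run_dir = root / str(run_id)
    if not run_dir.is_dir():
        return []
    # Assumption: extracted logs are plain-text files ending in .txt.
    return sorted(p for p in run_dir.rglob("*.txt") if p.is_file())
```

An empty result is a reasonable signal to tell the operator that a live run must fetch logs first.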

5. Path policy

Only files under ALLOWED_PATH_PREFIXES can be patched. Default:

ALLOWED_PATH_PREFIXES=sample_projects/,app/,src/,lib/,tests/

Example real app code lives under app/ (app/calculator.py, app/tests/).
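The path policy amounts to a prefix check on repo-relative paths. The following is a minimal sketch under that reading; `parse_prefixes` and `is_patchable` are hypothetical helpers, and rejecting absolute paths and `..` components is an added assumption, not documented behavior.

```python
from pathlib import PurePosixPath

DEFAULT_PREFIXES = "sample_projects/,app/,src/,lib/,tests/"


def parse_prefixes(raw: str) -> tuple[str, ...]:
    """Split a comma-separated ALLOWED_PATH_PREFIXES value into clean prefixes."""
    return tuple(p.strip() for p in raw.split(",") if p.strip())


def is_patchable(path: str, prefixes: tuple[str, ...]) -> bool:
    """True if a repo-relative path sits under one of the allowed prefixes."""
    pure = PurePosixPath(path)
    parts = pure.parts
    # Assumption: absolute paths and ".." components are never allowed,
    # so a patch cannot escape the allowlist.
    if not parts or pure.is_absolute() or ".." in parts:
        return False
    normalized = "/".join(parts)
    return any(normalized.startswith(prefix) for prefix in prefixes)
```

With the default value, `app/calculator.py` is accepted while `main.py` and `app/../main.py` are refused.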

6. Manual dry-run on GitHub (no CI failure needed)

Actions → Self-Heal on Failure → Run workflow

Input	Recommended for test
`dry_run`	true (default)
`offline_mode`	false
`git_enabled`	false

This run calls the OpenAI and GitHub APIs but does not write files or run Docker.

7. Web UI patch approval (local)

# In .env
WEB_APPROVAL_ENABLED=true
REQUIRE_APPROVAL=true
AUTO_APPROVE_PATCHES=false

# Then, from the shell
python main.py
# The browser opens http://127.0.0.1:8765, where you approve or reject the patch

# Or run UI only:
python main.py approve-ui

8. Multi-language log parsers

The framework auto-detects Python, Java (Maven/Gradle), and Go from CI logs. To force one parser, set:

LOG_PARSER_LANGUAGE=java   # python | java | go
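One way to picture the detection step is a table of log signatures checked in order, with the forced value taking priority. This is only a sketch: the regular expressions and the `detect_language` function are illustrative assumptions, not the patterns the parsers/ package uses.

```python
import re
from typing import Optional

# Illustrative signatures only; real CI logs vary and these patterns are assumptions.
SIGNATURES = {
    "python": re.compile(r"Traceback \(most recent call last\)|^E\s+\w*Error", re.M),
    "java": re.compile(r"BUILD FAILURE|Exception in thread|\bat [\w.$]+\([\w$]+\.java:\d+\)"),
    "go": re.compile(r"^--- FAIL: |^panic: |\w+\.go:\d+", re.M),
}


def detect_language(log_text: str, forced: Optional[str] = None) -> Optional[str]:
    """Pick a parser language; a forced value (LOG_PARSER_LANGUAGE) wins over detection."""
    if forced:
        return forced.strip().lower()
    for language, pattern in SIGNATURES.items():
        if pattern.search(log_text):
            return language
    return None
```

Returning None when nothing matches leaves the caller free to fall back to a default parser or skip the run.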

CLI commands

Command	Description
`python main.py`	Run full orchestrator
`python main.py check`	Pre-flight health check
`python -m config.check`	Same as check
`python main.py approve-ui`	Run the web approval UI only

Architecture

flowchart TB
    subgraph entry [Entry]
        MAIN[main.py]
        CFG[config/validation]
    end

    subgraph orch [orchestrator]
        WO[WorkflowOrchestrator]
        RETRY[Retry loop]
        MEM[(failure_memory.json)]
    end

    subgraph agents [Agents]
        MON[MonitoringAgent<br/>GitHub Actions API]
        ANA[AnalysisAgent<br/>parsers/]
        REA[ReasoningAgent<br/>LLM diagnosis]
        PAT[PatchAgent<br/>LLM patch]
        VAL[ValidationAgent<br/>Docker pytest]
    end

    subgraph support [utils]
        LOG[logs/ ZIP extract]
        BAK[file backup]
        GIT[git branch + PR]
        RES[results/ metrics]
    end

    MAIN --> CFG --> WO
    WO --> MON
    MON -->|failed runs + logs| LOG
    LOG --> ANA
    ANA --> REA
    REA --> PAT
    PAT -->|apply patch| BAK
    PAT --> VAL
    VAL -->|pass/fail| RETRY
    RETRY --> REA
    RETRY --> MEM
    RETRY --> RES
    VAL -->|success + GIT_ENABLED| GIT

Control flow (one failure):

1. Detect: list failed workflow runs; download log ZIP
2. Analyze: extract errors and target file from logs
3. Diagnose: LLM explains root cause (prompt template)
4. Patch: LLM rewrites target file using diagnosis
5. Validate: Docker build + scoped pytest
6. Retry: enrich context and repeat up to MAX_RETRY_ATTEMPTS
7. Publish: optional git branch, commit, pull request
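The diagnose, patch, validate, and retry steps above can be sketched as a bounded loop. This is a simplified illustration, not the WorkflowOrchestrator implementation: `RepairContext`, `repair_with_retries`, the three callables, and the default of 3 attempts are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RepairContext:
    """State carried across attempts; the fields are hypothetical."""
    error_summary: str
    history: list[str] = field(default_factory=list)


def repair_with_retries(
    context: RepairContext,
    diagnose: Callable[[RepairContext], str],
    patch: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    max_attempts: int = 3,  # stands in for MAX_RETRY_ATTEMPTS; the value is assumed
) -> tuple[bool, int]:
    """Run diagnose -> patch -> validate until validation passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        diagnosis = diagnose(context)
        candidate = patch(diagnosis)
        passed, output = validate(candidate)
        if passed:
            return True, attempt
        # Enrich the context so the next diagnosis sees why this attempt failed.
        context.history.append(f"attempt {attempt} failed: {output}")
    return False, max_attempts
```

Passing the agents in as callables keeps the loop testable with stubs, without calling an LLM or Docker.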

Package	Role
`orchestrator/`	Agent coordination, retries, batch results
`agents/`	Monitoring, analysis, reasoning, patch, validation
`config/`	Settings, prompt templates (`config/prompts/`), startup checks
`parsers/`	Pluggable log parsers (Python, Java, Go)
`utils/`	Logging, backups, git, secrets masking, LLM retries
`tests/`	Framework unit tests (`pytest tests/` — 45 tests)
`results/`	Runtime JSON metrics and repair history (gitignored)
`logs/`	Downloaded workflow ZIPs and extracted logs (gitignored)

See UPDATES.md for the full changelog.

Project layout

self-healing-cicd/
├── main.py                 # CLI entry (run, check, approve-ui)
├── agents/                 # Five agents (monitoring → validation)
├── orchestrator/           # WorkflowOrchestrator + retry loop
├── config/
│   ├── settings.py
│   ├── validation.py
│   └── prompts/            # diagnosis.txt, patch.txt (not root prompts/)
├── parsers/                # python_parser, java_parser, go_parser
├── utils/                  # git, approval, offline logs, Docker, etc.
├── tests/                  # Unit tests (45)
├── app/                    # Example application under repair
├── sample_projects/        # Intentionally failing demo targets
├── .github/workflows/      # test.yml, self-heal.yml (not root workflows/)
├── logs/                   # Runtime — created on first log fetch
├── results/                # Runtime — JSON + backups (results/.gitkeep only in git)
├── scripts/                # go-live.sh, trigger-ci-failure.sh
└── Dockerfile              # Validation image for ValidationAgent

Runtime directories (logs/, results/) start empty except results/.gitkeep. The framework creates JSON, backups, and extracted logs during runs. Those artifacts are gitignored.

Not used: Empty root folders named prompts/, workflows/, or sandbox/ are leftovers from an early scaffold. Prompts live under config/prompts/; CI workflows live under .github/workflows/. The empty folders are safe to delete locally.

Adoption (today vs planned)

Model	Status	What adopters do
Reference repo (today)	Current	Clone this repo (or copy framework tree), configure `.env`, add secrets, run locally or via included workflows
pip package	Planned	`pip install self-healing-cicd` + `self-heal run` without vendoring source
GitHub Action	Planned	`uses: org/self-healing-cicd@v1` + `OPENAI_API_KEY` only

For a thesis or demo, the reference-repo model is enough. For product adoption, the target is install-or-Action, not copying agents/ and orchestrator/ into every consumer repo.

How people use this framework

The framework supports three usage modes. Pick one based on how much automation you want.

Mode 1 — Research / thesis (local, safe)

Who: Students, evaluators, or developers exploring the pipeline.

How:

1. Configure .env with GitHub + OpenAI credentials.
2. Run DRY_RUN=true python main.py to see diagnosis and generated patches without changing files or running Docker.
3. Inspect results/ and console logs for metrics and failure memory.
4. Run pytest tests/ to verify framework behavior without external services.

Outcome: Demonstrates multi-agent coordination and persistence; no risk to the repository.

Mode 2 — Semi-automatic repair (local operator)

Who: A developer reacting to a failed CI run on their machine.

How:

1. Ensure Docker is running.
2. Set DRY_RUN=false, GIT_ENABLED=false (or true for PR flow).
3. Run python main.py after a GitHub Actions failure.
4. Review patched files locally; run pytest manually if desired.
5. Commit or discard changes yourself.

Outcome: Faster than manual debugging; human stays in the loop for merge decisions.

Mode 3 — CI-attached self-healing (hands-off)

Who: A team that wants the repo to react when Test Pipeline fails.

How:

1. Add repository secret OPENAI_API_KEY.
2. Keep .github/workflows/self-heal.yml enabled (triggers on failed Test Pipeline).
3. Set GIT_ENABLED=true in the workflow (already configured there).
4. On failure: Actions runs python main.py → validate → push branch → open PR.
5. A human reviews and merges the PR.

Outcome: Closest to “production”; still requires human PR review before any change reaches main.

Completing the project beyond a thesis demo

Step	Action
1	Document one real failed run in your write-up (before/after logs, `results/run_*.json`)
2	Run Mode 1 locally and capture screenshots or metrics
3	Run Mode 3 once on GitHub with `OPENAI_API_KEY` secret and a deliberate test failure
4	State limitations honestly (see below) — reviewers expect this

Ten demos live under sample_projects/ (assertion, import, syntax, logic, module, attribute, name, index, type, zero-division). By default they pass; break one with ./scripts/break-sample.sh N before pushing to test self-heal. See sample_projects/README.md.

Environment variables

Copy .env.example. Key settings:

Variable	Required	Description
`GITHUB_TOKEN`	Live mode	Repo access + Actions logs
`GITHUB_OWNER`	Live mode	Repository owner
`GITHUB_REPO`	Live mode	Repository name
`OPENAI_API_KEY`	Always	LLM diagnosis and patching
`DRY_RUN`	No	`true` = no writes, no Docker
`GIT_ENABLED`	No	`true` = branch, commit, push, PR
`REQUIRE_APPROVAL`	No	`true` = prompt before apply (local)
`AUTO_APPROVE_PATCHES`	No	`true` = skip prompt (CI default)
`OFFLINE_MODE`	No	`true` = use `logs/extracted/` only
`ALLOWED_PATH_PREFIXES`	No	Comma-separated path allowlist
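The Required column above suggests a simple pre-flight rule: OPENAI_API_KEY is always needed, and the GITHUB_* variables are needed unless the run is offline. The sketch below encodes that reading. `env_flag` and `missing_variables` are hypothetical helpers, the accepted truthy spellings are an assumption, and equating "live mode" with OFFLINE_MODE being unset or false is an interpretation rather than documented behavior.

```python
from typing import Mapping


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read a boolean flag; only a few truthy spellings count as True."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def missing_variables(env: Mapping[str, str]) -> list[str]:
    """List required variables that are unset or blank for the selected mode."""
    required = ["OPENAI_API_KEY"]
    # Assumption: "live mode" means OFFLINE_MODE is not enabled.
    if not env_flag(env, "OFFLINE_MODE"):
        required += ["GITHUB_TOKEN", "GITHUB_OWNER", "GITHUB_REPO"]
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_variables(os.environ)` at startup gives a list that can be printed before any API call is made.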

Git integration

When GIT_ENABLED=true and a repair validates successfully:

1. Creates branch self-heal/run-{id}-{timestamp}
2. Commits repaired files
3. Pushes to GitHub
4. Opens a PR (if GIT_CREATE_PR=true)

Requires a git repository and a GITHUB_TOKEN with push permission.

DCO (Developer Certificate of Origin): If your repo enforces DCO on PRs, keep GIT_SIGN_OFF=true (the default) so that self-heal commits include Signed-off-by: … in the message. For an existing PR that failed DCO, either use "Set DCO to pass" on GitHub, or close the PR and let the next self-heal run open a new one after you merge this fix.
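To make the branch pattern and the sign-off trailer concrete, here is a small sketch. Both helpers are hypothetical, and the timestamp format (UTC, `%Y%m%d%H%M%S`) is an assumption, since the README only shows the placeholder {timestamp}.

```python
from datetime import datetime, timezone
from typing import Optional


def branch_name(run_id: int, now: Optional[datetime] = None) -> str:
    """Build a branch name following the self-heal/run-{id}-{timestamp} pattern."""
    moment = now or datetime.now(timezone.utc)
    # Assumption: a compact UTC timestamp; the real format is not documented here.
    return f"self-heal/run-{run_id}-{moment.strftime('%Y%m%d%H%M%S')}"


def commit_message(summary: str, author: str, email: str, sign_off: bool = True) -> str:
    """Build a commit message, appending a DCO trailer when sign_off is set."""
    if not sign_off:
        return summary
    return f"{summary}\n\nSigned-off-by: {author} <{email}>"
```

The trailer has the same shape as the one `git commit --signoff` appends, which is what DCO checks look for.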

CI integration

Unit tests: .github/workflows/test.yml runs pytest tests/
Self-heal on failure: .github/workflows/self-heal.yml runs the orchestrator when Test Pipeline fails

Outputs

Path	Content
`results/failure_memory.json`	Repair history
`results/run_*.json`	Per-run outcomes
`results/metrics_summary.json`	Aggregate metrics
`logs/`	Downloaded workflow logs

Limitations

This section summarizes what the framework does not guarantee, which is useful for both thesis evaluation and production planning.

Scope and correctness

Python-centric validation — Log parsers cover Python, Java, and Go, but Docker validation still runs pytest. JVM/Go repos may need custom validation beyond this framework.
LLM unpredictability — Patches can be wrong, incomplete, or stylistically odd even when validation passes (tests may not cover the real failure).
Single-repo, single-provider — GitHub Actions only; no GitLab, Jenkins, or CircleCI.
No semantic code understanding — Repairs are text-based (LLM + file replace), not AST-aware refactors.

Operations

Docker required for live validation — Not optional in non-dry-run mode.
API costs — Every diagnosis and patch calls OpenAI; retries multiply usage.
No guaranteed PR merge — Opens a PR; humans must review. No auto-merge.
Git state assumptions — Git integration expects a clean enough repo; complex multi-branch workflows may need manual conflict resolution.

Security and safety

Broad file write — A bad patch overwrites the target file; backup/rollback mitigates but does not eliminate risk.
Token scope — GITHUB_TOKEN needs Actions read and (for git mode) contents write. Leaked tokens expose the repo.
Secrets in logs — Masking reduces risk; DEBUG logging can still expose sensitive context if enabled carelessly.
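Secret masking of the kind mentioned above can be as simple as replacing known secret values before text is logged. The sketch below shows that idea only; `mask_secrets` is a hypothetical helper and does not reflect how utils/ implements masking.

```python
def mask_secrets(text: str, secrets: list[str], placeholder: str = "***") -> str:
    """Replace every occurrence of each known secret value with a placeholder."""
    # Mask longer values first so a secret that contains another is fully hidden.
    # Empty strings are skipped because replacing "" would corrupt the text.
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text
```

Value-based masking only covers secrets the process already knows about, which is one reason DEBUG output still deserves care.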

CI behavior

Self-heal trigger: reacts only to failures of the workflow named Test Pipeline; renaming that workflow requires updating self-heal.yml.
Loop protection is limited to skipping PR events: repeated failures can still open multiple PRs unless limits such as STOP_ON_FIRST_SUCCESS and the run caps are configured.
First failures only by default: MAX_FAILED_RUNS and MAX_FAILURES_PER_RUN cap the work; very noisy pipelines may need tuning.
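The interaction between workflow exclusion and the run cap can be sketched as a filter over workflow-run records. `select_runs` is a hypothetical function; it assumes each run is a dict with the `name` and `conclusion` fields that the GitHub Actions API returns for workflow runs, and that runs arrive newest first.

```python
def select_runs(runs: list[dict], excluded_names: set[str], max_failed_runs: int) -> list[dict]:
    """Keep failed runs from non-excluded workflows, up to max_failed_runs."""
    selected: list[dict] = []
    for run in runs:
        if len(selected) >= max_failed_runs:
            break
        if run.get("conclusion") != "failure":
            continue
        # Skipping the self-heal workflows themselves is what prevents repair loops.
        if run.get("name") in excluded_names:
            continue
        selected.append(run)
    return selected
```

Raising the cap makes a single invocation process more runs, at the cost of more LLM calls.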

Implemented product safeguards

Human approval before apply (REQUIRE_APPROVAL / diff prompt)
Path allowlist (ALLOWED_PATH_PREFIXES)
Self-heal workflow excluded from triggers (loop guard)
GitHub API retry on rate limits
Pre-flight check (python main.py check)
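Retrying on rate limits usually means catching a rate-limit error and waiting longer before each new attempt. The following is a generic exponential-backoff sketch, not the framework's retry code: `RateLimited`, `call_with_backoff`, and the default attempt count and delay are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a client raises when the API reports a rate limit."""


def call_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on RateLimited with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so tests can record delays instead of actually waiting.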

Remaining gaps for enterprise adoption

Distribution — No published pip package or marketplace GitHub Action yet; adopters vendor this repo today (see Adoption)
Validation stack — Docker + pytest only; Java/Go parsers help find targets but validation is still Python-centric
Staging / E2E — No automated integration suite against live GitHub + Docker in CI
Auto-merge — PRs are opened for human review; no optional auto-merge policy
Multi-CI — GitHub Actions only (no GitLab, Jenkins, CircleCI)

Already implemented (not gaps)

Pluggable log parsers: parsers/ (Python, Java, Go) — LOG_PARSER_LANGUAGE to force
Web approval UI: WEB_APPROVAL_ENABLED, python main.py approve-ui
Terminal approval, path allowlist, offline mode, git branch + PR, pre-flight check

License

See repository license file if present.

Project details

Maintainers

NyuydineBill

Release history

Version	Released
0.1.11	Jun 6, 2026
0.1.10	Jun 6, 2026
0.1.9	Jun 6, 2026
0.1.8 (this version)	Jun 6, 2026
0.1.7	Jun 6, 2026
0.1.6	Jun 6, 2026
0.1.5	Jun 6, 2026
0.1.4	Jun 6, 2026
0.1.3	Jun 6, 2026
0.1.2	Jun 6, 2026
0.1.1	Jun 6, 2026
0.1.0	Jun 5, 2026

Download files

Source Distribution

self_healing_cicd-0.1.8.tar.gz (52.8 kB)

Uploaded Jun 6, 2026 Source

Built Distribution

self_healing_cicd-0.1.8-py3-none-any.whl (53.5 kB)

Uploaded Jun 6, 2026 Python 3

File details

Details for the file self_healing_cicd-0.1.8.tar.gz.

File metadata

Download URL: self_healing_cicd-0.1.8.tar.gz
Upload date: Jun 6, 2026
Size: 52.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for self_healing_cicd-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`e22d87715945c4412140a3d1d8fea39e129e3f1050b354997872a9035079e9bf`
MD5	`8aee74493150be20063fcdb2c1fe7f5d`
BLAKE2b-256	`4380232a3ec9db3cd7a71765885b86f6ba393498e75cf1f5851d39ca9e5740ac`

Provenance

The following attestation bundles were made for self_healing_cicd-0.1.8.tar.gz:

Publisher: publish.yml on NyuydineBill/self-healing-cicd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: self_healing_cicd-0.1.8.tar.gz
- Subject digest: e22d87715945c4412140a3d1d8fea39e129e3f1050b354997872a9035079e9bf
- Sigstore transparency entry: 1739491758
- Sigstore integration time: Jun 6, 2026
Source repository:
- Permalink: NyuydineBill/self-healing-cicd@25084710a1cba7a159d461a072d52485dbe66a8c
- Branch / Tag: refs/tags/v0.1.8
- Owner: https://github.com/NyuydineBill
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@25084710a1cba7a159d461a072d52485dbe66a8c
- Trigger Event: push

File details

Details for the file self_healing_cicd-0.1.8-py3-none-any.whl.

File metadata

Download URL: self_healing_cicd-0.1.8-py3-none-any.whl
Upload date: Jun 6, 2026
Size: 53.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for self_healing_cicd-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`481c68441854496033278bbb78c005134c03c3c620cb647895305327b87f7253`
MD5	`c91babebb592146b26ccf37a0721e3ae`
BLAKE2b-256	`34e1b8d82e14977aec9f97c00105528caa72fe7a4006344a543b279bbd6fd124`

Provenance

The following attestation bundles were made for self_healing_cicd-0.1.8-py3-none-any.whl:

Publisher: publish.yml on NyuydineBill/self-healing-cicd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: self_healing_cicd-0.1.8-py3-none-any.whl
- Subject digest: 481c68441854496033278bbb78c005134c03c3c620cb647895305327b87f7253
- Sigstore transparency entry: 1739491766
- Sigstore integration time: Jun 6, 2026
Source repository:
- Permalink: NyuydineBill/self-healing-cicd@25084710a1cba7a159d461a072d52485dbe66a8c
- Branch / Tag: refs/tags/v0.1.8
- Owner: https://github.com/NyuydineBill
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@25084710a1cba7a159d461a072d52485dbe66a8c
- Trigger Event: push
