
self-heal


Automatic repair for failing Python code, powered by any LLM.


self-heal catches failures, proposes an LLM-guided fix with memory of prior attempts, verifies it, and retries. Works with Claude, OpenAI, Gemini, and 100+ other providers. Sync and async. One decorator.

from self_heal import repair

def test_dollars(fn): assert fn("$12.99") == 12.99
def test_rupees(fn):  assert fn("₹1,299") == 1299.0
def test_euros(fn):   assert fn("€5,49") == 5.49

@repair(tests=[test_dollars, test_rupees, test_euros])
def extract_price(text: str) -> float:
    # Naive: only handles "$X.YY" with no commas
    return float(text.replace("$", ""))

extract_price("$12.99")   # triggers repair loop until ALL tests pass

Benchmark

On 19 small Python tasks with plausible bugs (price parsing, palindrome, flatten, roman numerals, camelCase-to-snake_case, Levenshtein, anagram, duration formatting, ...), each task repaired against a hand-written test suite:

Strategy                          Tasks passed   Success rate   LLM calls
Naive single-shot repair          13 / 19        68%            17
self-heal (multi-turn + memory)   19 / 19        100%           21

Gemini 2.5 Flash, max 3 attempts, v0.2 harness. Reproduce: self-heal bench --proposer gemini --model gemini-2.5-flash. Full task list in benchmarks/tasks.py. For the QuixBugs repair benchmark (40 programs, industry-standard): self-heal bench --suite quixbugs. Community-submitted results live in benchmarks/RESULTS.md.

The 6 tasks where self-heal wins (extract_price, is_palindrome, count_vowels, levenshtein, format_duration, is_anagram) all share a pattern: the first proposed fix handles one edge case but misses another. Memory of the failed attempt plus test feedback lets the second proposal cover both. The 4 extra LLM calls (21 vs 17) are the price for the 6 extra repairs: ~24% more calls for ~46% more wins.

Install

self-heal ships with a Protocol and several optional adapters. Install the adapter(s) you want:

pip install 'self-heal-llm[claude]'    # Anthropic Claude (default)
pip install 'self-heal-llm[openai]'    # OpenAI + OpenAI-compatible endpoints
pip install 'self-heal-llm[gemini]'    # Google Gemini
pip install 'self-heal-llm[litellm]'   # 100+ providers via LiteLLM
pip install 'self-heal-llm[all]'       # everything

The PyPI distribution name is self-heal-llm (the short name self-heal was blocked by PyPI's similarity check against an unrelated package). The Python import stays from self_heal import ....

Provider support

Adapter           Covers
ClaudeProposer    Anthropic Claude (native SDK)
OpenAIProposer    OpenAI + any OpenAI-compatible endpoint (OpenRouter, Together, Groq, Fireworks, Anyscale, Perplexity, xAI, DeepSeek, Azure, Ollama, LM Studio, vLLM, llama.cpp server, ...)
GeminiProposer    Google Gemini (native SDK)
LiteLLMProposer   100+ providers via LiteLLM (Bedrock, Vertex, Cohere, Mistral, ...)

Features

Multi-turn repair with memory

Every proposal sees the history of prior failed attempts so the LLM can't repeat the same mistake. This is the single biggest quality win over naive retry.
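
A conceptual sketch of what a history-aware prompt can look like (illustrative only; the real prompt assembly is internal to self-heal):

def build_repair_prompt(source: str, failure: str, history: list[str]) -> str:
    # Illustrative: fold every prior failed proposal into the prompt so the
    # LLM is steered away from repeating the same approach.
    parts = [f"Function under repair:\n{source}", f"Latest failure:\n{failure}"]
    for i, attempt in enumerate(history, start=1):
        parts.append(f"Failed attempt {i} (do not repeat):\n{attempt}")
    parts.append("Propose a complete, corrected function.")
    return "\n\n".join(parts)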

Verifiers: verify=callable

Catch bad return values, not just exceptions:

@repair(verify=lambda v: isinstance(v, float) and v > 0)
def extract_price(text): ...

If the predicate returns False or raises, self-heal treats it as a failure and repairs.

Test-driven repair: tests=[...]

Give self-heal a test suite; it repairs until every test passes:

def test_empty(fn):  assert fn("") is None
def test_dollar(fn): assert fn("$12.99") == 12.99

@repair(tests=[test_empty, test_dollar])
def extract_price(text): ...

Async-native

The decorator auto-detects async def and awaits correctly; the LLM call runs in a thread pool so your event loop stays free.

@repair()
async def fetch_and_parse(url: str) -> dict: ...

Prompt customization: prompt_extra="..."

Append domain-specific instructions to every repair prompt. Useful for "always handle None inputs" or "use only the standard library."
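
For example (the instruction text is illustrative):

@repair(prompt_extra="Always handle None inputs. Use only the standard library.")
def extract_price(text): ...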

Bring your own LLM

Implement the LLMProposer Protocol (def propose(self, system: str, user: str) -> str) and pass it in.
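
A minimal sketch; my_llm_complete is a placeholder for whatever client you already have, not part of self-heal:

from self_heal import repair

class MyProposer:
    # Any object with this method satisfies the LLMProposer Protocol.
    def propose(self, system: str, user: str) -> str:
        return my_llm_complete(system=system, user=user)

@repair(proposer=MyProposer())
def my_fn(...): ...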

Repair cache: skip the LLM when you've seen it before

from self_heal import repair

@repair(cache_path=".self_heal_cache.db")
def my_fn(...): ...

First repair hits the LLM. Subsequent identical failures are served from SQLite (zero latency, zero cost). Keyed on source hash + failure signature with whitespace and memory-address normalization.
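
Conceptually, the key is built roughly like this (a sketch of the idea, not the library's exact scheme):

import hashlib
import re

def cache_key(source: str, failure: str) -> str:
    # Normalize memory addresses ("0x7f3a...") and whitespace so the same
    # logical failure maps to the same cache entry across runs.
    failure = re.sub(r"0x[0-9a-fA-F]+", "0xADDR", failure)
    failure = re.sub(r"\s+", " ", failure).strip()
    source = re.sub(r"\s+", " ", source).strip()
    return hashlib.sha256(f"{source}||{failure}".encode()).hexdigest()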

Safety: AST rails + subprocess sandbox

Two independent layers. Combine them freely.

from self_heal import repair, SafetyConfig

# AST rails only (zero overhead)
@repair(safety="moderate")   # "moderate" | "strict" | SafetyConfig(...)
def my_fn(...): ...

# AST rails + process isolation (each call runs in `python -I`)
@repair(safety=SafetyConfig(level="moderate", sandbox="subprocess"))
def my_fn(...): ...

moderate rejects proposals that call eval / exec / os.system, import subprocess / socket / pickle / ctypes, or touch __globals__ / __class__ / other escape hatches. strict additionally forbids any non-whitelisted import. The subprocess sandbox adds a real process boundary: args and return values are pickled over stdin/stdout, and the child inherits none of the caller's globals (proposals must be self-contained). See Safety for the full trust model.

Progress callbacks

from self_heal import repair, RepairEvent

def watch(event: RepairEvent):
    print(event.type, event.attempt_number)

@repair(on_event=watch)
def my_fn(...): ...

Hooks fire on attempt start, failure, propose start/complete, install, cache hit/miss, safety violation, verify, and repair completion. Perfect for agent UIs and observability pipelines.

Token streaming

When a callback is registered, self-heal streams LLM tokens through propose_chunk events as they arrive:

from self_heal import RepairEvent, repair

def on_event(event: RepairEvent):
    if event.type == "propose_chunk":
        print(event.delta, end="", flush=True)

@repair(on_event=on_event)
def my_fn(...): ...

All four built-in proposers stream natively via their SDKs. Custom proposers can implement propose_stream(system, user) -> Iterator[str] (and apropose_stream for async) to participate; those without streaming fall back to a single completion. See examples/streaming_progress.py.
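
A sketch of a streaming-capable custom proposer; my_sdk_stream stands in for your SDK's streaming call:

from typing import Iterator

class MyStreamingProposer:
    def propose(self, system: str, user: str) -> str:
        # Non-streaming path: join the streamed chunks.
        return "".join(self.propose_stream(system, user))

    def propose_stream(self, system: str, user: str) -> Iterator[str]:
        # Yield tokens as they arrive; my_sdk_stream is a placeholder.
        for chunk in my_sdk_stream(system=system, user=user):
            yield chunk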

Native async proposers

arun prefers each SDK's native async client when the proposer provides apropose, falling back to asyncio.to_thread(propose) otherwise. All four built-in adapters ship with native async; custom proposers work either way.
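
A custom proposer opts in by adding apropose next to propose (a sketch; my_sync_complete and my_async_complete are placeholders):

class MyAsyncProposer:
    def propose(self, system: str, user: str) -> str:
        return my_sync_complete(system=system, user=user)

    async def apropose(self, system: str, user: str) -> str:
        # Preferred by arun when present; otherwise arun falls back to
        # asyncio.to_thread(propose).
        return await my_async_complete(system=system, user=user)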

pytest plugin: pytest --heal

Mark any test with @pytest.mark.heal(target="mymod.my_fn"). When it fails with --heal, self-heal loads the target, repairs it using the test as verification, and prints the proposed diff at the end of the session.

import pytest
from mymod import extract_price

@pytest.mark.heal(target="mymod.extract_price")
def test_rupees():
    assert extract_price("₹1,299") == 1299.0

pytest --heal              # print proposed fix, leave files untouched
pytest --heal-apply        # write the fix back to disk (creates a .py.heal-backup)
pytest --heal-apply-force  # also allow modification of git-dirty files

--heal-apply uses libcst for AST-faithful replacement when installed, falling back to textual replacement. It refuses to modify files with uncommitted git changes unless --heal-apply-force is given.

CLI: heal a function from the command line

self-heal heal mymod.py::extract_price \
    --test tests/test_mymod.py::test_rupees \
    --apply

Loads the function, runs self-heal with your pytest-style test as verification, prints a unified diff, and (with --apply) writes the fix back to the file.

Why this exists

AI coding agents fail on a lot of real tasks. The industry's current answer is "retry and hope." That's not a strategy.

self-heal treats repair as a first-class primitive: diagnose the failure, propose a targeted fix with memory of prior attempts, verify, retry. A thin library you can wrap around any Python function or agent tool.

How it works

  1. Catch the exception (or verifier/test failure) and capture inputs, traceback, failure type.
  2. Classify the failure (exception, verifier, test, assertion, validation).
  3. Propose a repaired function via an LLM with a failure-aware prompt that includes the full history of prior failed proposals.
  4. Recompile the proposed function into the running process.
  5. Verify with user-provided verifier + tests.
  6. Retry with the same inputs until success or max_attempts exhausted.
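
In pseudocode (names illustrative, heavily simplified relative to the real loop):

def repair_loop(fn, args, propose, verify, max_attempts=3):
    history = []                                       # memory of failed proposals
    candidate = fn
    for _ in range(max_attempts):
        try:
            result = candidate(*args)                  # 1. run and catch
            if verify is None or verify(result):       # 5. verify
                return result
            failure = f"verifier rejected {result!r}"
        except Exception as exc:                       # 2. classify (simplified)
            failure = repr(exc)
        source = propose(candidate, failure, history)  # 3. propose with memory
        history.append(source)
        namespace = {}
        exec(source, namespace)                        # 4. recompile in-process
        candidate = namespace[fn.__name__]             # proposal must redefine fn
    raise RuntimeError("max_attempts exhausted")       # 6. out of retries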

API

from self_heal import repair

@repair(
    max_attempts=3,
    model="claude-sonnet-4-6",
    proposer=None,               # or ClaudeProposer / OpenAIProposer / ...
    verbose=False,
    on_failure="raise",          # or "return_none"
    verify=None,                 # Callable[[Any], bool]; raise or False triggers repair
    tests=None,                  # list[Callable[[Callable], Any]]
    prompt_extra=None,           # str; extra user instructions in every prompt
)
def my_fn(...): ...

my_fn.last_repair   # RepairResult with full attempt history
my_fn.repair_loop   # the underlying RepairLoop

For advanced use:

from self_heal import RepairLoop

loop = RepairLoop(max_attempts=5, verbose=True)
result = loop.run(my_fn, args=(...), verify=..., tests=[...])

# Async:
result = await loop.arun(my_async_fn, args=(...))

Using different providers

Claude (default):

@repair()
def my_fn(...): ...

OpenAI:

from self_heal.llm import OpenAIProposer

@repair(proposer=OpenAIProposer(model="gpt-5"))
def my_fn(...): ...

Gemini:

from self_heal.llm import GeminiProposer

@repair(proposer=GeminiProposer(model="gemini-2.5-pro"))
def my_fn(...): ...

Any OpenAI-compatible endpoint (OpenRouter, Groq, Ollama, ...):

from self_heal.llm import OpenAIProposer

# OpenRouter: hundreds of models through one key
OpenAIProposer(
    model="google/gemini-2.5-pro",
    base_url="https://openrouter.ai/api/v1",
)

# Local Ollama
OpenAIProposer(
    model="llama3.3",
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

LiteLLM catch-all (100+ providers):

from self_heal.llm import LiteLLMProposer

LiteLLMProposer(model="bedrock/anthropic.claude-3-5-sonnet")
LiteLLMProposer(model="vertex_ai/gemini-2.5-pro")
LiteLLMProposer(model="cohere/command-r-plus")

Agent framework integration

self-heal composes with any Python agent framework. For Claude Agent SDK there's a first-class integration (one decorator instead of two); for everything else, wrap the tool's underlying callable with @repair and register the result as usual.

Claude Agent SDK (first-class)

from self_heal.integrations.claude_agent_sdk import healing_tool

@healing_tool(
    "price_from_text",
    "Extract a price from messy text.",
    {"text": str},
    verify=lambda r: isinstance(r, dict) and not r.get("is_error"),
)
async def price_from_text(args):
    text = args["text"]
    return {"content": [{"type": "text", "text": str(float(text.replace("$", "")))}]}

healing_tool takes both the Claude Agent SDK's @tool parameters and all of @repair's parameters. The result is an SdkMcpTool ready to register with create_sdk_mcp_server(...). Requires pip install 'self-heal-llm[claude]' claude-agent-sdk.
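
Registering the result (a sketch following the SDK's documented create_sdk_mcp_server usage):

from claude_agent_sdk import create_sdk_mcp_server

server = create_sdk_mcp_server(
    name="pricing",
    version="1.0.0",
    tools=[price_from_text],
)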

Other frameworks (decorator stacking)

Worked examples live in examples/. The pattern, sketched below, is the same everywhere: @repair wraps the raw callable, and the framework's own decorator registers the result.
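
A sketch assuming LangChain's @tool decorator; any decorator-based tool registry composes the same way:

from langchain_core.tools import tool   # assumes LangChain is installed
from self_heal import repair

def test_dollars(fn):
    assert fn("$12.99") == 12.99

@tool                                   # framework registration (outermost)
@repair(tests=[test_dollars])           # self-heal wraps the raw callable
def extract_price(text: str) -> float:
    """Extract a numeric price from messy text."""
    return float(text.replace("$", "").replace(",", ""))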

Safety

self-heal executes LLM-generated code via exec() in the same process by default. Three layers of defense are available:

  1. AST rails (SafetyConfig(level="moderate"|"strict")) block dangerous imports, eval/exec, introspection escape hatches, and os.system-style calls before any code runs.
  2. Subprocess sandbox (SafetyConfig(sandbox="subprocess")) runs each call to the repaired function in a fresh python -I child process. Args/return value go over stdin/stdout via pickle. The child inherits none of the caller's globals, so proposals must be self-contained.
  3. Same trust boundary as any LLM-in-the-loop system: still do not run repaired code against untrusted inputs without network isolation.

from self_heal import repair, SafetyConfig

@repair(safety=SafetyConfig(level="moderate", sandbox="subprocess"))
def parse_price(text: str) -> float:
    ...

Roadmap

  • v0.0.1: core repair loop + decorator + Claude backend
  • v0.0.2: OpenAI, Gemini, LiteLLM adapters; works with any LLM
  • v0.1.0: multi-turn memory, verifiers, test-driven repair, async, benchmark harness
  • v0.2.0: repair cache, AST safety rails, event callbacks, pytest plugin, CLI, extended benchmarks
  • v0.3.0: subprocess sandbox, pytest --heal-apply, QuixBugs benchmark, local-model sweep tooling
  • v0.4.0: streaming token events (propose_chunk), native async proposers (apropose) for all four adapters
  • v0.5: wasm sandbox
  • v1.0: stable API + extended benchmark suite (HumanEval-Fix, Refactory)

Contributing

See CONTRIBUTING.md for the full guide: dev setup, everyday commands, how to add a new LLM proposer or benchmark task, and the PR checklist. Good first issues are tagged in the issue tracker.

Development (quick start)

git clone https://github.com/Johin2/self-heal.git
cd self-heal
python -m venv .venv
.venv/Scripts/pip install -e ".[dev]"   # Windows
# .venv/bin/pip install -e ".[dev]"     # macOS/Linux
pytest
ruff check .

Run the benchmark locally:

python benchmarks/run.py --proposer claude                       # uses ANTHROPIC_API_KEY
python benchmarks/run.py --proposer openai                       # uses OPENAI_API_KEY
python benchmarks/run.py --proposer gemini                       # uses GEMINI_API_KEY
python benchmarks/run.py --suite quixbugs --proposer gemini      # QuixBugs (40 programs, clones on first use)

License

MIT
