🔧 HarnessKit

Fuzzy edit tool for LLM coding agents: never fail a str_replace again.

License: MIT · Python 3.8+ · Zero Dependencies


The Problem

Every LLM coding agent has the same Achilles' heel: edit application.

When Claude, GPT, or any other model tries to modify code, it generates an old_text → new_text pair. The tool then does an exact string match to find where to apply the change. And it fails. A lot.

  • Whitespace differences: the model adds a space, drops a tab, or normalizes indentation
  • Minor hallucinations: a variable name is slightly off, a comment is paraphrased
  • Format fragility: diffs, patches, and line-number schemes all break in different ways

The result? Up to 50% edit failure rates on non-native models. Every failed edit wastes a tool call, burns tokens on retries, and breaks agent flow.

The Solution

HarnessKit (hk) is a drop-in edit tool that fuzzy-matches the old text before replacing it. It uses a 4-stage matching cascade:

  1. Exact match: zero overhead when the model is precise
  2. Normalized whitespace: catches the most common failure mode
  3. Sequence matching: difflib.SequenceMatcher with a configurable threshold (default 0.8)
  4. Line-by-line fuzzy: finds the best contiguous block match for heavily drifted edits

Every edit returns a confidence score and match type, so your agent knows exactly how the edit was resolved.
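
The cascade above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not HarnessKit's actual implementation: the names find_match and _normalize_ws, the fixed 0.95 score for whitespace matches, and the folding of stages 3 and 4 into one sliding-window pass are all assumptions made for the example.

```python
import difflib
import re


def _normalize_ws(text):
    """Collapse every run of whitespace into a single space (hypothetical helper)."""
    return re.sub(r"\s+", " ", text).strip()


def find_match(source, old_text, threshold=0.8):
    """Return (match_type, confidence, matched_text), or None if nothing qualifies."""
    # Stage 1: exact substring match.
    if old_text in source:
        return ("exact", 1.0, old_text)

    src_lines = source.splitlines()
    n = len(old_text.splitlines())
    if n == 0 or n > len(src_lines):
        return None
    # Every contiguous block of source lines with the same height as old_text.
    windows = ["\n".join(src_lines[i:i + n]) for i in range(len(src_lines) - n + 1)]

    # Stage 2: identical once whitespace is normalized.
    # The 0.95 score is a placeholder chosen for this sketch.
    target = _normalize_ws(old_text)
    for window in windows:
        if _normalize_ws(window) == target:
            return ("whitespace", 0.95, window)

    # Stages 3 and 4, folded together here: score each block and keep the best.
    def score(window):
        return difflib.SequenceMatcher(None, window, old_text).ratio()

    best = max(windows, key=score)
    if score(best) >= threshold:
        return ("fuzzy", round(score(best), 3), best)
    return None
```

Returning the matched source text, rather than the model's old_text, is what lets a caller replace the right span even when the two differ.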

Quick Start

pip install harnesskit

Or just copy hk.py into your project โ€” it's a single file, stdlib only.

CLI Usage

# Direct arguments
hk apply --file app.py --old "def hello():\n    print('hi')" --new "def hello():\n    print('hello world')"

# JSON from stdin (perfect for tool_use integration)
echo '{"file": "app.py", "old_text": "def hello():", "new_text": "def greet():"}' | hk apply --stdin

# From a JSON file
hk apply --edit changes.json

# XML format (natural for Claude and other LLMs)
echo '<edit file="app.py"><old>def hello():</old><new>def greet():</new></edit>' | hk apply --stdin

# XML from file
hk apply --edit changes.xml

# Dry run: see what would change without writing
hk apply --file app.py --old "..." --new "..." --dry-run

JSON Edit Format

{
  "file": "path/to/file.py",
  "old_text": "def hello():\n    print('hi')",
  "new_text": "def hello():\n    print('hello world')"
}

Batch multiple edits:

{
  "edits": [
    {"file": "a.py", "old_text": "...", "new_text": "..."},
    {"file": "b.py", "old_text": "...", "new_text": "..."}
  ]
}

XML Edit Format

HarnessKit auto-detects XML input, which is ideal for LLMs that naturally output XML:

<edit file="path/to/file.py">
  <old>def hello():
    print('hi')</old>
  <new>def hello():
    print('hello world')</new>
</edit>

Batch multiple edits:

<edits>
  <edit file="a.py"><old>...</old><new>...</new></edit>
  <edit file="b.py"><old>...</old><new>...</new></edit>
</edits>

The path attribute works as an alias for file.

Output

{
  "status": "applied",
  "file": "app.py",
  "match_type": "fuzzy",
  "confidence": 0.92,
  "matched_text": "def hello():\n    print( 'hi' )"
}

Exit Codes

Code Meaning
0 Edit applied successfully
1 No match found
2 Ambiguous โ€” multiple matches

MCP Server

HarnessKit ships an MCP (Model Context Protocol) server for plug-and-play integration with any MCP-compatible agent.

Quick Start

Add to your MCP client config (e.g. Claude Desktop, Cursor, etc.):

{
  "mcpServers": {
    "harnesskit": {
      "command": "python3",
      "args": ["/path/to/hk_mcp.py"]
    }
  }
}

Tools

Tool Description
harnesskit_apply Apply a fuzzy edit to a file (supports validate param)
harnesskit_apply_batch Apply multiple edits in one call
harnesskit_match Preview the match without modifying (dry run)
harnesskit_create Create a new file (with optional validation)
harnesskit_validate Validate a file's syntax without modifying it

Each tool returns the match type, confidence score, and matched text โ€” giving the agent full visibility into how the edit was resolved.

Example

{
  "name": "harnesskit_apply",
  "arguments": {
    "file": "app.py",
    "old_text": "def hello():\n    print('hi')",
    "new_text": "def hello():\n    print('hello world')",
    "threshold": 0.8
  }
}

Response:

{
  "status": "applied",
  "match_type": "whitespace",
  "confidence": 0.95
}

Examples

The examples/ directory contains complete, working integration examples:

Example Description
claude_agent.py Full Claude tool_use agent loop โ€” give it a task and files, it edits them
openai_agent.py Same pattern with OpenAI function calling
batch_edit.py Apply edits from JSON or XML files (with --dry-run and --validate)
# Claude agent: "add type hints to all functions in src/"
python examples/claude_agent.py --task "Add type hints" --files src/

# Batch edits from XML (natural LLM output)
echo '<edits><edit file="app.py"><old>def foo():</old><new>def foo() -> None:</new></edit></edits>' \
  | python examples/batch_edit.py --validate

Integration

HarnessKit is designed to slot into any agent framework as the edit backend:

import subprocess, json

def apply_edit(file, old_text, new_text):
    result = subprocess.run(
        ["hk", "apply", "--stdin"],
        input=json.dumps({"file": file, "old_text": old_text, "new_text": new_text}),
        capture_output=True, text=True
    )
    return json.loads(result.stdout)

Or import directly:

from hk import apply_edit

result = apply_edit("app.py", old_text, new_text, threshold=0.8)

Benchmarks

We tested HarnessKit against 50 realistic edit failure scenarios across 6 categories โ€” the kind that break str_replace and apply_patch in production agent workflows.

Category Cases Pass Rate Avg Confidence
Whitespace Mismatch (tabs/spaces, trailing, CRLF, vendor prefixes, commas) 12 100% 0.950
Stale Context (renames, decorators, type changes, docstrings, hooks) 13 100% 0.937
Partial Match (incomplete blocks, missing context, shell scripts) 7 100% 1.000
Indentation Drift (mixed tabs/spaces, YAML, Makefile, class methods) 7 100% 0.950
Line Number Off (shifted imports, functions, comments) 4 100% 1.000
Encoding Issues (Unicode, BOM, invisible chars, CRLF, smart quotes) 5 100% 1.000
Total 50 100% 0.960

50/50 benchmarks passing. Covers Python, TypeScript, Rust, Go, Java, Ruby, CSS, HTML, YAML, Makefile, Bash, and more.

Run the benchmarks yourself:

python3 benchmarks/run_benchmarks.py

GitHub Action

Use HarnessKit in your CI/CD pipelines:

- name: Apply fuzzy edit
  uses: alexmelges/harnesskit@v0.5
  with:
    edit-file: changes.json
    validate: true
    atomic: true

Or inline:

- name: Apply edit
  uses: alexmelges/harnesskit@v0.5
  with:
    edit-json: '{"file": "src/config.py", "old_text": "DEBUG = True", "new_text": "DEBUG = False"}'

The action outputs status, confidence, and match-type for downstream steps.

Pre-commit Hook

Add syntax validation to your pre-commit workflow:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/alexmelges/harnesskit
    rev: v0.5.0
    hooks:
      - id: harnesskit-validate

How HarnessKit Compares

Feature HarnessKit str_replace (Anthropic) apply_patch (OpenAI) SWE-bench harness
Fuzzy matching โœ… 4-stage cascade โŒ Exact only โŒ Line numbers โŒ N/A
Dependencies Zero (stdlib) N/A (built-in) N/A (built-in) Heavy
Single file โœ… ~1300 LOC N/A N/A โŒ
Confidence score โœ… Per-edit โŒ โŒ โŒ
Syntax validation โœ… + auto-rollback โŒ โŒ โŒ
Undo/rollback โœ… Atomic โŒ โŒ โŒ
MCP server โœ… Built-in โŒ โŒ โŒ
Model-agnostic โœ… Any LLM Claude only GPT only N/A
XML + JSON input โœ… Auto-detect โŒ โŒ โŒ

TL;DR: HarnessKit is what str_replace and apply_patch should have been โ€” tolerant of the imprecision that every LLM exhibits.

Design Principles

  • Single file, stdlib only โ€” copy it, vendor it, pip install it. No dependency hell.
  • ~1250 lines of Python โ€” still small enough to audit in one sitting
  • Graceful degradation โ€” exact match when possible, fuzzy only when needed
  • Transparent โ€” every result tells you how it matched and how confident it is
  • Model-agnostic โ€” works with any LLM that can produce old/new text pairs

Post-Edit Validation

HarnessKit can verify that edits don't break your code's syntax โ€” and automatically rolls back if they do. No other edit tool does this.

# Validate after applying โ€” rollback on syntax error
hk apply --file app.py --old "x = 1" --new "x = 1 +" --validate
# โ†’ status: "validation_error", file unchanged

# Validate a file without editing
hk validate app.py
# โ†’ {"status": "valid", "file": "app.py"}

Supported languages (all stdlib, zero dependencies):

Extension Validator
.py compile() โ€” catches all Python syntax errors
.json json.loads() โ€” strict JSON validation
.xml, .html, .htm ElementTree โ€” XML/HTML parse check
.yaml, .yml yaml.safe_load() (if PyYAML installed)
.js, .ts, .jsx, .tsx Bracket/brace/paren balance + unclosed string detection
Other Always passes (no false positives)

Diff Output

See exactly what changed with unified diff output:

# Show diff on stderr (JSON still goes to stdout)
hk apply --file app.py --old "x = 1" --new "x = 2" --diff

# Preview changes without writing
hk apply --file app.py --old "x = 1" --new "x = 2" --diff --dry-run

Diff is also included in the JSON output ("diff" field) for programmatic use.

Create Files

Coding agents don't just edit โ€” they create files too:

# Create a new file
hk create --file src/utils.py --content "def helper(): pass"

# Fail if file exists (safe default)
hk create --file src/utils.py --content "..."
# โ†’ error: "File already exists"

# Overwrite with --force
hk create --file src/utils.py --content "..." --force

# Validate syntax before creating
hk create --file src/utils.py --content "def(" --validate
# โ†’ validation_error, file NOT created

# Read content from stdin
echo 'print("hello")' | hk create --file hello.py --stdin

Configuration

Flag Default Description
--threshold 0.8 Minimum similarity score for fuzzy matching
--dry-run false Preview changes without writing to disk
--validate false Validate syntax after edit (rollback on failure)
--diff false Print unified diff to stderr
--force false Overwrite existing file (create command)

Human-Friendly Output

When running in a terminal, HarnessKit auto-detects TTY and shows colored, readable output:

$ hk apply --file app.py --old "x = 1" --new "x = 2"
โœ… app.py
   Match: exact (100.0% confidence)

$ hk apply --file app.py --old "def foo" --new "def bar" --diff
โœ… app.py
   Match: fuzzy (92.3% confidence)
   -def foo():
   +def bar():

Force JSON output (for scripts): --format json Force human output (for pipes): --format human

Development

git clone https://github.com/alexmelges/harnesskit.git
cd harnesskit
python3 -m pytest test_hk.py test_mcp.py test_wrapper.py -v  # 137 tests

License

MIT โ€” see LICENSE.


Built for the agents that build everything else.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

harnesskit-0.5.0.tar.gz (27.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

harnesskit-0.5.0-py3-none-any.whl (28.1 kB view details)

Uploaded Python 3

File details

Details for the file harnesskit-0.5.0.tar.gz.

File metadata

  • Download URL: harnesskit-0.5.0.tar.gz
  • Upload date:
  • Size: 27.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for harnesskit-0.5.0.tar.gz
Algorithm Hash digest
SHA256 163b9731a13f97e2315e3bc4ac0348fd4ebf7b9ad0f221342f5d4c0319a5b3da
MD5 9ac1074c0a36307696f596dd54f5d28a
BLAKE2b-256 34ebc8b75d3ff4c53f454ab8a213165e27c938405250140b44dbb908f3ccc093

See more details on using hashes here.

File details

Details for the file harnesskit-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: harnesskit-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for harnesskit-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a9996b2884d29bb2908b40d0b57acd4a07fdc1ad4d644629468b30f8ed03bfe
MD5 ac0e3d1bc8d2a45f7ec41f19d8843743
BLAKE2b-256 c0b5d4f8e46a8c5d69b5805060c13db03cb8cd79fe843dcc0ffbce7fab1d1203

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page