Fuzzy edit tool for LLM coding agents: never fail a str_replace again
HarnessKit
Fuzzy edit tool for LLM coding agents: never fail a str_replace again.
The Problem
Every LLM coding agent has the same Achilles' heel: edit application.
When Claude, GPT, or any model tries to modify code, it generates an old_text → new_text pair. The tool then does an exact string match to find where to apply the change. And it fails. A lot.
- Whitespace differences: the model adds a space, drops a tab, or normalizes indentation
- Minor hallucinations: a variable name is slightly off, a comment is paraphrased
- Format fragility: diffs, patches, and line-number schemes all break in different ways
The result? Up to 50% edit failure rates on non-native models. Every failed edit wastes a tool call, burns tokens on retries, and breaks agent flow.
The Solution
HarnessKit (hk) is a drop-in edit tool that fuzzy-matches the old text before replacing it. It uses a 4-stage matching cascade:
- Exact match: zero overhead when the model is precise
- Normalized whitespace: catches the most common failure mode
- Sequence matching: difflib.SequenceMatcher with configurable threshold (default 0.8)
- Line-by-line fuzzy: finds the best contiguous block match for heavily drifted edits
Every edit returns a confidence score and match type, so your agent knows exactly how the edit was resolved.
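The cascade described above can be sketched in a few lines of stdlib Python. This is an illustration, not HarnessKit's actual code: the sliding line window, the fixed 0.95 score for whitespace matches, and the "line_fuzzy" label are assumptions made for the example. The "exact", "whitespace", and "fuzzy" labels follow the sample outputs shown elsewhere in this README.

```python
import difflib
import re


def normalize(text):
    """Collapse each run of whitespace to one space (assumed normalization)."""
    return re.sub(r"\s+", " ", text).strip()


def windows(lines, size):
    """Yield the text of every contiguous block of `size` lines."""
    for start in range(0, len(lines) - size + 1):
        yield "".join(lines[start:start + size])


def find_match(source, old_text, threshold=0.8):
    """Return (matched_text, match_type, confidence), or None if nothing fits."""
    # Stage 1: exact substring, no fuzzy work needed.
    if old_text in source:
        return old_text, "exact", 1.0

    lines = source.splitlines(keepends=True)
    size = max(1, len(old_text.splitlines()))

    # Stage 2: identical once whitespace is normalized.
    wanted = normalize(old_text)
    for block in windows(lines, size):
        if normalize(block) == wanted:
            return block, "whitespace", 0.95  # assumed fixed score

    # Stage 3 scores blocks with the same line count. Stage 4 (hypothetical
    # "line_fuzzy") also tries blocks one line shorter or longer.
    stages = (("fuzzy", [size]), ("line_fuzzy", [size - 1, size + 1]))
    for match_type, sizes in stages:
        best = None
        for candidate_size in sizes:
            if candidate_size < 1:
                continue
            for block in windows(lines, candidate_size):
                ratio = difflib.SequenceMatcher(None, old_text, block).ratio()
                if ratio >= threshold and (best is None or ratio > best[2]):
                    best = (block, match_type, ratio)
        if best:
            return best
    return None
```

Each stage only runs when the previous one found nothing, which is why a precise model pays no fuzzy-matching cost.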
Quick Start
pip install harnesskit
Or just copy hk.py into your project: it's a single file, stdlib only.
CLI Usage
# Direct arguments
hk apply --file app.py --old "def hello():\n print('hi')" --new "def hello():\n print('hello world')"
# JSON from stdin (perfect for tool_use integration)
echo '{"file": "app.py", "old_text": "def hello():", "new_text": "def greet():"}' | hk apply --stdin
# From a JSON file
hk apply --edit changes.json
# XML format (natural for Claude and other LLMs)
echo '<edit file="app.py"><old>def hello():</old><new>def greet():</new></edit>' | hk apply --stdin
# XML from file
hk apply --edit changes.xml
# Dry run: see what would change without writing
hk apply --file app.py --old "..." --new "..." --dry-run
JSON Edit Format
{
  "file": "path/to/file.py",
  "old_text": "def hello():\n print('hi')",
  "new_text": "def hello():\n print('hello world')"
}
Batch multiple edits:
{
  "edits": [
    {"file": "a.py", "old_text": "...", "new_text": "..."},
    {"file": "b.py", "old_text": "...", "new_text": "..."}
  ]
}
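A caller that accepts both shapes can normalize them to a list before processing. The helper below is a minimal sketch; `parse_json_edits` is a hypothetical name, not part of HarnessKit's API.

```python
import json


def parse_json_edits(payload):
    """Return a list of edit dicts from a single-edit or batch JSON payload."""
    data = json.loads(payload)
    # A batch wraps its edits in an "edits" array; a single edit is a bare object.
    if isinstance(data, dict) and "edits" in data:
        return list(data["edits"])
    return [data]
```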
XML Edit Format
HarnessKit auto-detects XML input, which is ideal for LLMs that naturally output XML:
<edit file="path/to/file.py">
<old>def hello():
    print('hi')</old>
<new>def hello():
    print('hello world')</new>
</edit>
Batch multiple edits:
<edits>
<edit file="a.py"><old>...</old><new>...</new></edit>
<edit file="b.py"><old>...</old><new>...</new></edit>
</edits>
The path attribute works as an alias for file.
Output
{
"status": "applied",
"file": "app.py",
"match_type": "fuzzy",
"confidence": 0.92,
"matched_text": "def hello():\n print( 'hi' )"
}
Exit Codes
| Code | Meaning |
|---|---|
0 |
Edit applied successfully |
1 |
No match found |
2 |
Ambiguous โ multiple matches |
MCP Server
HarnessKit ships an MCP (Model Context Protocol) server for plug-and-play integration with any MCP-compatible agent.
Quick Start
Add to your MCP client config (e.g. Claude Desktop, Cursor, etc.):
{
"mcpServers": {
"harnesskit": {
"command": "python3",
"args": ["/path/to/hk_mcp.py"]
}
}
}
Tools
| Tool | Description |
|---|---|
harnesskit_apply |
Apply a fuzzy edit to a file (supports validate param) |
harnesskit_apply_batch |
Apply multiple edits in one call |
harnesskit_match |
Preview the match without modifying (dry run) |
harnesskit_create |
Create a new file (with optional validation) |
harnesskit_validate |
Validate a file's syntax without modifying it |
Each tool returns the match type, confidence score, and matched text โ giving the agent full visibility into how the edit was resolved.
Example
{
"name": "harnesskit_apply",
"arguments": {
"file": "app.py",
"old_text": "def hello():\n print('hi')",
"new_text": "def hello():\n print('hello world')",
"threshold": 0.8
}
}
Response:
{
"status": "applied",
"match_type": "whitespace",
"confidence": 0.95
}
Examples
The examples/ directory contains complete, working integration examples:
| Example | Description |
|---|---|
claude_agent.py |
Full Claude tool_use agent loop โ give it a task and files, it edits them |
openai_agent.py |
Same pattern with OpenAI function calling |
batch_edit.py |
Apply edits from JSON or XML files (with --dry-run and --validate) |
# Claude agent: "add type hints to all functions in src/"
python examples/claude_agent.py --task "Add type hints" --files src/
# Batch edits from XML (natural LLM output)
echo '<edits><edit file="app.py"><old>def foo():</old><new>def foo() -> None:</new></edit></edits>' \
| python examples/batch_edit.py --validate
Integration
HarnessKit is designed to slot into any agent framework as the edit backend:
import subprocess, json
def apply_edit(file, old_text, new_text):
result = subprocess.run(
["hk", "apply", "--stdin"],
input=json.dumps({"file": file, "old_text": old_text, "new_text": new_text}),
capture_output=True, text=True
)
return json.loads(result.stdout)
Or import directly:
from hk import apply_edit
result = apply_edit("app.py", old_text, new_text, threshold=0.8)
Benchmarks
We tested HarnessKit against 50 realistic edit failure scenarios across 6 categories โ the kind that break str_replace and apply_patch in production agent workflows.
| Category | Cases | Pass Rate | Avg Confidence |
|---|---|---|---|
| Whitespace Mismatch (tabs/spaces, trailing, CRLF, vendor prefixes, commas) | 12 | 100% | 0.950 |
| Stale Context (renames, decorators, type changes, docstrings, hooks) | 13 | 100% | 0.937 |
| Partial Match (incomplete blocks, missing context, shell scripts) | 7 | 100% | 1.000 |
| Indentation Drift (mixed tabs/spaces, YAML, Makefile, class methods) | 7 | 100% | 0.950 |
| Line Number Off (shifted imports, functions, comments) | 4 | 100% | 1.000 |
| Encoding Issues (Unicode, BOM, invisible chars, CRLF, smart quotes) | 5 | 100% | 1.000 |
| Total | 50 | 100% | 0.960 |
50/50 benchmarks passing. Covers Python, TypeScript, Rust, Go, Java, Ruby, CSS, HTML, YAML, Makefile, Bash, and more.
Run the benchmarks yourself:
python3 benchmarks/run_benchmarks.py
GitHub Action
Use HarnessKit in your CI/CD pipelines:
- name: Apply fuzzy edit
uses: alexmelges/harnesskit@v0.5
with:
edit-file: changes.json
validate: true
atomic: true
Or inline:
- name: Apply edit
uses: alexmelges/harnesskit@v0.5
with:
edit-json: '{"file": "src/config.py", "old_text": "DEBUG = True", "new_text": "DEBUG = False"}'
The action outputs status, confidence, and match-type for downstream steps.
Pre-commit Hook
Add syntax validation to your pre-commit workflow:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/alexmelges/harnesskit
rev: v0.5.0
hooks:
- id: harnesskit-validate
How HarnessKit Compares
| Feature | HarnessKit | str_replace (Anthropic) |
apply_patch (OpenAI) |
SWE-bench harness |
|---|---|---|---|---|
| Fuzzy matching | โ 4-stage cascade | โ Exact only | โ Line numbers | โ N/A |
| Dependencies | Zero (stdlib) | N/A (built-in) | N/A (built-in) | Heavy |
| Single file | โ ~1300 LOC | N/A | N/A | โ |
| Confidence score | โ Per-edit | โ | โ | โ |
| Syntax validation | โ + auto-rollback | โ | โ | โ |
| Undo/rollback | โ Atomic | โ | โ | โ |
| MCP server | โ Built-in | โ | โ | โ |
| Model-agnostic | โ Any LLM | Claude only | GPT only | N/A |
| XML + JSON input | โ Auto-detect | โ | โ | โ |
TL;DR: HarnessKit is what str_replace and apply_patch should have been โ tolerant of the imprecision that every LLM exhibits.
Design Principles
- Single file, stdlib only โ copy it, vendor it, pip install it. No dependency hell.
- ~1250 lines of Python โ still small enough to audit in one sitting
- Graceful degradation โ exact match when possible, fuzzy only when needed
- Transparent โ every result tells you how it matched and how confident it is
- Model-agnostic โ works with any LLM that can produce old/new text pairs
Post-Edit Validation
HarnessKit can verify that edits don't break your code's syntax โ and automatically rolls back if they do. No other edit tool does this.
# Validate after applying โ rollback on syntax error
hk apply --file app.py --old "x = 1" --new "x = 1 +" --validate
# โ status: "validation_error", file unchanged
# Validate a file without editing
hk validate app.py
# โ {"status": "valid", "file": "app.py"}
Supported languages (all stdlib, zero dependencies):
| Extension | Validator |
|---|---|
.py |
compile() โ catches all Python syntax errors |
.json |
json.loads() โ strict JSON validation |
.xml, .html, .htm |
ElementTree โ XML/HTML parse check |
.yaml, .yml |
yaml.safe_load() (if PyYAML installed) |
.js, .ts, .jsx, .tsx |
Bracket/brace/paren balance + unclosed string detection |
| Other | Always passes (no false positives) |
Diff Output
See exactly what changed with unified diff output:
# Show diff on stderr (JSON still goes to stdout)
hk apply --file app.py --old "x = 1" --new "x = 2" --diff
# Preview changes without writing
hk apply --file app.py --old "x = 1" --new "x = 2" --diff --dry-run
Diff is also included in the JSON output ("diff" field) for programmatic use.
Create Files
Coding agents don't just edit โ they create files too:
# Create a new file
hk create --file src/utils.py --content "def helper(): pass"
# Fail if file exists (safe default)
hk create --file src/utils.py --content "..."
# โ error: "File already exists"
# Overwrite with --force
hk create --file src/utils.py --content "..." --force
# Validate syntax before creating
hk create --file src/utils.py --content "def(" --validate
# โ validation_error, file NOT created
# Read content from stdin
echo 'print("hello")' | hk create --file hello.py --stdin
Configuration
| Flag | Default | Description |
|---|---|---|
--threshold |
0.8 |
Minimum similarity score for fuzzy matching |
--dry-run |
false |
Preview changes without writing to disk |
--validate |
false |
Validate syntax after edit (rollback on failure) |
--diff |
false |
Print unified diff to stderr |
--force |
false |
Overwrite existing file (create command) |
Human-Friendly Output
When running in a terminal, HarnessKit auto-detects TTY and shows colored, readable output:
$ hk apply --file app.py --old "x = 1" --new "x = 2"
โ
app.py
Match: exact (100.0% confidence)
$ hk apply --file app.py --old "def foo" --new "def bar" --diff
โ
app.py
Match: fuzzy (92.3% confidence)
-def foo():
+def bar():
Force JSON output (for scripts): --format json
Force human output (for pipes): --format human
Development
git clone https://github.com/alexmelges/harnesskit.git
cd harnesskit
python3 -m pytest test_hk.py test_mcp.py test_wrapper.py -v # 137 tests
License
MIT โ see LICENSE.
Built for the agents that build everything else.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file harnesskit-0.5.0.tar.gz.
File metadata
- Download URL: harnesskit-0.5.0.tar.gz
- Upload date:
- Size: 27.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
163b9731a13f97e2315e3bc4ac0348fd4ebf7b9ad0f221342f5d4c0319a5b3da
|
|
| MD5 |
9ac1074c0a36307696f596dd54f5d28a
|
|
| BLAKE2b-256 |
34ebc8b75d3ff4c53f454ab8a213165e27c938405250140b44dbb908f3ccc093
|
File details
Details for the file harnesskit-0.5.0-py3-none-any.whl.
File metadata
- Download URL: harnesskit-0.5.0-py3-none-any.whl
- Upload date:
- Size: 28.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8a9996b2884d29bb2908b40d0b57acd4a07fdc1ad4d644629468b30f8ed03bfe
|
|
| MD5 |
ac0e3d1bc8d2a45f7ec41f19d8843743
|
|
| BLAKE2b-256 |
c0b5d4f8e46a8c5d69b5805060c13db03cb8cd79fe843dcc0ffbce7fab1d1203
|