Fix typos in JSON keys using fuzzy matching with RapidFuzz
Fuzzy JSON Repair
You ask an LLM for JSON, it gives you {"nam": "John", "emal": "john@example.com"} instead of {"name": "John", "email": "john@example.com"}. Your Pydantic validation fails. You spend an hour writing error handling code.
This library fixes those typos automatically using fuzzy string matching. No more manual key mapping, no more try-except blocks everywhere.
Install
pip install fuzzy-json-repair
If you're processing lots of data, install with numpy for ~10x faster batch processing:
pip install fuzzy-json-repair[fast]
Usage
The simplest way - repair and validate in one go:
from pydantic import BaseModel
from fuzzy_json_repair import fuzzy_model_validate_json
class User(BaseModel):
    name: str
    age: int
    email: str
# Your LLM gave you this
json_str = '{"nam": "John", "agge": 30, "emal": "john@example.com"}'
# This just works
user = fuzzy_model_validate_json(json_str, User)
print(repr(user))  # User(name='John', age=30, email='john@example.com')
Or if you want more control:
from fuzzy_json_repair import repair_keys
schema = User.model_json_schema()
data = {'nam': 'John', 'agge': 30, 'emal': 'john@example.com'}
repaired, error_ratio, errors = repair_keys(data, schema)
if repaired is None:
    print(f"Too many errors: {errors}")
else:
    user = User.model_validate(repaired)
Advanced Usage
Nested Objects
class Address(BaseModel):
    street: str
    city: str
    zip_code: str
class Person(BaseModel):
    name: str
    address: Address
data = {
    'nam': 'John',
    'addres': {
        'stret': '123 Main St',
        'cty': 'NYC',
        'zip_cod': '10001'
    }
}
schema = Person.model_json_schema()
repaired, _, _ = repair_keys(data, schema, max_error_ratio_per_key=0.5)
# All nested typos are fixed!
person = Person.model_validate(repaired)
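Nested repair depends on knowing which keys are expected at each level. Pydantic v2 places nested models under `$defs` and points to them with `$ref`. The sketch below is a minimal illustration of following those references, using a hand-written, abridged schema in the shape Pydantic emits for `Person`. The helper names `resolve_ref` and `expected_keys` are hypothetical and are not part of fuzzy-json-repair; the sketch also assumes the schema is not self-referential.

```python
# Abridged, hand-written schema mirroring the shape of Person.model_json_schema().
PERSON_SCHEMA = {
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zip_code": {"type": "string"},
            },
        }
    },
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
}


def resolve_ref(node: dict, root: dict) -> dict:
    """Follow a local '#/$defs/...' reference; return the node unchanged otherwise."""
    ref = node.get("$ref")
    if ref is None:
        return node
    target = root
    for part in ref.removeprefix("#/").split("/"):
        target = target[part]
    return target


def expected_keys(node: dict, root: dict) -> dict:
    """Map each expected key to the key map of its nested object (empty for scalars)."""
    node = resolve_ref(node, root)
    return {
        key: expected_keys(sub, root)
        for key, sub in node.get("properties", {}).items()
    }


print(expected_keys(PERSON_SCHEMA, PERSON_SCHEMA))
```

The resulting key map is what a repair step would compare misspelled keys such as 'addres' and 'stret' against, one nesting level at a time.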
Lists of Objects
class Product(BaseModel):
    product_id: str
    name: str
    price: float
class Cart(BaseModel):
    cart_id: str
    products: list[Product]
data = {
    'cart_idd': 'C123',
    'prodcts': [
        {'product_idd': 'P1', 'nam': 'Laptop', 'pric': 999.99},
        {'product_idd': 'P2', 'nam': 'Mouse', 'pric': 29.99}
    ]
}
schema = Cart.model_json_schema()
repaired, _, _ = repair_keys(data, schema, max_error_ratio_per_key=0.5)
# Repairs all typos in the list items too!
cart = Cart.model_validate(repaired)
Complex Nested Structures
class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
class Order(BaseModel):
    order_id: str
    customer: Customer
    products: list[Product]
    total: float
# Works with arbitrarily complex nesting!
json_str = '''
{
    "order_idd": "ORD-123",
    "custmer": {
        "customer_idd": "C-001",
        "nam": "John",
        "emal": "john@example.com"
    },
    "prodcts": [
        {"product_idd": "P-001", "nam": "Laptop", "pric": 1299.99}
    ],
    "totl": 1299.99
}
'''
order = fuzzy_model_validate_json(
    json_str,
    Order,
    max_total_error_ratio=2.0  # Allow higher error ratio for complex structures
)
API Reference
repair_keys(data, json_schema, max_error_ratio_per_key=0.3, max_total_error_ratio=0.5, strict_validation=False)
Repair dictionary keys using fuzzy matching against a JSON schema.
Parameters:
- data (dict): Input dictionary with potential typos
- json_schema (dict): JSON schema from model.model_json_schema()
- max_error_ratio_per_key (float): Maximum error ratio per individual key (0.0-1.0). Default: 0.3
- max_total_error_ratio (float): Maximum average error ratio across all schema fields (0.0-1.0). Default: 0.5
- strict_validation (bool): If True, reject unrecognized keys. Default: False
Returns:
tuple[dict | None, float, list[RepairError]]: (repaired_data, total_error_ratio, errors)
- repaired_data is None if repair exceeds acceptable thresholds
- total_error_ratio and errors are always returned for diagnostics
Example:
schema = User.model_json_schema()
repaired, ratio, errors = repair_keys(data, schema)
if repaired is None:
    print("Repair failed - too many errors")
else:
    user = User.model_validate(repaired)
fuzzy_model_validate_json(json_data, model_cls, repair_syntax=True, max_error_ratio_per_key=0.3, max_total_error_ratio=0.3, strict_validation=False)
Repair JSON string and return validated Pydantic model instance.
Parameters:
- json_data (str): JSON string to repair
- model_cls (type[BaseModel]): Pydantic model class
- repair_syntax (bool): Attempt to fix JSON syntax errors. Default: True (requires json-repair)
- max_error_ratio_per_key (float): Max error per individual key. Default: 0.3
- max_total_error_ratio (float): Max average error across all fields. Default: 0.3
- strict_validation (bool): Reject unrecognized keys. Default: False
Returns:
BaseModel: Validated Pydantic model instance
Raises:
ValueError: If repair fails or validation fails
Example:
user = fuzzy_model_validate_json(json_str, User)
Error Types
from fuzzy_json_repair import ErrorType, RepairError
# ErrorType enum:
ErrorType.misspelled_key # Typo was fixed
ErrorType.unrecognized_key # Unknown key (kept if not strict)
ErrorType.missing_expected_key # Required field missing
# RepairError dataclass:
error = RepairError(
    error_type=ErrorType.misspelled_key,
    from_key='nam',
    to_key='name',
    error_ratio=0.143,
    message=None
)
print(error)
# "Misspelled key 'nam' → 'name' (error: 14.3%)"
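The 14.3% in the example above is consistent with an error ratio of 1 minus a normalized insertion/deletion similarity, which is what RapidFuzz's fuzz.ratio computes (scaled to 0-100). The pure-Python sketch below reproduces that arithmetic via the longest common subsequence. Treat it as an assumption about how the library derives error_ratio, not a description of its internals; `lcs_length` and `error_ratio` are hypothetical helper names.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def error_ratio(typo: str, expected: str) -> float:
    """1 - similarity, with similarity = 2 * LCS / (len(typo) + len(expected))."""
    total = len(typo) + len(expected)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * lcs_length(typo, expected) / total


# 'nam' vs 'name': LCS is 3, combined length is 7, so the error is 1 - 6/7.
print(round(error_ratio("nam", "name"), 3))  # 0.143
```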
Configuration
Error Ratio Thresholds
# Strict (only very close matches)
repair_keys(data, schema, max_error_ratio_per_key=0.2)
# Moderate (default, good for most cases)
repair_keys(data, schema, max_error_ratio_per_key=0.3)
# Lenient (fix even poor matches)
repair_keys(data, schema, max_error_ratio_per_key=0.5)
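To get a feel for what each threshold lets through, the sketch below scores a few typos against 'name' and reports which of the three settings would accept them. It uses the standard library's difflib.SequenceMatcher as a stand-in for RapidFuzz's fuzz.ratio, so scores may differ from the library's on some pairs. The `THRESHOLDS` labels and the `accepted_at` helper are hypothetical, and treating a score equal to the limit as accepted is an assumption.

```python
from difflib import SequenceMatcher

# Labels follow the three settings shown above.
THRESHOLDS = {"strict": 0.2, "moderate": 0.3, "lenient": 0.5}


def error_ratio(typo: str, expected: str) -> float:
    """Stand-in error ratio: 1 minus difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, typo, expected).ratio()


def accepted_at(typo: str, expected: str) -> list[str]:
    """Return the threshold labels under which this typo would be repaired."""
    err = error_ratio(typo, expected)
    return [label for label, limit in THRESHOLDS.items() if err <= limit]


for typo in ("nam", "nm", "x"):
    err = error_ratio(typo, "name")
    print(f"{typo!r} -> 'name': error {err:.3f}, accepted at {accepted_at(typo, 'name')}")
```

In this sketch a one-letter omission passes every setting, a two-letter omission only passes the lenient one, and an unrelated key is never matched.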
Strict Validation
# Reject unrecognized keys
repaired, _, _ = repair_keys(data, schema, strict_validation=True)
Performance
The library uses two matching strategies:
- With numpy: Uses process.cdist() for batch processing (10-20x faster)
- Without numpy: Uses process.extractOne() loop (still fast)
Both use fuzz.ratio from RapidFuzz - no raw Levenshtein distance anywhere.
Benchmark (1000 repairs):
- With numpy: ~0.05s
- Without numpy: ~0.5s
Install with pip install fuzzy-json-repair[fast] for best performance.
How It Works
- Schema Extraction: Extracts expected keys, nested schemas, and $ref definitions from Pydantic's JSON schema
- Exact Matching: Processes keys that match exactly (fast path)
- Fuzzy Matching: For typos, uses RapidFuzz's fuzz.ratio to find best match
- Batch Processing: Computes all similarities at once with cdist (when numpy available)
- Recursive Repair: Automatically handles nested objects and lists
- Validation: Returns repaired data ready for Pydantic validation
Use Cases
- LLM Output Validation: Fix typos in JSON generated by language models
- API Integration: Handle variations in third-party API responses
- Data Migration: Repair legacy data with inconsistent field names
- User Input: Correct typos in user-provided configuration files
- Robust Parsing: Build fault-tolerant JSON parsers
Requirements
- Python 3.11+
- pydantic >= 2.0.0
- rapidfuzz >= 3.0.0
Optional:
- numpy >= 1.20.0 (for faster batch processing)
- json-repair >= 0.7.0 (for JSON syntax repair)
Development
# Clone repository
git clone https://github.com/sayef/fuzzy-json-repair.git
cd fuzzy-json-repair
# Install with dev dependencies
pip install -e ".[dev,fast,syntax]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=fuzzy_json_repair --cov-report=term-missing
# Format code
black fuzzy_json_repair tests
isort fuzzy_json_repair tests
# Type check
mypy fuzzy_json_repair
# Lint
ruff check fuzzy_json_repair tests
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Credits
- Uses RapidFuzz for fast fuzzy matching
- Built for Pydantic integration
- Optional json-repair support
Changelog
0.1.0 (2025-01-25)
- Initial release
- Core fuzzy matching with RapidFuzz
- Pydantic integration
- Nested object and list support
- Batch processing with cdist
- Comprehensive test suite