Fix typos in JSON keys using fuzzy matching with RapidFuzz

Fuzzy JSON Repair

You ask an LLM for JSON, it gives you {"nam": "John", "emal": "john@example.com"} instead of {"name": "John", "email": "john@example.com"}. Your Pydantic validation fails. You spend an hour writing error handling code.

This library fixes those typos automatically using fuzzy string matching. No more manual key mapping, no more try-except blocks everywhere.

Install

pip install fuzzy-json-repair

The base install pulls in json-repair, so JSON syntax repair works out of the box. Optional extras add more features:

With Pydantic (for fuzzy_model_validate_json):

pip install fuzzy-json-repair[pydantic]

With NumPy (~10x faster batch processing):

pip install fuzzy-json-repair[fast]

All extras:

pip install fuzzy-json-repair[pydantic,fast]

Usage

The simplest way - repair and validate in one go:

from pydantic import BaseModel
from fuzzy_json_repair import fuzzy_model_validate_json

class User(BaseModel):
    name: str
    age: int
    email: str

# Your LLM gave you this
json_str = '{"nam": "John", "agge": 30, "emal": "john@example.com"}'

# This just works
user = fuzzy_model_validate_json(json_str, User)
print(user)  # User(name='John', age=30, email='john@example.com')

Or if you want more control:

from fuzzy_json_repair import repair_keys

schema = User.model_json_schema()
data = {'nam': 'John', 'agge': 30, 'emal': 'john@example.com'}

result = repair_keys(data, schema)

if result.success:
    user = User.model_validate(result.data)
else:
    print(f"Repair failed: {len(result.errors)} errors")
    print(f"Error ratio: {result.error_ratio:.2%}")

Advanced Usage

Nested Objects

class Address(BaseModel):
    street: str
    city: str
    zip_code: str

class Person(BaseModel):
    name: str
    address: Address

data = {
    'nam': 'John',
    'addres': {
        'stret': '123 Main St',
        'cty': 'NYC',
        'zip_cod': '10001'
    }
}

schema = Person.model_json_schema()
result = repair_keys(data, schema, max_error_ratio_per_key=0.5)

# All nested typos are fixed!
if result.success:
    person = Person.model_validate(result.data)

Lists of Objects

class Product(BaseModel):
    product_id: str
    name: str
    price: float

class Cart(BaseModel):
    cart_id: str
    products: list[Product]

data = {
    'cart_idd': 'C123',
    'prodcts': [
        {'product_idd': 'P1', 'nam': 'Laptop', 'pric': 999.99},
        {'product_idd': 'P2', 'nam': 'Mouse', 'pric': 29.99}
    ]
}

schema = Cart.model_json_schema()
result = repair_keys(data, schema, max_error_ratio_per_key=0.5)

# Repairs all typos in the list items too!
if result.success:
    cart = Cart.model_validate(result.data)

Drop Unrepairable Items

Drop list items, unrecognized keys, and optional nested objects that are beyond repair. Automatically respects minItems constraints and preserves required fields:

class Product(BaseModel):
    name: str
    price: float

class Cart(BaseModel):
    items: list[Product]

data = {
    'items': [
        {'nam': 'Laptop', 'pric': 999},        # Repairable
        {'completely': 'wrong', 'keys': 123},  # Beyond repair
        {'nme': 'Mouse', 'prce': 29}           # Repairable
    ]
}

schema = Cart.model_json_schema()
result = repair_keys(
    data, schema,
    drop_unrepairable_items=True  # Drop items that can't be fixed
)

if result.success:
    # Returns 2 items (dropped the broken one)
    print(len(result.data['items']))  # 2

Works with nested structures too:

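class Order(BaseModel):
    order_id: str
    products: list[Product]

class Customer(BaseModel):
    name: str
    orders: list[Order]

# Example input with typos at several nesting levels
data = {
    'nam': 'John',
    'ordrs': [{
        'order_idd': 'O1',
        'prodcts': [
            {'nam': 'Laptop', 'pric': 999},        # Repairable
            {'completely': 'wrong', 'keys': 123},  # Beyond repair
        ]
    }]
}

schema = Customer.model_json_schema()

# Drops unrepairable items at any nesting level
result = repair_keys(
    data, schema,
    drop_unrepairable_items=True
)
if result.success:
    customer = Customer.model_validate(result.data)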
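
The dropping behavior described in this section can be illustrated with a small standard-library sketch. It is not the library's implementation: `repair_item` and `repair_list` are hypothetical helpers, `difflib.SequenceMatcher` stands in for RapidFuzz's `fuzz.ratio`, and the rule "an item is beyond repair when any expected key has no match within the threshold" is an assumption made for the example.

```python
from difflib import SequenceMatcher


def repair_item(item, expected_keys, max_error=0.3):
    """Map the item's keys onto expected_keys; return None if any key has no close match."""
    repaired = {}
    unused = list(item)
    for expected in expected_keys:
        # Score every remaining input key against the expected key (lower is better).
        scored = [
            (1.0 - SequenceMatcher(None, key, expected, autojunk=False).ratio(), key)
            for key in unused
        ]
        if not scored:
            return None
        error, key = min(scored)
        if error > max_error:
            return None  # beyond repair
        unused.remove(key)
        repaired[expected] = item[key]
    return repaired


def repair_list(items, expected_keys, min_items=0):
    """Drop unrepairable items, but fail (return None) if that would violate min_items."""
    candidates = (repair_item(item, expected_keys) for item in items)
    kept = [item for item in candidates if item is not None]
    if len(kept) < min_items:
        return None
    return kept


items = [
    {"nam": "Laptop", "pric": 999},
    {"completely": "wrong", "keys": 123},
    {"nme": "Mouse", "prce": 29},
]
print(repair_list(items, ["name", "price"]))
print(repair_list(items, ["name", "price"], min_items=3))
```

With the cart data from the example above, the sketch keeps the two repairable products and drops the middle one; with `min_items=3` it reports failure instead of returning a list that is too short.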

Complex Nested Structures

class Customer(BaseModel):
    customer_id: str
    name: str
    email: str

class Order(BaseModel):
    order_id: str
    customer: Customer
    products: list[Product]
    total: float

# Works with arbitrarily complex nesting!
json_str = '''
{
    "order_idd": "ORD-123",
    "custmer": {
        "customer_idd": "C-001",
        "nam": "John",
        "emal": "john@example.com"
    },
    "prodcts": [
        {"product_idd": "P-001", "nam": "Laptop", "pric": 1299.99}
    ],
    "totl": 1299.99
}
'''

order = fuzzy_model_validate_json(
    json_str,
    Order,
    max_total_error_ratio=2.0  # Allow higher error ratio for complex structures
)

API Reference

`repair_keys(data, json_schema, max_error_ratio_per_key=0.3, max_total_error_ratio=0.5, strict_validation=False, drop_unrepairable_items=False)`

Repair dictionary keys using fuzzy matching against a JSON schema.

Parameters:

data (dict): Input dictionary with potential typos
json_schema (dict): JSON schema from model.model_json_schema()
max_error_ratio_per_key (float): Maximum error ratio per individual key (0.0-1.0). Default: 0.3
max_total_error_ratio (float): Maximum average error ratio across all schema fields (0.0-1.0). Default: 0.5
strict_validation (bool): If True, reject unrecognized keys. Default: False
drop_unrepairable_items (bool): If True, drop list items, unrecognized keys, and optional nested objects that can't be repaired (respects minItems, preserves required fields). Default: False

Returns:

RepairResult: Object with:
- success (bool): Whether repair succeeded
- data (dict | None): Repaired data (None if failed)
- error_ratio (float): Total error ratio
- errors (list[RepairError]): List of errors encountered

Example:

schema = User.model_json_schema()
result = repair_keys(data, schema)
if result.success:
    user = User.model_validate(result.data)
else:
    print(f"Repair failed: {len(result.errors)} errors")
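
To make `max_total_error_ratio` concrete, here is a worked calculation for the `User` example. It assumes the total is the sum of per-key error ratios divided by the number of schema fields, which is one reading of "average error ratio across all schema fields"; the library may aggregate differently, especially for nested structures. Each per-key value uses the formula 1 - 2*M/T, where M is the number of matched characters and T is the combined length of the two keys.

```python
# Per-key error ratios for {"nam", "agge", "emal"} against {"name", "age", "email"}.
per_key_errors = {
    ("nam", "name"): 1 - 2 * 3 / 7,    # 3 matched chars, lengths 3 + 4
    ("agge", "age"): 1 - 2 * 3 / 7,    # 3 matched chars, lengths 4 + 3
    ("emal", "email"): 1 - 2 * 4 / 9,  # 4 matched chars, lengths 4 + 5
}
schema_fields = ["name", "age", "email"]

# Assumed aggregation: average over all schema fields.
total_error_ratio = sum(per_key_errors.values()) / len(schema_fields)

print(f"{total_error_ratio:.3f}")  # 0.132
print(total_error_ratio <= 0.5)    # True: within the default max_total_error_ratio
```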

`fuzzy_model_validate_json(json_data, model_cls, max_error_ratio_per_key=0.3, max_total_error_ratio=0.3, strict_validation=False, drop_unrepairable_items=False)`

Repair JSON string and return validated Pydantic model instance. Automatically attempts JSON syntax repair when json-repair is available.

Parameters:

json_data (str): JSON string to repair
model_cls (type[BaseModel]): Pydantic model class
max_error_ratio_per_key (float): Max error per individual key. Default: 0.3
max_total_error_ratio (float): Max average error across all fields. Default: 0.3
strict_validation (bool): Reject unrecognized keys. Default: False
drop_unrepairable_items (bool): Drop list items, unrecognized keys, and optional nested objects that can't be repaired (respects minItems, preserves required fields). Default: False

Returns:

BaseModel: Validated Pydantic model instance

Raises:

RepairFailedError: If repair fails or validation fails (provides structured access to repair details)

Example:

from fuzzy_json_repair import fuzzy_model_validate_json, RepairFailedError

try:
    user = fuzzy_model_validate_json(json_str, User)
except RepairFailedError as e:
    print(f"Repair failed: {e}")
    print(f"Errors: {len(e.errors)}")
    print(f"Unrepaired errors: {len(e.unrepaired_errors)}")
    for error in e.errors:
        print(f"  [{error.path or 'root'}] {error}")

Error Types

from fuzzy_json_repair import ErrorType, RepairError, RepairResult

# ErrorType enum:
ErrorType.misspelled_key       # Typo was fixed
ErrorType.unrecognized_key     # Unknown key (kept if not strict)
ErrorType.missing_expected_key  # Required field missing

# RepairError dataclass:
error = RepairError(
    error_type=ErrorType.misspelled_key,
    from_key='nam',
    to_key='name',
    error_ratio=0.143,
)
print(error)
# "Misspelled key 'nam' → 'name' (error: 14.3%)"

# RepairResult dataclass:
result = RepairResult(
    success=True,
    data={'name': 'John', 'age': 30},
    error_ratio=0.15,
    errors=[error]
)
print(f"Success: {result.success}")
print(f"Misspelled: {len(result.misspelled_keys)}")
print(f"Failed: {result.failed}")
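
The 14.3% in the `RepairError` example is consistent with an error ratio defined as one minus a normalized similarity. The sketch below reproduces that number under this assumption. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for RapidFuzz's `fuzz.ratio` (the two agree on this pair but are not guaranteed to agree on every input), and the helper name `error_ratio` is hypothetical.

```python
from difflib import SequenceMatcher


def error_ratio(typo: str, expected: str) -> float:
    """Return 1 - similarity, where similarity is 2*M/T (M matched chars, T total chars)."""
    similarity = SequenceMatcher(None, typo, expected, autojunk=False).ratio()
    return 1.0 - similarity


# 'nam' vs 'name': 3 matched characters out of 3 + 4 = 7 in total,
# so similarity = 2 * 3 / 7 and the error ratio is 1 - 6/7.
ratio = error_ratio("nam", "name")
print(f"{ratio:.1%}")  # 14.3%
```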

Configuration

Error Ratio Thresholds

# Strict (only very close matches)
repair_keys(data, schema, max_error_ratio_per_key=0.2)

# Moderate (default, good for most cases)
repair_keys(data, schema, max_error_ratio_per_key=0.3)

# Lenient (fix even poor matches)
repair_keys(data, schema, max_error_ratio_per_key=0.5)
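
A rough feel for what each threshold accepts can be had with a standard-library sketch. It assumes a key is repaired when its error ratio is at or below `max_error_ratio_per_key`, with the error ratio taken as one minus a similarity score; `difflib.SequenceMatcher` stands in for RapidFuzz here, so exact values may differ from the library's. The typo pairs and the `accepted` helper are made up for illustration.

```python
from difflib import SequenceMatcher


def error_ratio(typo, expected):
    """1 - similarity; a stand-in for the library's per-key error ratio."""
    return 1.0 - SequenceMatcher(None, typo, expected, autojunk=False).ratio()


# (typo, intended key) pairs with increasing amounts of damage:
# roughly 0.14, 0.25 and 0.40 under this metric.
pairs = [("nam", "name"), ("ttl", "total"), ("adr", "address")]


def accepted(max_error_ratio_per_key):
    """Return the typos that would be repaired at the given threshold."""
    return [
        typo
        for typo, expected in pairs
        if error_ratio(typo, expected) <= max_error_ratio_per_key
    ]


for threshold in (0.2, 0.3, 0.5):
    print(threshold, accepted(threshold))
# 0.2 ['nam']
# 0.3 ['nam', 'ttl']
# 0.5 ['nam', 'ttl', 'adr']
```

The trade-off is the usual one: a lenient threshold rescues heavily abbreviated keys, but it also raises the chance of mapping an unrelated key onto a schema field.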

Strict Validation

# Reject unrecognized keys
result = repair_keys(data, schema, strict_validation=True)
if result.success:
    use(result.data)

Drop Unrepairable Items

# Drop list items that exceed error thresholds
result = repair_keys(
    data, schema,
    drop_unrepairable_items=True
)

# Respects minItems constraints
from pydantic import Field

class Cart(BaseModel):
    items: list[Product] = Field(min_length=2)

# Regenerate the schema so it includes the minItems=2 constraint
schema = Cart.model_json_schema()

# If dropping would violate minItems=2, repair fails
result = repair_keys(data, schema, drop_unrepairable_items=True)
if not result.success:
    print("Would violate minItems constraint")

Performance

The library uses multiple optimization strategies:

JSON Parsing: Uses Pydantic's Rust-powered parser (TypeAdapter) when available (~22% faster than json.loads)
Fuzzy Matching with numpy: Uses process.cdist() for batch processing (10-20x faster)
Fuzzy Matching without numpy: Uses process.extractOne() loop (still fast)

Both fuzzy matching strategies use fuzz.ratio from RapidFuzz - no raw Levenshtein distance anywhere.

Benchmark (1000 repairs):

With numpy: ~0.05s
Without numpy: ~0.5s

Install with pip install fuzzy-json-repair[pydantic,fast] for best performance.

How It Works

Schema Extraction: Extracts expected keys, nested schemas, and $ref definitions from Pydantic's JSON schema
Exact Matching: Processes keys that match exactly (fast path)
Fuzzy Matching: For typos, uses RapidFuzz's fuzz.ratio to find best match
Batch Processing: Computes all similarities at once with cdist (when numpy available)
Recursive Repair: Automatically handles nested objects and lists
Validation: Returns repaired data ready for Pydantic validation
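
The steps above can be sketched in plain Python. This is a simplified illustration, not the library's code: `resolve`, `best_match`, and `repair` are hypothetical names, `difflib.SequenceMatcher` replaces RapidFuzz's `fuzz.ratio`, matching is done key by key rather than in a batch, and only `properties`, `items`, and local `$ref` entries are handled (no `anyOf`, so optional fields are out of scope). The schema is hand-written in the shape Pydantic produces for the `Person`/`Address` models.

```python
from difflib import SequenceMatcher


def resolve(schema, root):
    """Follow a local '#/$defs/Name' reference, if the schema is one."""
    ref = schema.get("$ref")
    return schema if ref is None else root["$defs"][ref.rsplit("/", 1)[-1]]


def best_match(key, candidates, max_error):
    """Return the candidate closest to key, or None if none is within max_error."""
    scored = [
        (1.0 - SequenceMatcher(None, key, cand, autojunk=False).ratio(), cand)
        for cand in candidates
    ]
    if not scored:
        return None
    error, cand = min(scored)
    return cand if error <= max_error else None


def repair(value, schema, root=None, max_error=0.3):
    root = schema if root is None else root
    schema = resolve(schema, root)
    if isinstance(value, list):  # recurse into list items
        return [repair(v, schema.get("items", {}), root, max_error) for v in value]
    if not isinstance(value, dict):
        return value
    props = schema.get("properties", {})
    # Exactly matching keys are settled up front and excluded from fuzzy matching.
    remaining = [name for name in props if name not in value]
    repaired = {}
    for key, child in value.items():
        target = key if key in props else best_match(key, remaining, max_error)
        if target is None:
            repaired[key] = child  # unrecognized key, kept as-is
            continue
        if target != key:
            remaining.remove(target)
        repaired[target] = repair(child, props[target], root, max_error)
    return repaired


schema = {
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zip_code": {"type": "string"},
            },
        }
    },
    "type": "object",
    "properties": {"name": {"type": "string"}, "address": {"$ref": "#/$defs/Address"}},
}
data = {"nam": "John", "addres": {"stret": "123 Main St", "cty": "NYC", "zip_cod": "10001"}}
fixed = repair(data, schema)
print(fixed)
```

The sketch leaves out error tracking and the total-error check; it only shows how exact matching, fuzzy matching, and recursion fit together.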

Use Cases

LLM Output Validation: Fix typos in JSON generated by language models
API Integration: Handle variations in third-party API responses
Data Migration: Repair legacy data with inconsistent field names
User Input: Correct typos in user-provided configuration files
Robust Parsing: Build fault-tolerant JSON parsers

Requirements

Python 3.11+
rapidfuzz >= 3.0.0
json-repair >= 0.7.0

Optional:

pydantic >= 2.0.0 (for fuzzy_model_validate_json, install with [pydantic])
numpy >= 1.20.0 (for faster batch processing, install with [fast])

Development

# Clone repository
git clone https://github.com/sayef/fuzzy-json-repair.git
cd fuzzy-json-repair

# Install with dev dependencies
pip install -e ".[dev,pydantic,fast]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=fuzzy_json_repair --cov-report=term-missing

# Format code
black fuzzy_json_repair tests
isort fuzzy_json_repair tests

# Type check
mypy fuzzy_json_repair

# Lint
ruff check fuzzy_json_repair tests

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Credits

Uses RapidFuzz for fast fuzzy matching
Built for Pydantic integration
Optional json-repair support

Project details

Maintainers

sayef

Release history

0.1.5 (this version): Oct 31, 2025
0.1.4: Oct 26, 2025
0.1.3: Oct 25, 2025
0.1.2: Oct 25, 2025
0.1.1: Oct 25, 2025

Download files

Source Distribution

fuzzy_json_repair-0.1.5.tar.gz (20.9 kB), uploaded Oct 31, 2025 (Source)

Built Distribution

fuzzy_json_repair-0.1.5-py3-none-any.whl (14.6 kB), uploaded Oct 31, 2025 (Python 3)

File details

Details for the file fuzzy_json_repair-0.1.5.tar.gz.

File metadata

Download URL: fuzzy_json_repair-0.1.5.tar.gz
Upload date: Oct 31, 2025
Size: 20.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fuzzy_json_repair-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`a7e0f7de4176c73cf8ba86da57e84756d58fcd61736c27fd9386cf93d838b805`
MD5	`f8dcdc1919d50999f16fc170142e4e34`
BLAKE2b-256	`76c92f17eb8c23f09fbe07431ff742af9d72b8ded36f5480d6b94e8bd5680d76`

See more details on using hashes here.
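
One way to check a downloaded archive against the SHA256 digest listed above is a short script using Python's `hashlib`. This is a generic sketch, not something shipped with the package; the helper names are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Digest copied from the table above; the path assumes the file sits in the
# current directory, so the check only runs if it is actually there.
EXPECTED = "a7e0f7de4176c73cf8ba86da57e84756d58fcd61736c27fd9386cf93d838b805"
archive = Path("fuzzy_json_repair-0.1.5.tar.gz")
if archive.exists():
    print(matches_published_hash(archive, EXPECTED))
```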

Provenance

The following attestation bundles were made for fuzzy_json_repair-0.1.5.tar.gz:

Publisher: publish.yml on sayef/fuzzy-json-repair

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fuzzy_json_repair-0.1.5.tar.gz
- Subject digest: a7e0f7de4176c73cf8ba86da57e84756d58fcd61736c27fd9386cf93d838b805
- Sigstore transparency entry: 659372831
- Sigstore integration time: Oct 31, 2025
Source repository:
- Permalink: sayef/fuzzy-json-repair@5e15c26e071a9b71a2654383550c922c5686c0e9
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/sayef
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5e15c26e071a9b71a2654383550c922c5686c0e9
- Trigger Event: push

File details

Details for the file fuzzy_json_repair-0.1.5-py3-none-any.whl.

File metadata

Download URL: fuzzy_json_repair-0.1.5-py3-none-any.whl
Upload date: Oct 31, 2025
Size: 14.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fuzzy_json_repair-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`77a5814caf3bbd0b533469a64c2b5ee07294dbcfba4f8bf8d204acabf21c1e3a`
MD5	`911eb7d46b518a19c96087bf48798ce9`
BLAKE2b-256	`185aa490fabfdebeb77cf3f2248b3ed6735488417232d8766797d8ef82b2a3c8`

See more details on using hashes here.

Provenance

The following attestation bundles were made for fuzzy_json_repair-0.1.5-py3-none-any.whl:

Publisher: publish.yml on sayef/fuzzy-json-repair

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fuzzy_json_repair-0.1.5-py3-none-any.whl
- Subject digest: 77a5814caf3bbd0b533469a64c2b5ee07294dbcfba4f8bf8d204acabf21c1e3a
- Sigstore transparency entry: 659372840
- Sigstore integration time: Oct 31, 2025
Source repository:
- Permalink: sayef/fuzzy-json-repair@5e15c26e071a9b71a2654383550c922c5686c0e9
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/sayef
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5e15c26e071a9b71a2654383550c922c5686c0e9
- Trigger Event: push
