A package to repair broken json strings

This simple package can be used to fix an invalid JSON string. To see all the cases this package handles, check out the unit tests.

Offer me a beer

If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna

Demo

If you are unsure whether this library will fix your specific problem, or simply want to validate your JSON online, you can visit the demo site on GitHub Pages: https://mangiucugna.github.io/json_repair/

Or listen to an audio deep dive generated by Google's NotebookLM for an introduction to the module.

Motivation

Some LLMs are a bit iffy when it comes to returning well-formed JSON data: sometimes they skip a bracket, and sometimes they add some words to it, because that's what an LLM does. Luckily, the mistakes LLMs make are simple enough to be fixed without destroying the content.

I searched for a lightweight Python package that was able to reliably fix this problem but couldn't find any.

So I wrote one.

Wouldn't GPT-4o Structured Output make this library outdated?

As part of my job we use OpenAI APIs, and we noticed that even with structured output the result is sometimes not fully valid JSON. So we still use this library to cover those outliers.

Supported use cases

Fixing Syntax Errors in JSON

Fixes missing quotes, misplaced commas, unescaped characters, and incomplete key-value pairs.
Fixes missing quotation marks and improperly formatted values (true, false, null), and repairs corrupted key-value structures.

Repairing Malformed JSON Arrays and Objects

Repairs incomplete or broken arrays/objects by adding necessary elements (e.g., commas, brackets) or default values (null, "").
The library can process JSON that includes extra non-JSON characters like comments or improperly placed characters, cleaning them up while maintaining valid structure.

Auto-Completion for Missing JSON Values

Automatically completes missing values in JSON fields with reasonable defaults (like empty strings or null), ensuring validity.

How to use

Install the library with pip

pip install json-repair

Then you can use it in your code like this:

from json_repair import repair_json

good_json_string = repair_json(bad_json_string)
# If the string was super broken this will return an empty string
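Because repair_json can hand back an empty string when it gives up, code that goes on to decode the result may want to guard against that case. Below is a minimal sketch, assuming you prefer None over an exception there. The helper name decode_or_none and its repair parameter are hypothetical; the repair function is injectable only so the sketch can be tried without the library installed.

```python
import json
from typing import Any, Callable, Optional


def decode_or_none(raw: str, repair: Optional[Callable[[str], str]] = None) -> Optional[Any]:
    """Repair `raw` and decode it; return None if the repair yields an empty string.

    `decode_or_none` is a hypothetical helper, not part of json_repair.
    """
    if repair is None:
        # Imported lazily so the sketch can be exercised with a stub repair function.
        from json_repair import repair_json

        repair = repair_json
    repaired = repair(raw)
    if repaired == "":
        # A "super broken" input comes back as an empty string, and
        # json.loads("") would raise, so report "nothing recoverable" instead.
        return None
    return json.loads(repaired)
```

With no repair argument it calls the real repair_json. If you only need the object, the return_objects=True variant avoids the extra json.loads round trip.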

You can use this library to completely replace json.loads():

import json_repair

decoded_object = json_repair.loads(json_string)

or just

import json_repair

decoded_object = json_repair.repair_json(json_string, return_objects=True)

Avoid this antipattern

Some users of this library adopt the following pattern:

import json
import json_repair

obj = {}
try:
    obj = json.loads(string)
except json.JSONDecodeError:
    obj = json_repair.loads(string)
    ...

This is wasteful because json_repair already checks for you whether the JSON is valid. If you still want to do that, add skip_json_loads=True to the call, as explained in the Performance considerations section below.
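If you do want to keep your own json.loads() attempt, a non-wasteful version passes skip_json_loads=True on the fallback path so the string is not validated twice. This is a sketch: the helper name parse_maybe_broken is hypothetical, while skip_json_loads and return_objects are the repair_json parameters described in this README.

```python
import json
from typing import Any


def parse_maybe_broken(raw: str) -> Any:
    """Try the standard parser first and repair only when it fails.

    `parse_maybe_broken` is a hypothetical helper name.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Imported lazily: json_repair is only needed on the slow path.
        from json_repair import repair_json

        # json.loads already rejected the string, so skip json_repair's own
        # validity check and ask for a Python object directly.
        return repair_json(raw, skip_json_loads=True, return_objects=True)
```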

Read json from a file or file descriptor

JSON repair also provides a drop-in replacement for json.load():

import json_repair

try:
    with open(fname, 'rb') as file_descriptor:
        decoded_object = json_repair.load(file_descriptor)
except OSError:
    # Raised by open() or while reading; json_repair does not catch IO errors
    ...

and another method to read from a file:

import json_repair

try:
    decoded_object = json_repair.from_file(json_file)
except OSError:
    # In Python 3, IOError is an alias of OSError, so this one clause covers both
    ...

Keep in mind that the library does not catch any IO-related exceptions; you need to handle those yourself.

Non-Latin characters

When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass ensure_ascii=False to repair_json() in order to preserve the non-Latin characters in the output.

Here's an example using Chinese characters:

repair_json("{'test_chinese_ascii':'统一码'}")

will return

{"test_chinese_ascii": "\u7edf\u4e00\u7801"}

Passing ensure_ascii=False instead:

repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)

will return

{"test_chinese_ascii": "统一码"}
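The \u escapes are the standard library's default behaviour: json.dumps escapes every non-ASCII character unless ensure_ascii=False. This stdlib-only snippet reproduces both outputs shown above, on the assumption that repair_json serializes its result with json.dumps and forwards ensure_ascii to it.

```python
import json

data = {"test_chinese_ascii": "统一码"}

# Default: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(data)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)   # {"test_chinese_ascii": "\u7edf\u4e00\u7801"}
print(readable)  # {"test_chinese_ascii": "统一码"}

# Both forms decode back to the same Python object.
assert json.loads(escaped) == json.loads(readable) == data
```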

JSON dumps parameters

More generally, repair_json accepts all the parameters that json.dumps accepts and simply passes them through (for example, indent).
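Since these keyword arguments are handed to json.dumps, their effect is whatever the standard library does with them. Here is a stdlib-only illustration of two of them. With the pass-through described above, repair_json(some_string, indent=2) should lay out its output the same way as the first call.

```python
import json

obj = {"name": "demo", "tags": ["a", "b"]}

# indent=2: one member or element per line, two spaces per nesting level.
pretty = json.dumps(obj, indent=2)

# separators=(",", ":"): the most compact form, with no spaces at all.
compact = json.dumps(obj, separators=(",", ":"))

print(pretty)
print(compact)  # {"name":"demo","tags":["a","b"]}
```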

Performance considerations

If you find this library too slow because it calls json.loads() first, you can skip that step by passing skip_json_loads=True to repair_json, like this:

from json_repair import repair_json

good_json_string = repair_json(bad_json_string, skip_json_loads=True)

I chose not to use any fast JSON library in order to avoid external dependencies, so that anybody can use it regardless of their stack.

Some rules of thumb to use:

Setting return_objects=True will always be faster, because the parser already returns an object and doesn't have to serialize it back to JSON
skip_json_loads is faster only if you are 100% sure that the string is not valid JSON
If you are having issues with escaping, pass the string as a raw string, like: r"string with escaping\""
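The raw-string tip is about Python's own escape processing, which happens before the JSON text ever reaches any parser. A stdlib-only illustration: in a normal literal, \" is consumed by Python, so the JSON escape is lost; a raw literal keeps the backslash for JSON to interpret.

```python
import json

# Normal literal: Python turns \" into a bare quote, so the JSON escapes
# disappear and the inner quotes end the JSON string early.
broken = "{\"quote\": \"say \"hi\"\"}"

# Raw literal: the backslashes survive, so JSON sees \" and decodes a quote.
raw = r'{"quote": "say \"hi\""}'

print(broken)  # {"quote": "say "hi""}
print(raw)     # {"quote": "say \"hi\""}

try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    print("normal literal is not valid JSON:", exc.msg)

print(json.loads(raw))  # {'quote': 'say "hi"'}
```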

Use json_repair with streaming

Sometimes you are streaming data and want to repair the JSON as it comes in. Normally this won't work reliably, but you can pass stream_stable=True to repair_json() or loads() to make it work:

stream_output = repair_json(stream_input, stream_stable=True)
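One common way to apply this is to repair the whole accumulated buffer every time a new chunk arrives. Below is a sketch under that assumption. The generator and its injectable repair argument are hypothetical; only repair_json and stream_stable=True come from this README.

```python
import functools
from typing import Callable, Iterable, Iterator, Optional


def repaired_prefixes(
    chunks: Iterable[str], repair: Optional[Callable[[str], str]] = None
) -> Iterator[str]:
    """After each chunk, yield a repaired version of everything received so far.

    `repaired_prefixes` is a hypothetical helper, not part of json_repair.
    """
    if repair is None:
        # Imported lazily so the generator can be tried with a stub repair function.
        from json_repair import repair_json

        repair = functools.partial(repair_json, stream_stable=True)
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Repair the full buffer, not just the latest chunk: a chunk on its
        # own is rarely meaningful JSON.
        yield repair(buffer)
```

Re-repairing the full buffer costs time proportional to its length on every chunk. That is usually acceptable for LLM-sized replies, but worth keeping in mind for very long streams.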

Use json_repair from CLI

Install the library for command-line use with:

pipx install json-repair

To see all available options:

$ json_repair -h
usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT] [filename]

Repair and parse JSON files.

positional arguments:
  filename              The JSON file to repair (if omitted, reads from stdin)

options:
  -h, --help            show this help message and exit
  -i, --inline          Replace the file inline instead of returning the output to stdout
  -o TARGET, --output TARGET
                        If specified, the output will be written to TARGET filename instead of stdout
  --ensure_ascii        Pass ensure_ascii=True to json.dumps()
  --indent INDENT       Number of spaces for indentation (Default 2)

Adding to requirements

Please pin this library only on the major version!

We use TDD and strict semantic versioning: there will be frequent updates, but no breaking changes in minor and patch versions. To ensure that you only pin the major version of this library in your requirements.txt, specify the package name followed by the major version and a wildcard for the minor and patch versions. For example:

json_repair==0.*

In this example, any version that starts with 0. will be acceptable, allowing for updates on minor and patch versions.
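To make the wildcard concrete: under PEP 440, ==0.* is a prefix match on the release segment of the version. The function below is a rough stdlib approximation for plain X.Y.Z versions only. It is hypothetical and ignores pre-release, post-release and other details that pip's real matcher handles.

```python
def matches_major_pin(version: str, major: int) -> bool:
    """Approximate the PEP 440 prefix match `==<major>.*` for plain X.Y.Z versions."""
    release = version.split("+", 1)[0]  # drop any local version label
    # Compare whole components, so "10.2.0" does not match a pin on major 1.
    return release.split(".")[0] == str(major)


for candidate in ["0.39.1", "0.47.6", "1.0.0"]:
    print(candidate, matches_major_pin(candidate, 0))
```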

How to cite

If you are using this library in your academic work (as I know many folks are), please find the BibTeX entry here:

@software{Baccianella_JSON_Repair_-_2025,
    author  = "Stefano {Baccianella}",
    month   = "feb",
    title   = "JSON Repair - A python module to repair invalid JSON, commonly used to parse the output of LLMs",
    url     = "https://github.com/mangiucugna/json_repair",
    version = "0.39.1",
    year    = 2025
}

Thank you for citing my work and please send me a link to the paper if you can!

How it works

This module parses the JSON string following this BNF definition:

<json> ::= <primitive> | <container>

<primitive> ::= <number> | <string> | <boolean>
; Where:
; <number> is a valid real number expressed in one of a number of given formats
; <string> is a string of valid characters enclosed in quotes
; <boolean> is one of the literal strings 'true', 'false', or 'null' (unquoted)

<container> ::= <object> | <array>
<array> ::= '[' [ <json> *(', ' <json>) ] ']' ; A sequence of JSON values separated by commas
<object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
<member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
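To show how each rule of this grammar can map onto one function in a recursive-descent parser, here is a minimal strict parser in plain Python. It is a sketch, not json_repair's parser: it performs no repairs, accepts only the grammar above plus whitespace, and borrows json.loads to decode string escapes. All names are hypothetical.

```python
import json
import re
from typing import Any, Tuple

# <number>: optional minus, integer part, optional fraction and exponent.
NUMBER_RE = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
# The grammar's <boolean> rule also covers null.
LITERALS = {"true": True, "false": False, "null": None}


def parse_json(text: str) -> Any:
    """<json> ::= <primitive> | <container>, with nothing left over afterwards."""
    value, pos = parse_value(text, skip_ws(text, 0))
    if skip_ws(text, pos) != len(text):
        raise ValueError(f"unexpected trailing data at position {pos}")
    return value


def skip_ws(text: str, pos: int) -> int:
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def expect(text: str, pos: int, char: str) -> int:
    if pos >= len(text) or text[pos] != char:
        raise ValueError(f"expected {char!r} at position {pos}")
    return pos + 1


def parse_value(text: str, pos: int) -> Tuple[Any, int]:
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    if text[pos] == "{":
        return parse_object(text, pos + 1)
    if text[pos] == "[":
        return parse_array(text, pos + 1)
    if text[pos] == '"':
        return parse_string(text, pos)
    for word, literal in LITERALS.items():
        if text.startswith(word, pos):
            return literal, pos + len(word)
    match = NUMBER_RE.match(text, pos)
    if match:
        token = match.group()
        number = float(token) if any(c in token for c in ".eE") else int(token)
        return number, match.end()
    raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")


def parse_string(text: str, pos: int) -> Tuple[str, int]:
    """<string>: scan to the closing quote, stepping over escaped characters."""
    end = pos + 1
    while end < len(text) and text[end] != '"':
        end += 2 if text[end] == "\\" else 1
    if end >= len(text):
        raise ValueError("unterminated string")
    return json.loads(text[pos : end + 1]), end + 1


def parse_array(text: str, pos: int) -> Tuple[list, int]:
    """<array> ::= '[' [ <json> *(',' <json>) ] ']'  (pos is just past '[')."""
    items: list = []
    pos = skip_ws(text, pos)
    if pos < len(text) and text[pos] == "]":
        return items, pos + 1
    while True:
        value, pos = parse_value(text, pos)
        items.append(value)
        pos = skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos = skip_ws(text, pos + 1)
            continue
        return items, expect(text, pos, "]")


def parse_object(text: str, pos: int) -> Tuple[dict, int]:
    """<object> ::= '{' [ <member> *(',' <member>) ] '}'  (pos is just past '{')."""
    members: dict = {}
    pos = skip_ws(text, pos)
    if pos < len(text) and text[pos] == "}":
        return members, pos + 1
    while True:
        # <member> ::= <string> ':' <json>
        if pos >= len(text) or text[pos] != '"':
            raise ValueError(f"expected a quoted key at position {pos}")
        key, pos = parse_string(text, pos)
        pos = expect(text, skip_ws(text, pos), ":")
        value, pos = parse_value(text, skip_ws(text, pos))
        members[key] = value
        pos = skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos = skip_ws(text, pos + 1)
            continue
        return members, expect(text, pos, "}")
```

Every place where this sketch raises ValueError (a trailing comma, a missing bracket, a single-quoted key) is exactly where a repairing parser has to apply a heuristic instead.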

If something is wrong (a missing bracket or quote, for example), it uses a few simple heuristics to fix the JSON string:

Add the missing brackets if the parser believes that the array or object should be closed
Quote strings or add missing single quotes
Adjust whitespaces and remove line breaks
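As a rough illustration of the first heuristic only, here is a toy that closes any arrays or objects, and a dangling string, still open at the end of the input. It is a simplified, hypothetical sketch, not how json_repair decides when a container should be closed. For instance, it does nothing about trailing commas or unquoted keys.

```python
import json

# Opening bracket -> the bracket that closes it.
CLOSERS = {"{": "}", "[": "]"}


def close_unbalanced(text: str) -> str:
    """Append whatever closing quote and brackets are needed to balance `text`.

    Toy illustration only: `close_unbalanced` is a hypothetical helper.
    """
    stack = []         # closers still owed, innermost last
    in_string = False  # currently inside a double-quoted string?
    escaped = False    # was the previous character a backslash inside a string?
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in CLOSERS:
            stack.append(CLOSERS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
    # Close a dangling string first, then the containers from the inside out.
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))


print(close_unbalanced('{"a": [1, 2'))              # {"a": [1, 2]}
print(json.loads(close_unbalanced('[{"k": "v"}')))  # [{'k': 'v'}]
```

Brackets inside strings are deliberately ignored. That is why the scan tracks string and escape state instead of just counting characters.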

I am sure some corner cases are missing. If you have examples, please open an issue or, even better, push a PR.

How to develop

Just create a virtual environment with requirements.txt; the setup uses pre-commit to make sure all tests are run.

Make sure that the GitHub Actions that run after you push a new commit don't fail either.

How to release

You will need owner access to this repository

Edit pyproject.toml and update the version number appropriately using semver notation
Commit and push all changes to the repository before continuing or the next steps will fail
Run python -m build
Create a new release on GitHub, making sure to tag all the solved issues and contributors. Create the new tag, matching the version in the build configuration
Once the release is created, a new GitHub Actions workflow will start publishing to PyPI; make sure it doesn't fail

Repair JSON in other programming languages

Typescript: https://github.com/josdejong/jsonrepair
Go: https://github.com/RealAlexandreAI/json-repair
Ruby: https://github.com/sashazykov/json-repair-rb
Rust: https://github.com/oramasearch/llm_json
R: https://github.com/cgxjdzz/jsonRepair
