A package to repair broken json strings
Project description
This simple package can be used to repair a broken JSON string. To see all the cases in which this package works, check out the unit tests.
Inspired by https://github.com/josdejong/jsonrepair with contributions by GPT-4
Motivation
I was using GPT a lot, and there is no surefire way to get structured output out of it. You can ask for JSON output or use the Functions paradigm, but either way the OpenAI documentation clearly states that it might not return valid JSON. Luckily, the mistakes GPT makes are simple enough to be fixed without destroying the content. I searched for a lightweight Python package but couldn't find any.
So I wrote this one.
You can see how I used it in this demo: https://huggingface.co/spaces/mangiucugna/difficult-conversations-bot/
How to use
from json_repair import repair_json
good_json_string = repair_json(bad_json_string)
How it works
This module will parse the JSON file following the BNF definition:
<json> ::= <primitive> | <container>
<primitive> ::= <number> | <string> | <boolean>
; Where:
; <number> is a valid real number expressed in one of a number of given formats
; <string> is a string of valid characters enclosed in quotes
; <boolean> is one of the literal strings 'true', 'false', or 'null' (unquoted)
<container> ::= <object> | <array>
<array> ::= '[' [ <json> *(', ' <json>) ] ']' ; A sequence of JSON values separated by commas
<object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
<member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
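To make the grammar concrete, here is a minimal recursive-descent sketch with one function per rule. It is a strict parser written for illustration, not the package's implementation: all function names are hypothetical, it raises `ValueError` instead of repairing, and it borrows the standard library's `json.loads` to decode string escapes.

```python
import json
import re

_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
_LITERALS = {"true": True, "false": False, "null": None}


def parse_json(text):
    """Parse text as <json> and return the Python value, or raise ValueError."""
    value, pos = _parse_value(text, _skip_ws(text, 0))
    if _skip_ws(text, pos) != len(text):
        raise ValueError(f"unexpected trailing data at {pos}")
    return value


def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def _parse_value(text, pos):
    # <json> ::= <primitive> | <container>
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == "{":
        return _parse_container(text, pos, "}", _parse_member)
    if ch == "[":
        return _parse_container(text, pos, "]", _parse_value)
    if ch == '"':
        return _parse_string(text, pos)
    for word, value in _LITERALS.items():
        if text.startswith(word, pos):
            return value, pos + len(word)
    match = _NUMBER.match(text, pos)
    if match:
        literal = match.group()
        is_float = any(c in literal for c in ".eE")
        return (float(literal) if is_float else int(literal)), match.end()
    raise ValueError(f"unexpected character {ch!r} at {pos}")


def _parse_string(text, pos):
    # Find the closing quote, stepping over backslash escapes.
    end = pos + 1
    while end < len(text) and text[end] != '"':
        end += 2 if text[end] == "\\" else 1
    if end >= len(text):
        raise ValueError(f"unterminated string starting at {pos}")
    return json.loads(text[pos:end + 1]), end + 1


def _parse_member(text, pos):
    # <member> ::= <string> ':' <json>
    if pos >= len(text) or text[pos] != '"':
        raise ValueError(f"expected a string key at {pos}")
    key, pos = _parse_string(text, pos)
    pos = _skip_ws(text, pos)
    if pos >= len(text) or text[pos] != ":":
        raise ValueError(f"expected ':' at {pos}")
    value, pos = _parse_value(text, _skip_ws(text, pos + 1))
    return (key, value), pos


def _parse_container(text, pos, closer, parse_item):
    # <array> and <object> share a shape: items separated by commas.
    items = []
    pos = _skip_ws(text, pos + 1)
    if pos < len(text) and text[pos] == closer:
        return ({} if closer == "}" else []), pos + 1
    while True:
        item, pos = parse_item(text, _skip_ws(text, pos))
        items.append(item)
        pos = _skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos += 1
        elif pos < len(text) and text[pos] == closer:
            return (dict(items) if closer == "}" else items), pos + 1
        else:
            raise ValueError(f"expected ',' or {closer!r} at {pos}")
```

Each `ValueError` in this sketch marks a point where a repairing parser could apply a heuristic instead of giving up.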
If something is wrong (a missing parenthesis or quote, for example), it will use a few simple heuristics to fix the JSON string:
- Add the missing parentheses if the parser believes that the array or object should be closed
- Quote strings or add missing single quotes
- Adjust whitespace and remove line breaks
I am sure some corner cases are missing. If you have examples, please open an issue or, even better, open a PR.
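The first heuristic can be illustrated with a toy function that tracks which containers are still open and appends whatever closers are missing. This is a simplified sketch under my own assumptions, not the package's actual code; `close_open_containers` is a hypothetical name, and it only handles truncated input, not misplaced or mismatched brackets.

```python
def close_open_containers(text):
    """Append the closing quote, brackets, and braces that text is missing."""
    closers = []        # closers still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; ignore it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers and closers[-1] == ch:
            closers.pop()
    # Close an unterminated string first, then unwind the open containers.
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(closers))


print(close_open_containers('{"a": [1, 2'))    # prints {"a": [1, 2]}
print(close_open_containers('{"name": "Al'))   # prints {"name": "Al"}
```

Brackets inside string values are skipped, so input that is already balanced comes back unchanged.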
How to develop
Just create a virtual environment with requirements.txt. The setup uses pre-commit to make sure all tests are run.
Hashes for json_repair-0.1.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2bd6f44b301a7115ca5314221ccaae78b6a6aec280c351aac7dc67079192b605
MD5 | cca1ab252596deca2b68794f576a2cdc
BLAKE2b-256 | 9d88da5bf7649cb101c6db14f314cc02bdc9b31aacadce1363ed8dd421ae50ba