
Fixes broken JSON string objects

fix-busted-json

Fix broken JSON using Python.

For Python 3.6+.

This project fixes broken JSON with the following issues:

  • Missing quotes around key names
  • Wrong quotes around key names and strings
    • Single quotes
    • Backticks
    • Escaped double quotes
    • Double-escaped double quotes
    • "Smart" i.e. curly quotes
  • Missing commas between key-value pairs and array elements
  • Trailing comma after last key-value pair
  • Concatenation of string fields
  • Python True/False/None in place of JSON true/false/null
  • Additional double quote at the start of a key, which gpt-3.5-turbo sometimes adds
  • Unescaped newline \n in a string value
  • Many escaping la-la land cases, e.g. {\"res\": \"{ \\\"a\\\": \\\"b\\\" }\"}
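
To see why input like this needs repair, the sketch below feeds a few of the broken forms listed above to Python's standard json module, which rejects every one of them. The sample strings and the helper name strict_parse_error are illustrative choices made for this sketch; they are not taken from the project.

```python
import json

# Illustrative broken inputs, one per issue (chosen for this sketch).
broken_samples = {
    "unquoted key": '{ name: "John" }',
    "single quotes": "{ 'name': 'John' }",
    "missing comma": '{ "a": 1 "b": 2 }',
    "trailing comma": '{ "a": 1, }',
    "Python literal": '{ "ok": True }',
    "string concatenation": '{ "city": "New" + " York" }',
}


def strict_parse_error(text):
    """Return the json.loads error message, or None if the text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


for issue, text in broken_samples.items():
    print(f"{issue}: {strict_parse_error(text)}")
```

The exact error messages vary between Python versions, so none are shown here.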

Utility functions are also provided for finding JSON objects in text.

https://github.com/Qarj/fix-busted-json

https://pypi.org/project/fix-busted-json

Quickstart

pip install fix-busted-json

Make a file called example_repair_json.py:

#!/usr/bin/env python3

from fix_busted_json import repair_json

invalid_json = "{ name: 'John' 'age': 30, 'city': 'New' + ' York', }"

fixed_json = repair_json(invalid_json)

print(fixed_json)

Note the issues in the invalid JSON:

  • name is unquoted
  • Single quotes are used; the JSON spec requires double quotes
  • Missing comma between 'John' and 'age'
  • Concatenation of string fields, which is not allowed in JSON
  • Trailing comma

Run it:

python example_repair_json.py

Output:

{ "name": "John", "age": 30, "city": "New York" }
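
The repaired value can then be handed to the standard json module. A minimal sketch, assuming repair_json returns a string (as the printed output suggests) and using that output as a literal so the snippet runs without the package installed:

```python
import json

# The repaired string, copied from the output above.
fixed_json = '{ "name": "John", "age": 30, "city": "New York" }'

person = json.loads(fixed_json)  # an ordinary Python dict from here on

print(person["city"])     # New York
print(person["age"] + 1)  # 31
```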

Why

The project was originally developed to find JSON-like objects in log files and pretty print them.

More recently, this project has been used to find and then fix broken JSON created by large language models such as gpt-3.5-turbo and gpt-4.

For example, a large language model might output a completion like the following:

Thought: "I need to search for developer jobs in London"
Action: SearchTool
ActionInput: { location: "London", 'title': "developer" }

Getting this JSON object back with this project is easy:

#!/usr/bin/env python3

from fix_busted_json import first_json

completion = """Thought: "I need to search for developer jobs in London"
Action: SearchTool
ActionInput: { location: "London", 'title': "developer" }
"""

print(first_json(completion))

Output:

{ "location": "London", "title": "developer" }
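
In an agent loop the extracted object usually becomes the arguments of a tool call. The sketch below shows one way that could look. The names parse_step, extract_json and stand_in_extract are hypothetical and not part of this package; in real use you would pass first_json as extract_json. The stand-in simply returns the output shown above so the sketch runs on its own.

```python
import json
import re


def parse_step(completion, extract_json):
    """Split an agent-style completion into (tool name, arguments dict).

    extract_json is any callable that returns the first JSON object in the
    text as a valid JSON string; first_json is the intended choice.
    """
    action = re.search(r"^Action:\s*(\S+)", completion, re.MULTILINE)
    if action is None:
        raise ValueError("no Action line found")
    return action.group(1), json.loads(extract_json(completion))


completion = """Thought: "I need to search for developer jobs in London"
Action: SearchTool
ActionInput: { location: "London", 'title': "developer" }
"""


def stand_in_extract(text):
    # Stand-in for first_json: returns the output shown above.
    return '{ "location": "London", "title": "developer" }'


tool, args = parse_step(completion, stand_in_extract)
print(tool, args)
```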

API

repair_json

Repairs a broken JSON string and returns the result.

#!/usr/bin/env python3

from fix_busted_json import repair_json

invalid_json = "{ name: 'John' }"

fixed_json = repair_json(invalid_json)
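
One possible way to use this is to try strict parsing first and repair only when that fails, so valid input passes through untouched. The sketch below illustrates the pattern. loads_lenient, its repair parameter and stand_in_repair are hypothetical names introduced here; in real use repair would be repair_json. The stand-in handles only the single-quote case so the sketch runs without the package.

```python
import json


def loads_lenient(text, repair):
    """Parse text as JSON, calling repair(text) only if strict parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(repair(text))


def stand_in_repair(text):
    # Stand-in for repair_json; it covers only the single-quote case used below.
    return text.replace("'", '"')


valid = loads_lenient('{ "name": "John" }', stand_in_repair)   # repair is not called
broken = loads_lenient("{ 'name': 'John' }", stand_in_repair)  # repaired, then parsed

print(valid)
print(broken)
```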

log_jsons

Looks for JSON objects in text and logs them, also recursively logging any JSON objects found in the values of the top-level JSON object.

#!/usr/bin/env python3

from fix_busted_json import log_jsons

log_jsons("""some text { key1: true, 'key2': "  { inner: 'value', } " } text { a: 1 } text""")

Running it gives output:

some text
{
  "key1": true,
  "key2": "  { inner: 'value', } "
}

FOUND JSON found in key key2 --->

{
  "inner": "value"
}


 text
{
  "a": 1
}
 text

to_array_of_plain_strings_or_json

Breaks text into an array of plain strings and JSON objects.

#!/usr/bin/env python3

from fix_busted_json import to_array_of_plain_strings_or_json

result = to_array_of_plain_strings_or_json("""some text { key1: true, 'key2': "  { inner: 'value', } " } text { a: 1 } text""")

print(result)

Gives output:

['some text ', '{ "key1": true, "key2": "  { inner: \'value\', } " }', ' text ', '{ "a": 1 }', ' text']
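
Because the JSON elements in the result are valid JSON strings, they can be told apart from the plain strings by trying to parse each element. A minimal sketch using only the standard library, with the result list copied from the output above; parse_if_json_object is a hypothetical helper, not part of the package:

```python
import json

# The result list, copied from the output above.
result = [
    "some text ",
    '{ "key1": true, "key2": "  { inner: \'value\', } " }',
    " text ",
    '{ "a": 1 }',
    " text",
]


def parse_if_json_object(piece):
    """Return the parsed dict if piece is a JSON object, otherwise None."""
    try:
        value = json.loads(piece)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


objects = [obj for obj in map(parse_if_json_object, result) if obj is not None]
plain = [piece for piece in result if parse_if_json_object(piece) is None]

print(objects)
print(plain)
```

The isinstance check keeps a plain string such as "123", which is valid JSON on its own, from being counted as an object.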

first_json, last_json, largest_json, json_matching

Utility functions for finding JSON objects in text.

#!/usr/bin/env python3
import re
from fix_busted_json import first_json, last_json, largest_json, json_matching

jsons = "text { first: 123 } etc { second_example: 456 } etc { third: 789 } { fourth: 12 }"

print(first_json(jsons))
print(last_json(jsons))
print(largest_json(jsons))
print(json_matching(jsons, re.compile("thi")))

Output:

{ "first": 123 }
{ "fourth": 12 }
{ "second_example": 456 }
{ "third": 789 }
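
The four results can be modelled with plain list operations over the repaired objects in document order. This is only a model of the output shown above, not the library's implementation: it assumes that "largest" means the longest text and that the pattern is searched for in the JSON text, neither of which is stated here.

```python
import re

# Repaired objects from the output above, in the order they appear in the input.
found = [
    '{ "first": 123 }',
    '{ "second_example": 456 }',
    '{ "third": 789 }',
    '{ "fourth": 12 }',
]

pattern = re.compile("thi")

first = found[0]
last = found[-1]
largest = max(found, key=len)  # assumption: largest means longest text
matching = next(s for s in found if pattern.search(s))  # assumption: first match wins

print(first)
print(last)
print(largest)
print(matching)
```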

See also

Node version of this project: https://www.npmjs.com/package/log-parsed-json
