A lightweight, dependency-free streaming (SAX-style) JSON parser.

jsonsax

Read JSON while it is still arriving — don't wait for the whole thing.

Imagine someone is reading you a long story out loud, one word at a time. You don't wait for them to finish the whole book before you start listening — you react to each part as you hear it. jsonsax does that for JSON.

Normally a parser waits for the entire JSON document to arrive before reading any of it (Python's built-in json.loads works this way). jsonsax is different: you hand it little pieces as they arrive, and it taps you on the shoulder and says "hey, I just found a name!", "hey, here's a number!" right away, piece by piece.

This style of reading-as-you-go is called streaming (or SAX-style) parsing. The name comes from SAX, the "Simple API for XML", which has worked this way for years; jsonsax brings the same idea to JSON.

Why would you want that?

  • 🤖 Talking to an AI — chatbots send their answer one word at a time. With jsonsax you can start using the first part of the answer before the rest has even arrived.
  • 🐘 Huge files — a JSON file too big to fit in memory? Read it in small sips instead of swallowing it whole.
  • Show things sooner — display the title of an article the instant it appears, without waiting for the whole article.

Install

pip install jsonsax

That's it. No other stuff gets installed — jsonsax has zero dependencies.


The tiniest example

from jsonsax import parse

parse('{"name": "Bo", "age": 5}', value=lambda path, val: print(path, "=", val))

Output:

$.name = Bo
$.age = 5

$ is the root: the very top of the JSON document. $.name means "the name key directly under the root". Think of each path as an address that tells you where in the JSON you are.
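Because every callback receives its location as a path string, it can be handy to break that string back into keys and indexes. The following is a minimal sketch and not part of jsonsax: split_path is a hypothetical helper, and it assumes paths always look like the ones printed in this README ($, .key, [index]). How jsonsax renders keys that themselves contain dots or brackets is not covered here, so the helper makes no attempt to handle that case.

```python
import re

# One path step: either ".key" or "[index]".
_STEP = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def split_path(path):
    """Split a path such as '$.user.tags[1]' into ['user', 'tags', 1]."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    parts = []
    pos = 1  # skip the leading '$'
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unrecognised path syntax at offset {pos}: {path!r}")
        key, index = match.groups()
        # Keys stay strings; list positions become ints.
        parts.append(key if key is not None else int(index))
        pos = match.end()
    return parts


print(split_path("$.user.tags[1]"))  # ['user', 'tags', 1]
```

The root path $ splits into an empty list, which makes it easy to test "am I at the top level?" with a plain truthiness check.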


Feeding it bit by bit (the fun part)

Real streams don't arrive all at once. Watch what happens when the JSON shows up in messy little chunks — even cut in the middle of a word:

from jsonsax import Parser

parser = Parser()
parser.on("value", lambda path, val: print("found:", path, "=", val))

chunks = ['{"tit', 'le": "R', 'AG", "sco', 're": 9.5}']
for chunk in chunks:
    parser.feed(chunk)   # hand over one piece at a time
parser.close()           # tell it "okay, that's everything"

Output:

found: $.title = RAG
found: $.score = 9.5

Even though "title" got chopped into "tit" + "le", jsonsax patiently stitched it back together. 🧩
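To try out awkward chunk boundaries yourself, you can cut any JSON string into fixed-size pieces. This is a small sketch using only the standard library; chunked is a hypothetical helper name, not something jsonsax provides, and the jsonsax wiring is shown as comments so the snippet runs on its own.

```python
def chunked(text, size):
    """Yield successive pieces of `text`, each at most `size` characters long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(text), size):
        yield text[start:start + size]


doc = '{"title": "RAG", "score": 9.5}'
pieces = list(chunked(doc, 7))
print(pieces)  # the cuts land wherever the count says, even mid-key

# Feeding the pieces to jsonsax would follow the same feed/close pattern
# used in the example above:
#     for piece in chunked(doc, 7):
#         parser.feed(piece)
#     parser.close()
```

Looping over several sizes (1, 2, 3, ...) is a cheap way to check that your callbacks behave the same no matter where the stream happens to break.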


Listening for different things ("events")

You tell jsonsax what you care about with parser.on(...). Each time it sees that kind of thing, it calls your little function (a callback).

from jsonsax import Parser

parser = Parser()
parser.on("start_object", lambda path: print(path, "{ ... an object starts"))
parser.on("end_object",   lambda path: print(path, "} ... an object ends"))
parser.on("start_array",  lambda path: print(path, "[ ... a list starts"))
parser.on("end_array",    lambda path: print(path, "] ... a list ends"))
parser.on("key",          lambda path, key: print(path, "key:", key))
parser.on("value",        lambda path, val: print(path, "value:", repr(val)))

parse_me = '{"pets": ["cat", "dog"], "happy": true}'
for ch in parse_me:
    parser.feed(ch)
parser.close()

Output:

$ { ... an object starts
$.pets key: pets
$.pets [ ... a list starts
$.pets[0] value: 'cat'
$.pets[1] value: 'dog'
$.pets ] ... a list ends
$.happy key: happy
$.happy value: True
$ } ... an object ends

See how $.pets[0] and $.pets[1] count the items in the list, just like "first pet" and "second pet"?


The events you can listen for

Event          You get…       Happens when it sees…
start_object   path           a { (an object is starting)
end_object     path           a } (an object is finished)
start_array    path           a [ (a list is starting)
end_array      path           a ] (a list is finished)
key            path, key      a label inside an object (like name)
value          path, value    a real value: text, number, true/false/null

A value can be a str, an int, a float, True, False, or None (JSON's null becomes Python's None).
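If a value callback needs to branch on the JSON type, the order of the checks matters, because in Python bool is a subclass of int. Here is a minimal sketch; json_type_name is a hypothetical helper, and it assumes values arrive as the plain Python types listed above.

```python
def json_type_name(val):
    """Map a Python value from a "value" event back to its JSON type name."""
    if val is None:
        return "null"
    if isinstance(val, bool):          # must come before the int check:
        return "boolean"               # isinstance(True, int) is also True
    if isinstance(val, (int, float)):
        return "number"
    if isinstance(val, str):
        return "string"
    raise TypeError(f"not a JSON scalar: {val!r}")


for sample in ["hi", 42, 3.14, True, None]:
    print(repr(sample), "->", json_type_name(sample))
```

Swapping the bool and int checks would report true and false as numbers, which is an easy mistake to miss.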


More examples (little recipes)

1. Grab just one field, ignore everything else

Only want the title? Only listen for it:

from jsonsax import Parser

def on_value(path, val):
    if path == "$.title":
        print("The title is:", val)

p = Parser()
p.on("value", on_value)
p.feed('{"title": "Hello", "body": "long boring text..."}')
p.close()
# The title is: Hello
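When you want a handful of fields rather than one, the if-chain inside on_value starts to grow. One option is a small dispatch table keyed by path. This is only a sketch: make_router is a hypothetical helper, it relies on nothing beyond value callbacks being called with (path, val), and the parser calls are simulated by hand so the snippet runs without jsonsax installed.

```python
def make_router(handlers):
    """Return a (path, val) callback that forwards val to handlers[path], if any."""
    def route(path, val):
        handler = handlers.get(path)
        if handler is not None:
            handler(val)   # paths without a handler are silently ignored
    return route


seen = {}
router = make_router({
    "$.title": lambda val: seen.update(title=val),
    "$.author": lambda val: seen.update(author=val),
})

# Simulate the calls a parser would make for
# {"title": "Hello", "body": "long boring text..."}:
router("$.title", "Hello")
router("$.body", "long boring text...")
print(seen)  # {'title': 'Hello'}
```

With a real parser you would register it the same way as on_value: p.on("value", router).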

2. Build a flat dictionary keyed by path as you go

from jsonsax import Parser

data = {}

def store(path, val):
    data[path] = val   # key each value by its path, e.g. '$.a'

p = Parser()
p.on("value", store)
p.feed('{"a": 1, "b": 2, "c": 3}')
p.close()
print(data)
# {'$.a': 1, '$.b': 2, '$.c': 3}

3. Count the items in a list

from jsonsax import Parser

count = 0
def bump(path, val):
    global count
    count += 1

p = Parser()
p.on("value", bump)
p.feed('[10, 20, 30, 40, 50]')
p.close()
print("items:", count)   # items: 5
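The bump callback above counts every value in the document, which is fine for one flat list. If the list sits inside a bigger document, you can count only the paths that point straight into it, and a small class avoids the global. This is a sketch under assumptions: ItemCounter is a hypothetical name, it matches the $.pets[0] path style shown earlier, and it only sees elements that produce a value event (text, numbers, true/false/null), not nested objects or lists. The parser calls are simulated by hand.

```python
import re


class ItemCounter:
    """Count "value" events that are direct elements of one array."""

    def __init__(self, array_path):
        # e.g. "$.pets" matches "$.pets[0]", "$.pets[1]", ... and nothing deeper
        self._pattern = re.compile(re.escape(array_path) + r"\[\d+\]")
        self.count = 0

    def __call__(self, path, val):
        if self._pattern.fullmatch(path):
            self.count += 1


counter = ItemCounter("$.pets")

# Simulate the value events for {"pets": ["cat", "dog"], "happy": true}:
for path, val in [("$.pets[0]", "cat"), ("$.pets[1]", "dog"), ("$.happy", True)]:
    counter(path, val)

print("pets:", counter.count)  # pets: 2
```

Because the instance is callable, it can be passed wherever a callback is expected, for example p.on("value", counter).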

4. Deeply nested stuff is no problem

from jsonsax import parse

parse(
    '{"user": {"name": "Mia", "tags": ["a", "b"]}}',
    value=lambda path, val: print(path, "=", val),
)
# $.user.name = Mia
# $.user.tags[0] = a
# $.user.tags[1] = b

5. All the value types at once

from jsonsax import parse

parse(
    '{"text": "hi", "whole": 42, "decimal": 3.14, "yes": true, "no": false, "nothing": null}',
    value=lambda path, val: print(f"{path:14} -> {val!r}"),
)
# $.text         -> 'hi'
# $.whole        -> 42
# $.decimal      -> 3.14
# $.yes          -> True
# $.no           -> False
# $.nothing      -> None

6. Reacting to an AI that types its answer slowly

This is the big one. Pretend an AI sends its reply word-by-word:

from jsonsax import Parser

# These pieces would normally come from the AI, one at a time.
ai_stream = ['{"head', 'line": "Big New', 's!", "summary": "It happened today."}']

p = Parser()
p.on("value", lambda path, val: print(f"[{path}] arrived: {val}"))

for piece in ai_stream:
    p.feed(piece)        # the moment a field finishes, you hear about it
p.close()
# [$.headline] arrived: Big News!
# [$.summary] arrived: It happened today.

7. Chain your setup in one breath

on(...) hands you the parser back, so you can line them up:

from jsonsax import Parser

p = (
    Parser()
    .on("key", lambda path, k: print("key", k))
    .on("value", lambda path, v: print("value", v))
)
p.feed('{"x": 1}')
p.close()

When the JSON is broken

If the JSON is messy or unfinished, jsonsax tells you by raising a ParseError (a subclass of Python's built-in ValueError, so an existing except ValueError will catch it too). It is strict on purpose: better to shout early than to quietly hand you wrong data.

from jsonsax import parse, ParseError

broken_examples = [
    '{"a": 1,}',     # extra comma at the end
    '[1, 2',         # forgot to close the list
    '{"a" 1}',       # missing the ':' between key and value
    '"never ends',   # string with no closing quote
    'true false',    # two things glued together
]

for bad in broken_examples:
    try:
        parse(bad)
    except ParseError as error:
        print("rejected:", bad, "->", error)

Output (your wording may vary slightly):

rejected: {"a": 1,} -> Unexpected '}'.
rejected: [1, 2 -> Unexpected end of input: unclosed container.
rejected: {"a" 1} -> Unexpected value (parser state: obj_colon).
rejected: "never ends -> Unexpected end of input: unterminated string.
rejected: true false -> Unexpected value (parser state: done).

Always call parser.close() at the end. That's the moment jsonsax double-checks that the JSON was actually complete. Forgetting it means you might miss the "you're missing the last }!" warning.
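One way to make forgetting close() harder is to wrap the parser in a with block. The sketch below is an assumption-laden convenience, not a jsonsax feature: finishing is a hypothetical helper that works with any object offering feed() and close(), and FakeParser is a stand-in so the snippet runs without jsonsax installed.

```python
from contextlib import contextmanager


@contextmanager
def finishing(parser):
    """Call parser.close() when the with-block ends without an exception."""
    yield parser
    # Only reached on a clean exit. If feeding raised, the original error
    # propagates and close() is skipped, so it cannot mask that error.
    parser.close()


class FakeParser:
    """Stand-in with the same feed/close shape as a real parser."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def feed(self, chunk):
        self.chunks.append(chunk)

    def close(self):
        self.closed = True


fake = FakeParser()
with finishing(fake) as p:
    p.feed('{"x": ')
    p.feed('1}')

print(fake.closed)  # True
```

The standard library's contextlib.closing is similar but calls close() even after an error, which for a strict parser would likely raise a second, less useful exception.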


Run it from the terminal (no code needed)

You can pipe JSON straight into jsonsax to watch the events scroll by:

echo '{"x": [1, 2, true]}' | python -m jsonsax

Output:

$                        {
$.x                      key='x'
$.x                      [
$.x[0]                   value=1
$.x[1]                   value=2
$.x[2]                   value=True
$.x                      ]
$                        }

The whole toolbox (quick reference)

from jsonsax import Parser, parse, ParseError, EVENTS

parser = Parser()           # make a new reader
parser.on(event, callback)  # "when you see <event>, call <callback>" (returns parser)
parser.feed(chunk)          # give it the next piece of text
parser.close()              # "that's all" — checks the JSON was complete
parser.closed               # True after a successful close()

parse(text, **handlers)     # shortcut: feed + close in one line
ParseError                  # raised when the JSON is broken (a ValueError)
EVENTS                      # the tuple of all valid event names

Good to know:

  • Truly incremental — chunks can split anywhere, even in the middle of a word, a number, or a \uXXXX escape.
  • Strict — rejects trailing commas, missing colons, leftover junk, and unfinished strings or brackets.
  • Typed — ships with py.typed, so type checkers understand it.
  • Tiny & dependency-free, works on Python 3.9+.
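The quick reference describes feed() as taking the next piece of text. If your source hands you raw bytes (a socket or an HTTP response body, say), a chunk boundary can fall inside a multi-byte UTF-8 character, so decoding each chunk on its own may fail. A minimal sketch using the standard library's incremental decoder follows; decode_chunks is a hypothetical helper, and whether feed() also accepts bytes directly is not something this README states.

```python
import codecs


def decode_chunks(byte_chunks, encoding="utf-8"):
    """Turn an iterable of byte chunks into text chunks, safely across splits."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)   # holds back any incomplete character
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


raw = '{"name": "Zoë"}'.encode("utf-8")
cut = raw.index(b"\xc3") + 1            # split in the middle of "ë"
byte_stream = [raw[:cut], raw[cut:]]

text_pieces = list(decode_chunks(byte_stream))
print(text_pieces)
```

Each text piece can then be handed to parser.feed() in the usual way.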

For developers (working on jsonsax itself)

pip install -e ".[dev]"
pytest             # run the tests
pylint src/jsonsax # check the style
mypy               # check the types

License

MIT — free to use, change, and share.
