
Streaming JSON parser that yields progressively complete values - Python port of the TypeScript jsonriver library


jsonriver - Python Streaming JSON Parser

Parse JSON incrementally as it streams in, e.g. from a network request or a language model. Gives you a sequence of increasingly complete values.

This is a Python port of the TypeScript jsonriver library.

Features

  • Incremental parsing: Get progressively complete JSON values as data arrives
  • Zero dependencies: Uses only Python standard library
  • Fully typed: Complete type hints with mypy strict mode compliance
  • Memory efficient: Reuses objects and arrays when possible
  • Correct: Final result compares equal to the json.loads() result (numbers are yielded as float, so 30 arrives as 30.0)
  • Fast: Optimized for performance with minimal overhead

Installation

From PyPI (recommended)

Using uv:

uv add jsonriver

Using pip:

pip install jsonriver

From source

Using uv:

git clone https://github.com/chrisschnabl/streamjson.git
cd streamjson
uv pip install -e .

Using pip:

git clone https://github.com/chrisschnabl/streamjson.git
cd streamjson
pip install -e .

Usage

import asyncio
import json
from jsonriver import parse


async def make_stream(text: str, chunk_size: int):
    """Simulate a streaming source"""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]


async def main():
    json_str = '{"name": "Alice", "age": 30}'

    stream = make_stream(json_str, chunk_size=3)
    async for value in parse(stream):
        print(json.dumps(value))
    # Output shows incremental results:
    # {}
    # {"name": "Al"}
    # {"name": "Alice"}
    # {"name": "Alice", "age": 30.0}


asyncio.run(main())

How it Works

jsonriver yields a sequence of increasingly complete JSON values. Consider this JSON:

{"name": "Alex", "keys": [1, 20, 300]}

If you parse this one character at a time, it yields:

{}
{"name": ""}
{"name": "A"}
{"name": "Al"}
{"name": "Ale"}
{"name": "Alex"}
{"name": "Alex", "keys": []}
{"name": "Alex", "keys": [1]}
{"name": "Alex", "keys": [1, 20]}
{"name": "Alex", "keys": [1, 20, 300]}

Invariants

The library maintains these guarantees:

  1. Type stability: Later versions of a value keep the same type (a string never becomes an array)
  2. Atomic values: null, true, false, and numbers are only yielded when complete
  3. String growth: Strings may be replaced with longer versions
  4. Array append-only: Arrays are only modified by appending or by mutating the last element
  5. Object append-only: Objects are only modified by adding properties or by mutating the last one
  6. Complete keys: Object properties are only added once the key and the value type are known
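
The invariants above can be expressed as a check between two consecutive snapshots. The following is a minimal sketch, not part of the library: is_valid_update is a hypothetical helper name, and it assumes each snapshot was copied (for example with copy.deepcopy) before the next one arrived, since the parser may reuse objects and arrays in place.

```python
from typing import Any


def is_valid_update(prev: Any, curr: Any) -> bool:
    """Return True if `curr` could follow `prev` under the listed invariants."""
    # Type stability: a value never changes type between snapshots.
    if type(prev) is not type(curr):
        return False
    # Atomic values are yielded complete, so they must not change afterwards.
    if prev is None or isinstance(prev, (bool, int, float)):
        return prev == curr
    # String growth: the new string extends the old one.
    if isinstance(prev, str):
        return curr.startswith(prev)
    # Array append-only: earlier elements are frozen, the last may still update.
    if isinstance(prev, list):
        if len(curr) < len(prev):
            return False
        if not prev:
            return True
        last = len(prev) - 1
        return prev[:last] == curr[:last] and is_valid_update(prev[last], curr[last])
    # Object append-only: same key order, only the last property may update.
    if isinstance(prev, dict):
        prev_keys, curr_keys = list(prev), list(curr)
        if curr_keys[: len(prev_keys)] != prev_keys:
            return False
        if not prev_keys:
            return True
        *done, last = prev_keys
        return all(prev[k] == curr[k] for k in done) and is_valid_update(
            prev[last], curr[last]
        )
    return False
```

A check like this can be handy in tests that consume the parser output, where it is applied to each adjacent pair of copied snapshots.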

Error Handling

The parser raises ValueError for invalid JSON, consistent with json.loads(), whose json.JSONDecodeError is a subclass of ValueError:

async def example_error():
    try:
        stream = make_stream('{"invalid": }', 1)
        async for value in parse(stream):
            print(value)
    except ValueError as e:
        print(f"Parse error: {e}")
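
When input fails partway through, it can be useful to keep the last snapshot that was yielded before the error. The sketch below shows one way to do that. last_good_value and broken_values are hypothetical names, and broken_values is a hand-written stand-in for parse(stream), so the block runs without the library installed.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Optional, Tuple


async def last_good_value(
    values: AsyncIterator[Any],
) -> Tuple[Any, Optional[ValueError]]:
    """Drain `values`; return the last snapshot seen and the error, if any."""
    latest: Any = None
    try:
        async for value in values:
            latest = value
    except ValueError as exc:
        return latest, exc
    return latest, None


async def broken_values() -> AsyncIterator[Any]:
    """Hypothetical stand-in for parse(stream) on invalid input."""
    yield {}
    yield {"name": "Al"}
    raise ValueError("unexpected character")


# json.loads() reports bad input with JSONDecodeError, a ValueError subclass,
# so a single `except ValueError` clause covers both parsers.
assert issubclass(json.JSONDecodeError, ValueError)

value, error = asyncio.run(last_good_value(broken_values()))
print(value, error)
```

Whether a partial value is still usable after a failure depends on the application; the returned snapshot is incomplete by definition.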

Development

Setup

# Create virtual environment and install dependencies
uv venv
uv pip install -e ".[dev]"

Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_parse.py -v

# Run with coverage
python -m pytest tests/ --cov=src/jsonriver

Type Checking

# Check types with mypy
mypy src/jsonriver --strict

Running Examples

python example_jsonriver.py

Project Structure

src/jsonriver/
  __init__.py       # Public API exports
  parse.py          # JSON parser implementation
  tokenize.py       # JSON tokenizer implementation

tests/
  test_parse.py     # Parser tests
  test_tokenize.py  # Tokenizer tests
  utils.py          # Test utilities

API Reference

parse(stream: AsyncIterator[str]) -> AsyncIterator[JsonValue]

Incrementally parse a single JSON value from the given async iterator of string chunks.

Parameters:

  • stream: An async iterator that yields string chunks containing JSON data

Yields:

  • Increasingly complete JSON values as more input is parsed

Raises:

  • ValueError: If the input is not valid JSON
  • RuntimeError: For internal parsing errors

Example:

async def parse_json():
    json_str = '{"a": 1, "b": 2}'

    async def stream():
        for char in json_str:
            yield char

    async for value in parse(stream()):
        print(value)
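
Because parse() takes str chunks, a byte stream (such as a network response body) has to be decoded first, and a multi-byte UTF-8 character may be split across chunks. The following sketch uses the standard library's incremental decoder for that. decode_utf8 and byte_source are hypothetical names, and byte_source is a stand-in for a real network source.

```python
import asyncio
import codecs
from typing import AsyncIterator


async def decode_utf8(chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Turn byte chunks into str chunks without splitting multi-byte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    async for chunk in chunks:
        text = decoder.decode(chunk)
        if text:  # Empty while the decoder waits for the rest of a character.
            yield text
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


async def byte_source() -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a response body, delivered one byte at a time."""
    data = '{"name": "Zoë"}'.encode("utf-8")
    for i in range(len(data)):
        yield data[i:i + 1]


async def demo() -> str:
    return "".join([text async for text in decode_utf8(byte_source())])


decoded = asyncio.run(demo())
print(decoded)
```

Under these assumptions, the decoded stream could then be passed on as parse(decode_utf8(byte_source())).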

Type Definitions

from typing import Union

JsonValue = Union[
    None,
    bool,
    float,
    str,
    list['JsonValue'],
    dict[str, 'JsonValue']
]

JsonObject = dict[str, JsonValue]
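
The alias is only a static annotation, so a runtime check has to walk the value itself. Below is a minimal sketch of such a check, written against the alias as printed above. is_json_value is a hypothetical helper, not a library function. Note that it rejects plain int, because the alias lists float only; json.loads() output containing integers would therefore fail this check.

```python
from typing import Any


def is_json_value(obj: Any) -> bool:
    """Check at runtime that `obj` has the recursive shape of JsonValue."""
    # Leaves: None, bool, float, str (plain int is not part of the alias).
    if obj is None or isinstance(obj, (bool, float, str)):
        return True
    # Arrays: every element must itself be a JsonValue.
    if isinstance(obj, list):
        return all(is_json_value(item) for item in obj)
    # Objects: string keys, JsonValue values.
    if isinstance(obj, dict):
        return all(
            isinstance(key, str) and is_json_value(val)
            for key, val in obj.items()
        )
    return False
```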

Performance

jsonriver is designed for performance:

  • Processes input synchronously in batches when available
  • Reuses objects and arrays to minimize allocations
  • Minimal overhead compared to standard json.loads()
  • Efficient state machine implementation

In practice, jsonriver adds negligible overhead to the parsing process while providing valuable incremental updates.

Use Cases

  • Streaming APIs: Parse JSON from network requests as data arrives
  • Large payloads: Start processing data before the complete response has arrived
  • Real-time UIs: Update UI as JSON parses
  • LLM responses: Parse structured output from language models
  • Progress indicators: Show parsing progress to users
  • Server-sent events: Handle JSON in SSE streams
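
For the real-time UI and LLM cases, a consumer often wants only the text that is new since the previous snapshot. Since strings only grow, that can be derived by remembering how many characters were already emitted. This is a sketch under stated assumptions: text_deltas and fake_snapshots are hypothetical names, and fake_snapshots stands in for parse(stream).

```python
import asyncio
from typing import Any, AsyncIterator, List


async def text_deltas(values: AsyncIterator[Any], key: str) -> AsyncIterator[str]:
    """Yield only the newly appended characters of the string at values[key]."""
    seen = 0  # Number of characters already emitted.
    async for value in values:
        text = value.get(key) if isinstance(value, dict) else None
        if isinstance(text, str) and len(text) > seen:
            yield text[seen:]
            seen = len(text)


async def fake_snapshots() -> AsyncIterator[Any]:
    """Hypothetical stand-in for parse(stream)."""
    snapshots = (
        {},
        {"name": ""},
        {"name": "Al"},
        {"name": "Alex"},
        {"name": "Alex", "keys": []},
    )
    for snapshot in snapshots:
        yield snapshot


async def demo() -> List[str]:
    return [delta async for delta in text_deltas(fake_snapshots(), "name")]


deltas = asyncio.run(demo())
print(deltas)
```

Tracking a length rather than the previous string keeps the helper independent of whether the parser reuses the enclosing object between snapshots.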

Comparison with Alternatives

Feature jsonriver json.loads ijson
Incremental parsing
Complete values
No dependencies
Type hints
Memory efficient

License

BSD-3-Clause License

  • Original TypeScript implementation: Copyright (c) 2023 Google LLC
  • Python port: Copyright (c) 2024 jsonriver-python contributors

See LICENSE file for full license text.

Credits

This is a Python port of the excellent jsonriver TypeScript library by Peter Burns (@rictic).

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass: pytest tests/ -v
  2. Type checking passes: mypy src/jsonriver --strict
  3. Code follows existing style
  4. New features include tests

Changelog

0.0.1 (2024)

  • Initial Python port from TypeScript
  • Full type hints with mypy strict mode
  • Comprehensive test suite (37 tests)
  • Complete documentation
  • Zero dependencies
