Streaming JSON parser that yields progressively complete values - Python port of the TypeScript jsonriver library

jsonriver - Python Streaming JSON Parser

Parse JSON incrementally as it streams in, e.g. from a network request or a language model. Gives you a sequence of increasingly complete values.

This is a Python port of the TypeScript jsonriver library.

Features

  • Incremental parsing: Get progressively complete JSON values as data arrives
  • Zero dependencies: Uses only Python standard library
  • Fully typed: Complete type hints with mypy strict mode compliance
  • Memory efficient: Reuses objects and arrays when possible
  • Correct: Final result matches json.loads(), except that numbers are yielded as float (30 becomes 30.0)
  • Fast: Optimized for performance with minimal overhead

Installation

From PyPI (recommended)

Using uv:

uv add jsonriver

Using pip:

pip install jsonriver

From source

Using uv:

git clone https://github.com/yourusername/jsonriver-python.git
cd jsonriver-python
uv pip install -e .

Using pip:

git clone https://github.com/yourusername/jsonriver-python.git
cd jsonriver-python
pip install -e .

Usage

import asyncio
import json
from jsonriver import parse


async def make_stream(text: str, chunk_size: int):
    """Simulate a streaming source"""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]


async def main():
    json_str = '{"name": "Alice", "age": 30}'

    stream = make_stream(json_str, chunk_size=3)
    async for value in parse(stream):
        print(json.dumps(value))
    # Output shows incremental results:
    # {}
    # {"name": "Al"}
    # {"name": "Alice"}
    # {"name": "Alice", "age": 30.0}


asyncio.run(main())
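parse() expects str chunks, while network clients usually hand back bytes. The following is a minimal sketch, using only the standard library, of one way to bridge the two. The names decode_utf8_chunks, fake_byte_stream, and collect_text are hypothetical helpers for illustration and are not part of jsonriver.

```python
import asyncio
import codecs
from collections.abc import AsyncIterator


async def decode_utf8_chunks(byte_stream: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Turn an async stream of UTF-8 bytes into an async stream of str chunks."""
    # The incremental decoder holds back a partial multi-byte sequence until
    # the rest arrives, so chunk boundaries cannot split a character.
    decoder = codecs.getincrementaldecoder("utf-8")()
    async for chunk in byte_stream:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises UnicodeDecodeError if the stream
    # ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


async def fake_byte_stream(data: bytes, chunk_size: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a network response body."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


async def collect_text(data: bytes, chunk_size: int) -> list[str]:
    """Collect the decoded chunks into a list (for demonstration only)."""
    return [s async for s in decode_utf8_chunks(fake_byte_stream(data, chunk_size))]
```

Under these assumptions, the decoded stream could be handed straight to the parser, for example parse(decode_utf8_chunks(body)), where body is whatever async byte iterator your HTTP client exposes.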

How it Works

jsonriver yields a sequence of increasingly complete JSON values. Consider this JSON:

{"name": "Alex", "keys": [1, 20, 300]}

If you feed this to the parser one character at a time, it yields:

{}
{"name": ""}
{"name": "A"}
{"name": "Al"}
{"name": "Ale"}
{"name": "Alex"}
{"name": "Alex", "keys": []}
{"name": "Alex", "keys": [1]}
{"name": "Alex", "keys": [1, 20]}
{"name": "Alex", "keys": [1, 20, 300]}

Invariants

The library maintains these guarantees:

  1. Type stability: Once a value has been yielded, later versions of it keep the same type (a string never becomes an array)
  2. Atomic values: null, true, false, and numbers are yielded only when complete
  3. String growth: Strings may be replaced with longer versions
  4. Array append-only: Arrays are modified only by appending elements or mutating the last element
  5. Object append-only: Objects are modified only by adding properties or mutating the last one
  6. Complete keys: An object property is added only once its key and the type of its value are known
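The guarantees above can be read as a rule for which snapshot may follow which. Here is a rough sketch of such a check, written against plain Python values. is_valid_update is a hypothetical helper, not part of jsonriver, and it encodes only the invariants as listed here.

```python
def is_valid_update(prev, new) -> bool:
    """Return True if `new` could follow `prev` under the listed invariants."""
    # Atomic values (null, booleans, numbers) are yielded only when complete,
    # so they must not change between snapshots.
    if prev is None or isinstance(prev, (bool, int, float)):
        return type(prev) is type(new) and prev == new
    if isinstance(prev, str):
        # Strings may only be replaced with longer versions of themselves.
        return isinstance(new, str) and new.startswith(prev)
    if isinstance(prev, list):
        if not isinstance(new, list) or len(new) < len(prev):
            return False
        if not prev:
            return True
        # Everything before the old last element is frozen.
        if new[:len(prev) - 1] != prev[:-1]:
            return False
        # The old last element may itself still have been growing.
        return is_valid_update(prev[-1], new[len(prev) - 1])
    if isinstance(prev, dict):
        if not isinstance(new, dict):
            return False
        old_keys, new_keys = list(prev), list(new)
        # Keys are only appended, in order (dicts preserve insertion order).
        if new_keys[:len(old_keys)] != old_keys:
            return False
        if not old_keys:
            return True
        # All properties except the old last one are frozen.
        if any(new[k] != prev[k] for k in old_keys[:-1]):
            return False
        return is_valid_update(prev[old_keys[-1]], new[old_keys[-1]])
    return False
```

A check like this is mostly useful in tests: running it over each consecutive pair of yielded values confirms that consumer code can rely on earlier data staying put.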

Error Handling

The parser raises ValueError for invalid JSON, so the except clause you would use around json.loads() also works here:

async def example_error():
    # Reuses make_stream() and parse from the Usage example
    try:
        stream = make_stream('{"invalid": }', 1)  # value missing after the colon
        async for value in parse(stream):
            print(value)
    except ValueError as e:
        print(f"Parse error: {e}")
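For comparison, the standard library reports invalid JSON with json.JSONDecodeError, which is a subclass of ValueError. The short sketch below shows that a single except ValueError clause covers it. try_loads is a hypothetical helper for illustration, and this README does not say which ValueError subclass, if any, jsonriver itself raises.

```python
import json


def try_loads(text: str) -> str:
    """Return "ok" for valid JSON, else the name of the exception raised."""
    try:
        json.loads(text)
    except ValueError as e:
        # json.JSONDecodeError lands here because it subclasses ValueError.
        return type(e).__name__
    return "ok"
```

Catching ValueError rather than a more specific class keeps the same handler usable for both parsers.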

Development

Setup

# Create virtual environment and install dependencies
uv venv
uv pip install -e ".[dev]"

Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_parse.py -v

# Run with coverage
python -m pytest tests/ --cov=src/jsonriver

Type Checking

# Check types with mypy
mypy src/jsonriver --strict

Running Examples

python example_jsonriver.py

Project Structure

src/jsonriver/
  __init__.py       # Public API exports
  parse.py          # JSON parser implementation
  tokenize.py       # JSON tokenizer implementation

tests/
  test_parse.py     # Parser tests
  test_tokenize.py  # Tokenizer tests
  utils.py          # Test utilities

API Reference

parse(stream: AsyncIterator[str]) -> AsyncIterator[JsonValue]

Incrementally parse a single JSON value from the given async iterator of string chunks.

Parameters:

  • stream: An async iterator that yields string chunks containing JSON data

Yields:

  • Increasingly complete JSON values as more input is parsed

Raises:

  • ValueError: If the input is not valid JSON
  • RuntimeError: For internal parsing errors

Example:

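import asyncio
from jsonriver import parse


async def parse_json():
    json_str = '{"a": 1, "b": 2}'

    async def stream():
        # Feed the parser one character at a time
        for char in json_str:
            yield char

    async for value in parse(stream()):
        print(value)


asyncio.run(parse_json())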

Type Definitions

from typing import Union

JsonValue = Union[
    None,
    bool,
    float,
    str,
    list['JsonValue'],
    dict[str, 'JsonValue']
]

JsonObject = dict[str, JsonValue]
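JsonValue lists float but not int, and the Usage output prints 30.0, which suggests that every number arrives as a float. json.loads() returns int for integer literals by default, so a type-exact comparison needs a small adjustment. Here is a minimal sketch using the standard parse_int hook. loads_with_float_numbers is a hypothetical helper name.

```python
import json


def loads_with_float_numbers(text: str):
    """Like json.loads(), but integer literals are converted to float."""
    # parse_int is called with the string of every JSON integer literal.
    return json.loads(text, parse_int=float)


reference = loads_with_float_numbers('{"name": "Alice", "age": 30}')
print(json.dumps(reference))  # {"name": "Alice", "age": 30.0}
```

This matters mainly for very large integers, which an int holds exactly and a float does not.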

Performance

jsonriver is designed for performance:

  • Processes input synchronously in batches when available
  • Reuses objects and arrays to minimize allocations
  • Minimal overhead compared to standard json.loads()
  • Efficient state machine implementation

In practice, jsonriver adds negligible overhead to the parsing process while providing valuable incremental updates.

Use Cases

  • Streaming APIs: Parse JSON from network requests as data arrives
  • Large payloads: Start processing data before the complete response arrives
  • Real-time UIs: Update UI as JSON parses
  • LLM responses: Parse structured output from language models
  • Progress indicators: Show parsing progress to users
  • Server-sent events: Handle JSON in SSE streams
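Several of these use cases come down to acting on array elements as soon as they are final. Because arrays change only by appending or by mutating the last element, every element except the last can be treated as complete. The sketch below tracks that boundary. CompletedItems is a hypothetical helper, not part of jsonriver, and it assumes the caller signals the end of the stream itself.

```python
class CompletedItems:
    """Hand out each element of a growing list once it can no longer change."""

    def __init__(self) -> None:
        self._done = 0  # number of elements already handed out

    def update(self, items: list, finished: bool = False) -> list:
        # All elements except the last are final; the last one becomes
        # final only when the whole parse has finished.
        stable = len(items) if finished else max(len(items) - 1, 0)
        fresh = items[self._done:stable]
        self._done = max(self._done, stable)
        return fresh
```

In a consumer loop, update() would be called with the array from each yielded snapshot, then once more with finished=True after the loop ends to release the final element.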

Comparison with Alternatives

Libraries compared: jsonriver, json.loads, ijson
Features compared: incremental parsing, complete values, no dependencies, type hints, memory efficiency
(The per-library marks for this table were not preserved.)

License

BSD-3-Clause License

  • Original TypeScript implementation: Copyright (c) 2023 Google LLC
  • Python port: Copyright (c) 2024 jsonriver-python contributors

See LICENSE file for full license text.

Credits

This is a Python port of the excellent jsonriver TypeScript library by Peter Burns (@rictic).

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass: pytest tests/ -v
  2. Type checking passes: mypy src/jsonriver --strict
  3. Code follows existing style
  4. New features include tests

Changelog

1.0.0 (2024)

  • Initial Python port from TypeScript
  • Full type hints with mypy strict mode
  • Comprehensive test suite (28 tests)
  • Complete documentation
