A SAX-style JSON parser for processing incomplete JSON streams

jaxn

A SAX-style JSON parser for processing incomplete JSON streams character-by-character.

Overview

jaxn is a lightweight streaming JSON parser that processes JSON incrementally as it arrives, much as SAX parsers do with XML. Instead of waiting for the complete JSON document, jaxn fires callbacks as it encounters each part of the JSON structure, which makes it well suited to:

  • Real-time streaming applications (e.g., LLM responses, API streams)
  • Processing large JSON files without loading them entirely into memory
  • Displaying content as it arrives rather than waiting for complete responses
  • Building responsive UIs that update progressively

Installation

pip install jaxn

Quick Start

Here's a simple example that prints field values as they're parsed:

from jaxn import StreamingJSONParser, JSONParserHandler

class SimpleHandler(JSONParserHandler):
    def on_field_end(self, path, field_name, value, parsed_value=None):
        print(f"{field_name}: {value}")

handler = SimpleHandler()
parser = StreamingJSONParser(handler)

# Feed the parser; here the whole document arrives as a single chunk
json_data = '{"name": "Alice", "age": 30}'
parser.parse_incremental(json_data)

Detailed Example: Streaming Markdown Renderer

This example shows how to convert a streaming JSON response into formatted markdown output in real time. It is based on the demo in the demo/ directory.

The JSON Structure

{
    "title": "Monitoring Data Drift in Production",
    "sections": [
        {
            "heading": "Overview",
            "content": "Monitoring data drift is crucial...",
            "references": [
                {
                    "title": "Data Drift",
                    "filename": "metrics/preset_data_drift.mdx"
                }
            ]
        }
    ]
}

The Handler Implementation

from pathlib import Path
from jaxn import StreamingJSONParser, JSONParserHandler
import time

class SearchResultHandler(JSONParserHandler):
    def on_field_start(self, path: str, field_name: str):
        # Print references header when we encounter a references array
        if field_name == "references":
            level = path.count("/") + 2
            print(f"\n{'#' * level} References\n")

    def on_field_end(self, path, field_name, value, parsed_value=None):
        # Print title as main heading
        if field_name == "title" and path == "":
            print(f"# {value}")
        # Print section headings
        elif field_name == "heading":
            print(f"\n\n## {value}\n")
        # Add spacing after content
        elif field_name == "content":
            print("\n")

    def on_value_chunk(self, path, field_name, chunk):
        # Stream content character by character for real-time display
        if field_name == "content":
            print(chunk, end="", flush=True)

    def on_array_item_end(self, path, field_name, item=None):
        # Print references as markdown links; item defaults to None, so guard before .get()
        if field_name == "references" and item:
            title = item.get("title", "")
            filename = item.get("filename", "")
            print(f"- [{title}]({filename})")

# Use the handler
handler = SearchResultHandler()
parser = StreamingJSONParser(handler)

# Simulate streaming by processing JSON in small chunks
json_message = Path('message.json').read_text(encoding='utf-8')
for i in range(0, len(json_message), 4):
    chunk = json_message[i:i+4]
    parser.parse_incremental(chunk)
    time.sleep(0.01)  # Simulate network delay
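
The heading depth used in on_field_start follows from the number of slashes in path. A minimal sketch of that arithmetic is below; heading_level is a hypothetical helper name, and the sample path "/sections" is an assumption inferred from the "### References" line in the output, not something taken from jaxn's documentation.

```python
def heading_level(path: str, base: int = 2) -> int:
    """Map a parser path to a markdown heading depth.

    Mirrors `path.count("/") + 2` from the handler above: each slash in
    the path pushes the heading one level deeper.
    """
    return path.count("/") + base


# A path with one slash (assumed here to be "/sections") gives level 3.
print("#" * heading_level("/sections"), "References")
```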

Output

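The above code prints formatted markdown progressively as the input streams in. The sample below comes from a fuller message.json than the abridged structure shown earlier (it lists three references):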

# Monitoring Data Drift in Production

## Overview

Monitoring data drift is crucial to understanding the health and performance of machine learning models in production...

### References

- [Data Drift](metrics/preset_data_drift.mdx)
- [How data drift detection works](metrics/explainer_drift.mdx)
- [Overview](docs/platform/monitoring_overview.mdx)

API Reference

JSONParserHandler

Base handler class for JSON parsing events. Subclass this and override the methods you need.

Methods

on_field_start(path: str, field_name: str) -> None

Called when starting to read a field value.

  • path: Path to current location (e.g., "/sections/references")
  • field_name: Name of the field being read

on_field_end(path: str, field_name: str, value: str, parsed_value: Any = None) -> None

Called when a field value is complete.

  • path: Path to current location
  • field_name: Name of the field
  • value: Complete value of the field (as string from JSON)
  • parsed_value: Parsed value (dict for objects, list for arrays, actual value for primitives)
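
As a sketch of how value and parsed_value might be used together, the handler below stores the parsed value when one is supplied and falls back to the raw string otherwise. TypedFieldCollector and its key format are hypothetical, and the import fallback exists only so the snippet runs without jaxn installed. One caveat under this assumption: a JSON null would also arrive as None and therefore fall back to the raw string.

```python
try:
    from jaxn import JSONParserHandler
except ImportError:  # fallback so the sketch runs without jaxn installed
    JSONParserHandler = object


class TypedFieldCollector(JSONParserHandler):
    """Collect completed fields, preferring the parsed value when present."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def on_field_end(self, path, field_name, value, parsed_value=None):
        # Key format "<path>/<field_name>" is an illustrative choice.
        key = f"{path}/{field_name}"
        self.fields[key] = value if parsed_value is None else parsed_value


collector = TypedFieldCollector()
# Calling the callback by hand to show the behaviour in isolation.
collector.on_field_end("", "age", "30", 30)
collector.on_field_end("", "name", "Alice")
```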

on_value_chunk(path: str, field_name: str, chunk: str) -> None

Called for each character as string values stream in. Useful for displaying content in real time.

  • path: Path to current location
  • field_name: Name of the field being streamed
  • chunk: Single character chunk
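
Printing one character at a time is not always what a UI wants. The sketch below buffers chunks and emits whole lines instead. LineBufferingHandler and its emit parameter are hypothetical names, and the code assumes chunks arrive as decoded characters (so a newline in the string shows up as "\n"), which this page does not confirm. The import fallback is there only so the snippet runs without jaxn installed.

```python
try:
    from jaxn import JSONParserHandler
except ImportError:  # fallback so the sketch runs without jaxn installed
    JSONParserHandler = object


class LineBufferingHandler(JSONParserHandler):
    """Buffer streamed "content" chunks and emit them one line at a time."""

    def __init__(self, emit=print):
        super().__init__()
        self._emit = emit
        self._buffers = {}  # (path, field_name) -> text not yet emitted

    def on_value_chunk(self, path, field_name, chunk):
        if field_name != "content":
            return
        key = (path, field_name)
        text = self._buffers.get(key, "") + chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, rest = text.split("\n")
        for line in complete:
            self._emit(line)
        self._buffers[key] = rest

    def on_field_end(self, path, field_name, value, parsed_value=None):
        # Flush whatever is left once the field is finished.
        rest = self._buffers.pop((path, field_name), "")
        if rest:
            self._emit(rest)
```

Passing a list's append method as emit makes the handler easy to check by calling the callbacks directly.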

on_array_item_start(path: str, field_name: str) -> None

Called when starting a new object in an array.

  • path: Path to current location
  • field_name: Name of the array field

on_array_item_end(path: str, field_name: str, item: Dict[str, Any] = None) -> None

Called when finishing an object in an array.

  • path: Path to current location
  • field_name: Name of the array field
  • item: The complete parsed dictionary for this array item
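
Because item defaults to None, a handler should check it before calling dictionary methods. A minimal sketch of a defensive collector follows; ReferenceCollector is a hypothetical name, the "references" field is borrowed from the example above, and the import fallback is there only so the snippet runs without jaxn installed.

```python
try:
    from jaxn import JSONParserHandler
except ImportError:  # fallback so the sketch runs without jaxn installed
    JSONParserHandler = object


class ReferenceCollector(JSONParserHandler):
    """Gather (title, filename) pairs from completed "references" items."""

    def __init__(self):
        super().__init__()
        self.references = []

    def on_array_item_end(self, path, field_name, item=None):
        # Ignore other arrays and any item that is not a parsed dict.
        if field_name != "references" or not isinstance(item, dict):
            return
        self.references.append(
            (item.get("title", ""), item.get("filename", ""))
        )


collector = ReferenceCollector()
# Calling the callback by hand to show the behaviour in isolation.
collector.on_array_item_end(
    "/sections",
    "references",
    {"title": "Data Drift", "filename": "metrics/preset_data_drift.mdx"},
)
collector.on_array_item_end("/sections", "references", None)
```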

StreamingJSONParser

Parse JSON incrementally as it streams in, character by character.

Methods

__init__(handler: JSONParserHandler = None)

Initialize the parser with a handler for events.

  • handler: JSONParserHandler instance to receive parsing events

parse_incremental(delta: str) -> None

Parse the new characters added since the last call. Fires callbacks as events are detected.

  • delta: New characters to parse (string)

parse_from_old_new(old_text: str, new_text: str) -> None

Convenience method that calculates the delta between old and new text.

  • old_text: Previously processed text
  • new_text: New text (should start with old_text)
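
The delta calculation described here can be sketched in plain Python. This is an illustration of the idea under the stated requirement that new_text starts with old_text, not jaxn's actual implementation; compute_delta is a hypothetical name, and raising ValueError on a mismatch is an assumption made for the sketch.

```python
def compute_delta(old_text: str, new_text: str) -> str:
    """Return the part of new_text that was appended after old_text."""
    if not new_text.startswith(old_text):
        # Assumed behaviour for this sketch; the library may act differently.
        raise ValueError("new_text must start with old_text")
    return new_text[len(old_text):]


# Cumulative snapshots, as some streaming APIs deliver them.
snapshots = ['{"a"', '{"a": 1', '{"a": 1}']
previous = ""
deltas = []
for snapshot in snapshots:
    deltas.append(compute_delta(previous, snapshot))
    previous = snapshot
```

Each delta is what would then be handed to parse_incremental.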

Use Cases

1. Real-time LLM Response Display

Display streaming responses from Large Language Models as they're generated:

class LLMDisplayHandler(JSONParserHandler):
    def on_value_chunk(self, path, field_name, chunk):
        if field_name == "content":
            print(chunk, end="", flush=True)

parser = StreamingJSONParser(LLMDisplayHandler())
# Feed chunks as they arrive from the LLM API
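
The feeding step left as a comment above could look like the sketch below. feed_stream and RecordingParser are hypothetical names: feed_stream only relies on the parse_incremental(delta) method documented in the API reference, and RecordingParser is a stand-in that lets the snippet run without jaxn or a real LLM client.

```python
from typing import Iterable, List, Protocol


class IncrementalParser(Protocol):
    def parse_incremental(self, delta: str) -> None: ...


def feed_stream(parser: IncrementalParser, chunks: Iterable[str]) -> int:
    """Pass each non-empty chunk to the parser; return characters fed."""
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks rather than forwarding them
            continue
        parser.parse_incremental(chunk)
        total += len(chunk)
    return total


class RecordingParser:
    """Stand-in for StreamingJSONParser that records what it receives."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def parse_incremental(self, delta: str) -> None:
        self.received.append(delta)


recorder = RecordingParser()
fed = feed_stream(recorder, ['{"content": "He', "", 'llo"}'])
```

With jaxn installed, the same function would be called with StreamingJSONParser(LLMDisplayHandler()) and the iterator of text chunks from the API client.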

2. Progress Tracking

Track progress through large JSON structures:

class ProgressHandler(JSONParserHandler):
    def __init__(self):
        super().__init__()  # keep any base-class setup intact
        self.items_processed = 0

    def on_array_item_end(self, path, field_name, item=None):
        self.items_processed += 1
        print(f"Processed {self.items_processed} items...")

3. Selective Field Extraction

Collect only the fields you need as they complete, without waiting for the rest of the document:

class FieldExtractor(JSONParserHandler):
    def __init__(self):
        super().__init__()  # keep any base-class setup intact
        self.titles = []

    def on_field_end(self, path, field_name, value, parsed_value=None):
        if field_name == "title":
            self.titles.append(value)

License

WTFPL - Do What The Fuck You Want To Public License
