jaxn

A SAX-style JSON parser for processing incomplete JSON streams character-by-character.

Overview

jaxn is a lightweight streaming JSON parser that processes JSON incrementally as it arrives, much as SAX parsers do with XML. Instead of waiting for the complete JSON document, jaxn fires callbacks as it encounters each part of the JSON structure, which makes it well suited to:

  • Real-time streaming applications (e.g., LLM responses, API streams)
  • Processing large JSON files without loading them entirely into memory
  • Displaying content as it arrives rather than waiting for complete responses
  • Building responsive UIs that update progressively

Installation

pip install jaxn

Quick Start

Here's a simple example that prints field values as they're parsed:

from jaxn import StreamingJSONParser, JSONParserHandler

class SimpleHandler(JSONParserHandler):
    def on_field_end(self, path, field_name, value, parsed_value=None):
        print(f"{field_name}: {value}")

handler = SimpleHandler()
parser = StreamingJSONParser(handler)

# Process JSON incrementally
json_data = '{"name": "Alice", "age": 30}'
parser.parse_incremental(json_data)

Detailed Example: Streaming Markdown Renderer

This example shows how to convert a streaming JSON response into formatted markdown output in real-time. The example is based on the demo in the demo/ directory.

The JSON Structure

{
    "title": "Monitoring Data Drift in Production",
    "sections": [
        {
            "heading": "Overview",
            "content": "Monitoring data drift is crucial...",
            "references": [
                {
                    "title": "Data Drift",
                    "filename": "metrics/preset_data_drift.mdx"
                }
            ]
        }
    ]
}

The Handler Implementation

from pathlib import Path
from jaxn import StreamingJSONParser, JSONParserHandler
import time

class SearchResultHandler(JSONParserHandler):
    def on_field_start(self, path: str, field_name: str):
        # Print references header when we encounter a references array
        if field_name == "references":
            level = path.count("/") + 2
            print(f"\n{'#' * level} References\n")

    def on_field_end(self, path, field_name, value, parsed_value=None):
        # Print title as main heading
        if field_name == "title" and path == "":
            print(f"# {value}")
        # Print section headings
        elif field_name == "heading":
            print(f"\n\n## {value}\n")
        # Add spacing after content
        elif field_name == "content":
            print("\n")

    def on_value_chunk(self, path, field_name, chunk):
        # Stream content character by character for real-time display
        if field_name == "content":
            print(chunk, end="", flush=True)

    def on_array_item_end(self, path, field_name, item=None):
        # Print references as markdown links.
        # item defaults to None, so guard before calling .get() on it.
        if field_name == "references" and item:
            title = item.get("title", "")
            filename = item.get("filename", "")
            print(f"- [{title}]({filename})")

# Use the handler
handler = SearchResultHandler()
parser = StreamingJSONParser(handler)

# Simulate streaming by processing JSON in small chunks
json_message = Path('message.json').read_text(encoding='utf-8')
for i in range(0, len(json_message), 4):
    chunk = json_message[i:i+4]
    parser.parse_incremental(chunk)
    time.sleep(0.01)  # Simulate network delay
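
The chunking loop above can be factored into a small helper. The sketch below is one way to do it; `feed_in_chunks` and its `feed` parameter are hypothetical names, not part of jaxn. In real use `feed` would be `parser.parse_incremental`; here a list's `append` stands in so the snippet runs on its own.

```python
from typing import Callable


def feed_in_chunks(text: str, feed: Callable[[str], None], size: int = 4) -> int:
    """Pass `text` to `feed` in pieces of at most `size` characters.

    Returns the number of pieces delivered.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    count = 0
    for start in range(0, len(text), size):
        feed(text[start:start + size])
        count += 1
    return count


# Stand-in for parser.parse_incremental: record each piece in a list.
received = []
pieces = feed_in_chunks('{"name": "Alice"}', received.append, size=4)
print(pieces, received)
```

Passing the callback in, rather than the parser itself, keeps the helper easy to test without a live stream.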

Output

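The code above produces formatted Markdown that appears progressively. The sample output below lists three references, so it appears to come from a fuller message than the abbreviated JSON structure shown earlier: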

# Monitoring Data Drift in Production

## Overview

Monitoring data drift is crucial to understanding the health and performance of machine learning models in production...

### References

- [Data Drift](metrics/preset_data_drift.mdx)
- [How data drift detection works](metrics/explainer_drift.mdx)
- [Overview](docs/platform/monitoring_overview.mdx)

API Reference

JSONParserHandler

Base handler class for JSON parsing events. Subclass this and override the methods you need.

Methods

on_field_start(path: str, field_name: str) -> None

Called when starting to read a field value.

  • path: Path to current location (e.g., "/sections/references")
  • field_name: Name of the field being read
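
The handler example earlier derives a Markdown heading depth from `path` by counting `/` separators. A minimal sketch of that calculation as a standalone function follows; `heading_level` and `base` are hypothetical names, and the path strings are illustrative only, since which path jaxn passes for a given field is not verified here.

```python
def heading_level(path: str, base: int = 2) -> int:
    """Return a Markdown heading depth for a field found at `path`.

    Each "/" in the path is treated as one level of nesting, mirroring
    the `path.count("/") + 2` expression used in the handler example.
    """
    return path.count("/") + base


# Illustrative paths only: a top-level field and a field one level down.
for example_path in ("", "/sections"):
    print(repr(example_path), "#" * heading_level(example_path))
```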

on_field_end(path: str, field_name: str, value: str, parsed_value: Any = None) -> None

Called when a field value is complete.

  • path: Path to current location
  • field_name: Name of the field
  • value: Complete value of the field (as string from JSON)
  • parsed_value: Parsed value (dict for objects, list for arrays, actual value for primitives)
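
Since `on_field_end` receives both the raw string and an optional parsed value, a handler may want to prefer the parsed one when it is present. The sketch below assumes the signature documented above; `TypedFieldCollector` is a hypothetical name, the callbacks are invoked by hand with made-up arguments instead of by the parser, and a stand-in base class is used if jaxn is not installed.

```python
try:
    from jaxn import JSONParserHandler
except ImportError:
    class JSONParserHandler:  # stand-in so the sketch runs without jaxn
        pass


class TypedFieldCollector(JSONParserHandler):
    """Store each completed field, preferring parsed_value over the raw string."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def on_field_end(self, path, field_name, value, parsed_value=None):
        key = f"{path}/{field_name}"
        # Fall back to the raw string when no parsed value was supplied.
        self.fields[key] = value if parsed_value is None else parsed_value


collector = TypedFieldCollector()
# Hand-driven calls with hypothetical arguments, in place of a real parse.
collector.on_field_end("", "name", "Alice", "Alice")
collector.on_field_end("", "age", "30", 30)
collector.on_field_end("", "note", "raw only")
print(collector.fields)
```

One caveat of this approach: a field whose parsed value is legitimately `None` (JSON `null`) is indistinguishable from a missing `parsed_value`, so the raw string is stored in that case.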

on_value_chunk(path: str, field_name: str, chunk: str) -> None

Called for each character of a string value as it streams in, which makes it useful for displaying content in real time.

  • path: Path to current location
  • field_name: Name of the field being streamed
  • chunk: Single character chunk
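
Besides printing chunks directly, a handler can accumulate them so the partial text is available at any point during the stream. This is a minimal sketch under the signature documented above; `ContentBuffer` is a hypothetical name, the callback is driven by hand here, and a stand-in base class is used if jaxn is not installed.

```python
try:
    from jaxn import JSONParserHandler
except ImportError:
    class JSONParserHandler:  # stand-in so the sketch runs without jaxn
        pass


class ContentBuffer(JSONParserHandler):
    """Accumulate streamed characters of the "content" field."""

    def __init__(self):
        super().__init__()
        self._parts = []

    def on_value_chunk(self, path, field_name, chunk):
        if field_name == "content":
            self._parts.append(chunk)

    @property
    def text(self) -> str:
        # Join lazily; appending to a list avoids repeated string copies.
        return "".join(self._parts)


buffer = ContentBuffer()
# Hand-driven calls standing in for the parser's per-character callbacks.
for ch in "Hi!":
    buffer.on_value_chunk("/sections", "content", ch)
buffer.on_value_chunk("", "title", "X")  # ignored: not the "content" field
print(buffer.text)
```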

on_array_item_start(path: str, field_name: str) -> None

Called when starting a new object in an array.

  • path: Path to current location
  • field_name: Name of the array field

on_array_item_end(path: str, field_name: str, item: Dict[str, Any] = None) -> None

Called when finishing an object in an array.

  • path: Path to current location
  • field_name: Name of the array field
  • item: The complete parsed dictionary for this array item
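
Because `item` defaults to `None` in the documented signature, a handler that reads keys from it should check for that case first. The sketch below collects reference links instead of printing them; `ReferenceCollector` is a hypothetical name, the callback is invoked by hand with made-up items, and a stand-in base class is used if jaxn is not installed.

```python
try:
    from jaxn import JSONParserHandler
except ImportError:
    class JSONParserHandler:  # stand-in so the sketch runs without jaxn
        pass


class ReferenceCollector(JSONParserHandler):
    """Collect items of a "references" array as Markdown links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def on_array_item_end(self, path, field_name, item=None):
        # Skip other arrays, and skip calls where no item dict was supplied.
        if field_name != "references" or not item:
            return
        title = item.get("title", "")
        filename = item.get("filename", "")
        self.links.append(f"- [{title}]({filename})")


refs = ReferenceCollector()
# Hand-driven calls with hypothetical arguments, in place of a real parse.
refs.on_array_item_end(
    "/sections", "references",
    {"title": "Data Drift", "filename": "metrics/preset_data_drift.mdx"},
)
refs.on_array_item_end("/sections", "references")      # item is None
refs.on_array_item_end("", "sections", {"heading": "Overview"})
print(refs.links)
```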

StreamingJSONParser

Parse JSON incrementally as it streams in, character by character.

Methods

__init__(handler: JSONParserHandler = None)

Initialize the parser with a handler for events.

  • handler: JSONParserHandler instance to receive parsing events

parse_incremental(delta: str) -> None

Parse new characters added since last call. Fires callbacks as events are detected.

  • delta: New characters to parse (string)

parse_from_old_new(old_text: str, new_text: str) -> None

Convenience method that calculates the delta between old and new text.

  • old_text: Previously processed text
  • new_text: New text (should start with old_text)
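
The delta described here is the suffix of `new_text` that follows `old_text`. The sketch below illustrates that calculation in plain Python; `compute_delta` is a hypothetical helper, not jaxn's implementation, and raising on a mismatched prefix is this sketch's choice, since the library's behavior in that case is not documented above.

```python
def compute_delta(old_text: str, new_text: str) -> str:
    """Return the characters appended to old_text to produce new_text."""
    if not new_text.startswith(old_text):
        raise ValueError("new_text must start with old_text")
    return new_text[len(old_text):]


# Cumulative snapshots, as some streaming APIs deliver them.
snapshots = ['{"a"', '{"a": 1', '{"a": 1}']
deltas = []
previous = ""
for snapshot in snapshots:
    deltas.append(compute_delta(previous, snapshot))
    previous = snapshot
print(deltas)
```

Each delta is what would be handed to `parse_incremental` if the snapshots were processed one at a time.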

Use Cases

1. Real-time LLM Response Display

Display streaming responses from Large Language Models as they're generated:

class LLMDisplayHandler(JSONParserHandler):
    def on_value_chunk(self, path, field_name, chunk):
        if field_name == "content":
            print(chunk, end="", flush=True)

parser = StreamingJSONParser(LLMDisplayHandler())
# Feed chunks as they arrive from the LLM API

2. Progress Tracking

Track progress through large JSON structures:

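class ProgressHandler(JSONParserHandler):
    def __init__(self):
        super().__init__()  # keep any base-class setup intact
        self.items_processed = 0

    def on_array_item_end(self, path, field_name, item=None):
        # Count every completed array item, whichever array it belongs to
        self.items_processed += 1
        print(f"Processed {self.items_processed} items...")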

3. Selective Field Extraction

Collect only the fields you need as each one completes, rather than loading the whole document and then searching it:

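class FieldExtractor(JSONParserHandler):
    def __init__(self):
        super().__init__()  # keep any base-class setup intact
        self.titles = []

    def on_field_end(self, path, field_name, value, parsed_value=None):
        # Matches "title" at any depth, including titles inside references
        if field_name == "title":
            self.titles.append(value)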

License

WTFPL - Do What The Fuck You Want To Public License
