JTON (JSON Tabular Object Notation) — A Tabular JSON-Superset Encoding for Efficient LLM Data Processing

JTON

JTON (JSON Tabular Object Notation) — A high-performance, token-efficient JSON superset built in Rust with PyO3 bindings for Python. Home of Zen Grid, a token-aware tabular encoding that reduces LLM token costs by 15–60%.


Try the JTON Playground: an interactive encoder/decoder with live token counts and speed benchmarks.

Overview

JTON is a JSON superset designed for LLM applications and high-throughput data processing:

  • Zen Grid: Tabular encoding reduces token count by 15–60% on benchmarked data vs JSON compact (23% average across 6 datasets, ~60% on highly-tabular Twitter-style rows)
  • LLM-Validated: 10 models tested for comprehension, 12 models tested for generation -- all achieve 100% generation validity (100% few-shot, 100% zero-shot)
  • SIMD Acceleration: AVX2 (32-byte) and AVX-512 (64-byte) structural scanning
  • JSON-compatible: supports load() / loads() / dump() / dumps() for common JSON workflows — all valid JSON is valid JTON
  • Serialization: dumps() with Zen Grid table output, Pydantic v1/v2 and dataclass support
  • JSON Extensions: Unquoted keys, // and /* */ comments, Infinity/NaN special numbers
  • Strict correctness: Rejects invalid JSON numbers (-01, 1., 0.e1) that many parsers accept silently

Quickstart

Installation

# From PyPI
pip install jton

# From source (requires Rust 1.70+ — https://rustup.rs/)
git clone https://github.com/gowthamkumar-nandakishore/jton.git
cd jton
pip install maturin
maturin develop --release

Basic Usage

import jton

# Standard JSON parsing
data = jton.loads('{"name": "Alice", "age": 30}')

# JTON extensions — unquoted keys
data = jton.loads('{name: "Alice", age: 30}')

# Comments for configuration files
config = jton.loads('''
{
    host: "localhost",   // server address
    port: 8080,         /* default port */
    timeout: 30         // seconds
}
''')

# Special numbers (Python compatibility)
data = jton.loads('{x: Infinity, y: -Infinity, z: NaN}')

# Serialize to compact JSON
jton.dumps({"name": "Alice", "age": 30})
# → '{"name":"Alice","age":30}'

# encode/decode aliases (familiar for orjson/msgspec users)
text = jton.encode(data)   # same as dumps()
data = jton.decode(text)   # same as loads()

Using JTON with existing json-based code

JTON works well with existing JSON-heavy codebases — with one important rule:

  • Parsing is the easy win: jton.load() / jton.loads() accept normal JSON unchanged
  • Serialization is JSON-compatible for common usage, but JTON's serializer has an extra default: zen_grid=True
  • That means if you simply replace import json with import jton as json, some list[dict] payloads may serialize as Zen Grid instead of strict RFC 8259 JSON

What happens if you just replace json with jton?

import jton as json

# Existing JSON input still works
obj = json.loads('{"name":"Alice","age":30}')

# Existing file APIs still work
with open("data.json") as f:
    obj = json.load(f)

For output:

import jton as json

json.dumps({"name": "Alice", "age": 30})
# -> '{"name":"Alice","age":30}'   # still normal JSON

json.dumps([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])
# -> '[2: id, name; 1, "Alice"; 2, "Bob" ]'   # JTON Zen Grid, not strict JSON

So the practical behavior is:

  • load() / loads(): safe replacement for existing JSON parsing
  • dump() / dumps() on ordinary objects: usually still emits JSON
  • dump() / dumps() on homogeneous arrays of objects: may emit Zen Grid by default
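
To predict which payloads are likely to switch to Zen Grid, a quick shape check can help. The sketch below is a pure-Python approximation: `looks_tabular` is a hypothetical helper, and it assumes "homogeneous" means a non-empty list of dicts sharing the same keys in the same order. JTON's real eligibility rules may be looser or stricter.

```python
from typing import Any


def looks_tabular(value: Any) -> bool:
    """Return True for a non-empty list of dicts with identical key order.

    Assumption: this mirrors what "homogeneous array of objects" means;
    it is not taken from JTON's source.
    """
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, dict) for row in value):
        return False
    header = list(value[0].keys())
    return all(list(row.keys()) == header for row in value)


print(looks_tabular({"name": "Alice", "age": 30}))                  # plain object
print(looks_tabular([{"id": 1, "name": "Alice"},
                     {"id": 2, "name": "Bob"}]))                    # uniform rows
```

Running a check like this over representative payloads before swapping the import shows where output may change.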

Recommended migration path

1. Parsing-only replacement

If you want a no-risk first step, replace only parsing:

import jton

data = jton.loads(existing_json_text)

This gives you faster parsing plus JTON extensions on input, without changing output format anywhere.

2. Full json-module replacement, but keep strict JSON output

If you want import jton as json, but still need normal JSON output:

import jton as json

text = json.dumps(obj, zen_grid=False)
with open("out.json", "w") as f:
    json.dump(obj, f, zen_grid=False)

This is the safest pattern for APIs, files, and systems that expect standard JSON.
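
One way to apply this pattern consistently is to route all machine-facing output through a single helper. The sketch below is an assumption-laden illustration: `dumps_strict` is a hypothetical name, and the code falls back to the standard library when jton is not installed. The only JTON call it relies on is `jton.dumps(obj, zen_grid=False)` as described above.

```python
import json

try:
    import jton  # optional: used only when the package is installed
except ImportError:
    jton = None


def dumps_strict(obj, **kwargs):
    """Serialize to compact JSON that any standard parser can read."""
    if jton is not None:
        # zen_grid=False is the documented switch for JSON-only output.
        return jton.dumps(obj, zen_grid=False, **kwargs)
    # Fallback: stdlib json with compact separators.
    return json.dumps(obj, separators=(",", ":"), **kwargs)


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
text = dumps_strict(rows)
assert json.loads(text) == rows  # a strict JSON parser accepts the output
```

Keeping the flag in one place means a later change to JTON's defaults cannot silently alter API responses.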

3. Enable JTON only where it helps

Use strict JSON for machines, and JTON for LLM-facing payloads:

api_payload = jton.dumps(data, zen_grid=False)   # strict JSON
llm_payload = jton.dumps(data)                   # JTON / Zen Grid when eligible

JSON-in / JSON-out compatibility cheat sheet

If you want output that stays valid JSON, use:

jton.dumps(
    obj,
    zen_grid=False,
    unquoted_keys=False,
    bare_strings=False,
    implicit_null=False,
    multiline_zen=False,
)

Notes:

  • Only zen_grid=False is required for strict JSON output in normal use
  • row_count and delimiter matter only when zen_grid=True
  • indent=2 or indent=4 still gives valid pretty JSON
  • default= works like json.dumps(default=...)

When output stops being strict JSON

These options move you out of plain JSON output:

  • zen_grid=True on homogeneous list[dict] values
  • unquoted_keys=True
  • multiline_zen=True
  • bare_strings=True inside Zen Grid cells
  • implicit_null=True inside Zen Grid cells
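
At a system boundary it can be useful to verify that a payload really is strict JSON before sending it on. The following sketch uses only the standard library; `is_strict_json` is a hypothetical helper, not part of JTON. Note that Python's `json.loads` accepts `NaN` and `Infinity` by default, so the sketch rejects them through `parse_constant`.

```python
import json


def _reject_constant(name):
    # json.loads accepts NaN / Infinity / -Infinity unless told otherwise.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_strict_json(text: str) -> bool:
    """Return True if the standard-library parser accepts text as plain JSON."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_strict_json('{"name":"Alice","age":30}'))
print(is_strict_json('[2: id, name; 1, "Alice"; 2, "Bob" ]'))
```

The first call reports valid JSON and the second does not, since the `[2:` prefix is Zen Grid syntax.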

So the short rule is:

  • Use jton.loads() everywhere
  • Use jton.dumps(..., zen_grid=False) anywhere strict JSON is required
  • Use default jton.dumps() for LLM/token-optimized payloads

Current compatibility scope

JTON supports the common Python JSON workflow:

  • load
  • loads
  • dump
  • dumps
  • default=...
  • file objects

It is not a byte-for-byte clone of every stdlib json keyword argument yet. Think of it as:

  • fully compatible parser for JSON input
  • mostly compatible serializer for common usage
  • plus opt-in JTON/Zen Grid output when you want it

Zen Grid: Token-Efficient Table Format

When you pass a list of dicts to dumps(), JTON automatically detects the tabular structure and encodes it as a Zen Grid — one header row followed by semicolon-delimited data rows, all inline.

The format

[N: col1, col2, col3; val1, val2, val3; val4, val5, val6 ]
 ↑  ↑                 ↑
 │  headers           one record per semicolon segment
 row count (N)

  • N: total row count (helps LLMs understand the data size at a glance)
  • First segment after [N: is the comma-separated list of field names
  • Each subsequent segment is one record, with values in the same order
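
The layout rules above can be sketched in a few lines of pure Python. This is an illustration of the format only, not JTON's Rust serializer: `to_zen_grid` is a hypothetical function, it assumes every row has the same identifier-like keys, and it writes each cell with the standard `json` module.

```python
import json


def to_zen_grid(rows):
    """Encode a uniform list of dicts as an inline Zen Grid string.

    Assumptions: rows is non-empty, all rows share the first row's keys,
    and keys are simple identifiers that need no quoting.
    """
    if not rows:
        raise ValueError("need at least one row")
    header = list(rows[0].keys())
    segments = [", ".join(header)]  # header segment, written once
    for row in rows:
        cells = [json.dumps(row[key], separators=(",", ":")) for key in header]
        segments.append(", ".join(cells))  # one segment per record
    return f"[{len(rows)}: " + "; ".join(segments) + " ]"


users = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]
print(to_zen_grid(users))
# → [3: id, name, score; 1, "Alice", 95; 2, "Bob", 87; 3, "Carol", 92 ]
```

The token saving comes from the header segment: each key name appears once instead of once per row.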

Example

import jton

users = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob",   "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]

# Standard JSON compact — 116 chars, ~32 tokens:
# [{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87},{"id":3,"name":"Carol","score":92}]

# JTON Zen Grid — 72 chars, ~22 tokens (31% fewer tokens):
print(jton.dumps(users))
# → '[3: id, name, score; 1, "Alice", 95; 2, "Bob", 87; 3, "Carol", 92 ]'

# Disable Zen Grid for standard JSON output
print(jton.dumps(users, zen_grid=False))
# → '[{"id":1,"name":"Alice","score":95},...]'

Round-trip correctness

Zen Grid is valid JTON — jton.loads() parses it back to the original data:

original = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
encoded  = jton.dumps(original)           # → '[2: id, name; 1, "Alice"; 2, "Bob" ]'
decoded  = jton.loads(encoded)            # → [{"id": 1, "name": "Alice"}, ...]
assert decoded == original                # ✅ perfect round-trip
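
For readers who want to see what decoding involves, here is a minimal pure-Python reader for the simplest form of the format. It is a sketch under stated assumptions, not JTON's parser: `from_zen_grid` is a hypothetical name, and it handles only comma-delimited grids whose cells are ordinary JSON values (no bare strings, implicit nulls, or other delimiters).

```python
import json
import re

_PREFIX = re.compile(r"\[\s*(\d+)?\s*:")  # "[N:" or "[:"
_decoder = json.JSONDecoder()


def _skip_spaces(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def from_zen_grid(text):
    """Decode an inline Zen Grid string into a list of dicts."""
    text = text.strip()
    match = _PREFIX.match(text)
    if match is None or not text.endswith("]"):
        raise ValueError("not a Zen Grid literal")
    body = text[match.end():-1]
    # Headers are unquoted names, so the first ';' always ends the header.
    header_part, _, rows_part = body.partition(";")
    header = [name.strip() for name in header_part.split(",")]

    rows, cells = [], []
    pos = _skip_spaces(rows_part, 0)
    while pos < len(rows_part):
        # raw_decode reads exactly one JSON value, so ';' or ',' inside a
        # quoted string is never mistaken for a separator.
        value, pos = _decoder.raw_decode(rows_part, pos)
        cells.append(value)
        pos = _skip_spaces(rows_part, pos)
        separator = rows_part[pos] if pos < len(rows_part) else ";"
        if separator == ";":
            rows.append(dict(zip(header, cells)))
            cells = []
        elif separator != ",":
            raise ValueError(f"unexpected character {separator!r}")
        pos = _skip_spaces(rows_part, pos + 1)

    declared = match.group(1)
    if declared is not None and int(declared) != len(rows):
        raise ValueError("row count does not match the [N: prefix")
    return rows


print(from_zen_grid('[2: id, name; 1, "Alice"; 2, "Bob" ]'))
```

The row-count check shows one practical use of the `N` prefix: a truncated payload is detected instead of silently decoded.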

Token count analysis

import jton

data = [{"id": i, "name": f"User{i}", "score": i*5} for i in range(100)]
counts = jton.token_count(data)  # requires: pip install tiktoken
# {
#   'json_compact': {'tokens': 2843, 'savings_vs_compact': '+0.0%'},
#   'zen_grid':     {'tokens': 1820, 'savings_vs_compact': '-36.0%'},
# }

LLM integration

Add a one-line format hint to your system prompt before sending Zen Grid data:

import jton

system_prompt = jton.format_hint() + "\n\n" + jton.dumps(my_data)

The hint text reads:

Data is in JTON Zen Grid format.
Format: [N: col1, col2, col3; row1val1, row1val2, row1val3; ... ]
N = total row count. First semicolon-segment = headers.
Each subsequent segment = one record in header order.
Example: [3: id, name, score; 1, Alice, 95; 2, Bob, 87; 3, Carol, 92 ]

Pydantic and Dataclass Support

from pydantic import BaseModel
from dataclasses import dataclass
import jton

# Pydantic v2 (model_dump)
class User(BaseModel):
    id: int
    name: str
    email: str

users = [User(id=1, name="Alice", email="a@example.com"),
         User(id=2, name="Bob",   email="b@example.com")]

print(jton.dumps(users))
# → '[2: id, name, email; 1, "Alice", "a@example.com"; 2, "Bob", "b@example.com" ]'

# Python dataclasses
@dataclass
class Point:
    x: float
    y: float

print(jton.dumps(Point(x=1.5, y=2.5)))
# → '{"x":1.5,"y":2.5}'

# Parse directly into a Pydantic model
user_data = jton.loads('{"id":1,"name":"Alice","email":"a@ex.com"}')
user = User(**user_data)
# → User(id=1, name='Alice', email='a@ex.com')

# Parse into a dataclass
pt_data = jton.loads('{"x":1.5,"y":2.5}')
pt = Point(**pt_data)
# → Point(x=1.5, y=2.5)

API Reference

JTON provides the core load, dump, loads, and dumps APIs used in common JSON workflows. The main behavioral difference is that dumps() defaults to zen_grid=True, which may emit JTON Zen Grid for homogeneous arrays of objects.

jton.loads(data, schema=None)

Parse JTON or JSON data into Python objects.

jton.loads('{"a": 1}')          # → {"a": 1}
jton.loads(b'{"a": 1}')         # bytes input OK
jton.loads('{a: 1}')            # unquoted keys OK
jton.loads('// comment\n{a:1}') # comments OK

jton.load(fp)

Parse JTON/JSON from a file object — compatible with normal json.load() usage.

with open("data.json") as f:
    data = jton.load(f)

jton.dumps(data, *, zen_grid=True, ..., default=None)

Serialize Python objects to JTON/JSON string — compatible with common json.dumps() usage.

Parameter      Type             Default   Description
data           Any              required  Python object to serialize
zen_grid       bool             True      Auto-convert lists of dicts to Zen Grid table format
unquoted_keys  bool             False     Write dict keys without quotes
indent         int | None       None      Pretty-print with given indent width
bare_strings   bool             False     Write identifier string values without quotes in cells
implicit_null  bool             False     Write null cells as empty (saves ~1 token per cell)
row_count      bool             True      Prefix Zen Grid header with [N: ...] row count
delimiter      str              "comma"   "comma" (readable), "tab" (max savings), "pipe"
default        callable | None  None      For non-serializable objects, same as json.dumps(default=...)

If you want strict JSON output, set zen_grid=False.

# Standard usage
jton.dumps({"a": 1}, zen_grid=False)         # → '{"a":1}'

# Custom types with default=
from datetime import date
jton.dumps({"d": date(2025,1,1)}, default=str)
# → '{"d":"2025-01-01"}'

# Works with Zen Grid too
jton.dumps([{"id":1,"d":date(2025,1,1)}], default=str)
# → '[1: id, d; 1, "2025-01-01" ]'

Natively supported types: dict, list, tuple, str, int, float, bool, None, Pydantic BaseModel (v1+v2), @dataclass

jton.dump(obj, fp, **kwargs)

Serialize to a file object — compatible with common json.dump() usage.

with open("out.jton", "w") as f:
    jton.dump(data, f)

jton.format_hint(style="zen_grid")

Return a format description for pasting into LLM system prompts.

style                Description
"zen_grid"           Default inline format (mentions both [: and [N: forms)
"zen_grid_rowcount"  Inline with explicit [N] row count
"multiline"          Multi-line format
"tab"                Tab-delimited

jton.token_count(data, tokenizer="o200k_base")

Compare token costs across all output modes. Requires pip install tiktoken.

Returns a dict mapping mode names to {"tokens": int, "chars": int, "savings_vs_compact": str}.

jton.encode / jton.decode

Aliases for dumps / loads — familiar for users of orjson or msgspec.


CLI Tool

JTON ships a command-line tool for JSON ↔ Zen Grid conversion:

# After pip install or maturin develop
jton input.json                    # encode JSON → Zen Grid (stdout)
jton input.json -o output.jton     # encode to file
jton input.jton -o output.json     # decode Zen Grid → JSON (auto-detected)
jton input.json --stats            # show token savings
echo '{"x":1}' | jton              # pipe stdin
jton input.json --tab              # tab-delimited Zen Grid
jton input.json --no-zen-grid      # plain compact JSON
jton input.json --indent 2         # pretty-print JSON
jton --hint                        # print LLM system-prompt template
jton --version                     # show version

Or run directly without installation:

python -m jton.cli input.json --stats

Playground

Run the interactive playground locally to explore all JTON features:

# From the repo root (after maturin develop)
python playground/server.py

# Opens at http://127.0.0.1:7700
# Optional: pip install tiktoken  (enables live token count bars)

The playground provides:

  • Live JSON → JTON conversion with all encoding options as toggles
  • Token comparison bars (JSON pretty / JSON compact / JTON current)
  • Char savings % vs JSON compact
  • Round-trip indicator — shows if decode(encode(x)) == x
  • Format hint copier — paste into LLM system prompts
  • Sample datasets — employees, orders, analytics, deep config, GitHub repos
  • Decode mode — paste JTON output, get back pretty JSON

Performance

Speed Comparison (real-world files: canada.json 2.25 MB, citm_catalog.json 1.78 MB, twitter.json 0.65 MB)

Library        loads         dumps (JSON mode)  Notes
stdlib json    63–184 MB/s   46–268 MB/s        Pure Python/C
JTON           132–346 MB/s  197–276 MB/s       Rust/SIMD, JSON mode
JTON Zen Grid  -             81–240 MB/s        Rust, table output
orjson         235–458 MB/s  440–533 MB/s       Rust, JSON only

  • JTON loads is 1.5–2.1× faster than stdlib (json.loads)
  • JTON dumps JSON mode is 1.0–4.3× faster than stdlib
  • JTON Zen Grid dumps saves 14–60% tokens (depending on data shape) while maintaining competitive throughput
  • orjson is faster on raw JSON; JTON's advantage is Zen Grid token reduction which orjson cannot provide

Large-file static benchmark: akbe_doc_classifier.json (338.1 MB)

Measured on this machine using the repository's akbe_doc_classifier.json payload in JSON-compatible mode (zen_grid=False for JTON dump):

Operation       stdlib json          JTON                 Result
Parse / decode  1.75 s (193.5 MB/s)  2.43 s (138.9 MB/s)  stdlib faster on this file
Dump / encode   1.78 s (57.3 MB/s)   0.81 s (126.5 MB/s)  JTON 2.2× faster

Notes:

  • This file is a large, object-heavy classifier payload rather than a tabular Zen Grid sweet spot
  • On this benchmark, JTON wins strongly on dump/encode
  • Stdlib json wins on parse/decode for this specific file shape
  • Output benchmarking used JSON-compatible serialization (jton.dumps(..., zen_grid=False))

SIMD Acceleration

JTON uses a two-pass SIMD parsing strategy modeled after simdjson:

  1. Structural scan (AVX2/AVX-512): Build index of {}[],:;" positions in a single pass
  2. Index-jumping parse: Navigate the pre-built index without byte-by-byte scanning

Feature              Details
AVX2                 32-byte chunks, 2013+ Intel/AMD CPUs
AVX-512              64-byte chunks, 2017+ Intel CPUs
Runtime detection    Automatically selects best available ISA
Float parsing        lexical-core (same algorithm as orjson)
Int serialization    itoa crate (fastest known)
Float serialization  ryu crate (shortest round-trip, same as orjson)
String cache         Thread-local UnsafeCell; zero mutex overhead under the Python GIL
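
The two-pass strategy described above can be illustrated with a scalar version of the first pass. The sketch below is plain Python and walks one character at a time, whereas the real scanner is described as processing 32 or 64 bytes per step with SIMD; `structural_index` is a hypothetical name and the logic is a simplified assumption about what the index contains.

```python
STRUCTURAL = set('{}[],:;')


def structural_index(text):
    """Return positions of structural characters, ignoring string contents.

    Opening and closing quotes are recorded; anything between them
    (including escaped quotes and separators) is skipped.
    """
    index = []
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False          # character after a backslash
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                index.append(pos)        # closing quote
        elif ch == '"':
            in_string = True
            index.append(pos)            # opening quote
        elif ch in STRUCTURAL:
            index.append(pos)
    return index


text = '{"a": "x,y"}'
index = structural_index(text)
print(index)                         # → [0, 1, 3, 4, 6, 10, 11]
print([text[i] for i in index])      # the comma inside "x,y" is not indexed
```

With such an index in hand, the second pass can jump from one structural position to the next instead of inspecting every byte.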

Token Efficiency

JTON vs Competing Formats

Benchmarked on 6 real-world datasets using tiktoken o200k_base encoder:

Format         Total Tokens  vs JSON compact  JSON-Compatible
JTON Zen Grid  144,159       −20.2%           ✅ Yes (JTON superset)
TOON           146,113       −19.2%           ❌ No (new syntax)
JSON compact   180,725       --               ✅ Yes

JTON is the most token-efficient JSON-compatible format -- TOON requires a custom parser.

Real-world LLM Token Savings

Dataset JSON compact JTON Zen Grid Savings
👥 2,000 employees (7 cols) 97,407 77,226 −20.7%
📈 365 days analytics 14,240 10,604 −25.5%
⭐ 100 GitHub repos 11,729 9,626 −17.9%
🛒 500 orders (nested) 46,381 39,565 −14.7%
🧾 300 event logs (semi-uniform) 10,745 6,915 −35.6%

LLM Comprehension Evaluation

We evaluated whether LLMs can correctly interpret Zen Grid data across 10 models from 6 providers, using 7 real-world datasets × 5 question types × 2 formats (700 total API calls). The evaluation used the JTON 1.0 [N: ...] syntax with bare_strings=True.

Per-Model Results

Model                 Family    JSON   Zen Grid  Delta    n
GPT-5.1-codex         OpenAI    74.3%  71.4%     −2.9 pp  35
GPT-5.1               OpenAI    71.4%  62.9%     −8.6 pp  35
GPT-5-mini            OpenAI    71.4%  71.4%     0.0 pp   35
Gemini 3 Pro Preview  Google    68.6%  68.6%     0.0 pp   35
Kimi K2               Moonshot  62.9%  68.6%     +5.7 pp  35
Qwen3 32B             Alibaba   60.0%  57.1%     −2.9 pp  35
Llama 3.3 70B         Meta      54.3%  54.3%     0.0 pp   35
Llama 3.1 8B          Meta      45.7%  48.6%     +2.9 pp  35
GPT-OSS 120B          Open-src  42.9%  45.7%     +2.9 pp  35
Llama 4 Scout 17B     Meta      40.0%  45.7%     +5.7 pp  35
Overall                         59.1%  59.4%     +0.3 pp  350

By Question Type

Question Type  JSON   Zen Grid  Delta
Lookup         95.7%  95.7%     0.0 pp
Filtering      52.9%  51.4%     −1.4 pp
Count          51.4%  48.6%     −2.9 pp
Aggregation    47.1%  51.4%     +4.3 pp
Comparison     48.6%  50.0%     +1.4 pp

Key Findings

Four of ten models improve with Zen Grid (Kimi K2 +5.7 pp, Llama 4 Scout +5.7 pp, Llama 3.1 8B +2.9 pp, GPT-OSS 120B +2.9 pp), three are neutral (GPT-5-mini, Gemini 3 Pro, Llama 3.3 70B), and three regress (GPT-5.1-codex −2.9 pp, Qwen3 32B −2.9 pp, and GPT-5.1 −8.6 pp, the largest drop). Overall, Zen Grid scores 59.4% against 59.1% for JSON (+0.3 pp), a gap small enough at n = 350 per format that the two are best read as equivalent in accuracy. Because Zen Grid uses ~20% fewer tokens, it comes out ahead on cost per correct answer. Lookup accuracy is identical across formats (95.7%).
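
The cost-per-correct-answer argument can be made concrete with a little arithmetic. The sketch below combines the overall accuracies from the table above with the token totals from the Token Efficiency section. That pairing is an assumption made for illustration: the two benchmarks used different dataset sets and encoder options, so the result is indicative rather than a measured figure.

```python
import math

json_accuracy = 0.591      # overall JSON accuracy, n = 350
zen_accuracy = 0.594       # overall Zen Grid accuracy, n = 350
n = 350

# Token totals from the "JTON vs Competing Formats" table (assumed comparable).
token_ratio = 144_159 / 180_725          # Zen Grid tokens per JSON token

# Tokens spent per correct answer, relative to JSON.
relative_cost = token_ratio / (zen_accuracy / json_accuracy)

# Standard error of a proportion near 0.59 with n = 350, for scale.
standard_error = math.sqrt(zen_accuracy * (1 - zen_accuracy) / n)

print(f"relative cost per correct answer: {relative_cost:.3f}")
print(f"standard error of accuracy:       {standard_error:.3f}")
```

Under these assumptions the relative cost is roughly 0.79, and the standard error of about 2.6 pp is much larger than the 0.3 pp accuracy gap, so nearly all of the advantage comes from the token reduction rather than from any accuracy difference.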

LLM Generation Results

Can LLMs produce valid Zen Grid output? We tested 12 models from 6 providers with few-shot and zero-shot prompting on the JTON 1.0 [N: ...] syntax:

Model                   Few-shot Valid  Zero-shot Valid
GPT-5-mini              100%            100%
GPT-5.1                 100%            100%
GPT-4o                  100%            100%
Claude Sonnet 4         100%            100%
Claude 3.5 Haiku        100%            100%
Claude 3 Haiku          100%            100%
Gemini 2.5 Flash        100%            100%
Gemini 2.5 Pro          100%            100%
Gemini 3 Flash Preview  100%            100%
Llama 3.3 70B           100%            100%
Llama 4 Scout 17B       100%            100%
Kimi K2                 100%            100%
Overall                 100%            100%

All 12 models achieve 100% validity in both modes. Zen Grid works for bidirectional LLM pipelines -- both input and output.

Format Comparison

Token counts on real-world data (o200k_base tokenizer):

Format        Twitter  GitHub  Financial  Avg Savings vs JSON
JSON Compact  3,673    968     643        baseline
CSV           1,303    688     408        −43.3% (no types)
Markdown      1,430    792     505        −33.6% (no types)
YAML          1,916    1,185   840        +1.7%
Zen Grid      1,653    968     516        −24.9% (full types)

Zen Grid is the only JSON-compatible format that achieves significant token savings while preserving JSON's full type system.
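
The "Avg Savings" column appears to be the mean of the per-dataset percentage changes rather than a ratio of total tokens. The short check below reproduces the published figures from the table's own numbers; the function name `average_change` is just a label for this sketch.

```python
tokens = {
    "JSON Compact": (3673, 968, 643),    # Twitter, GitHub, Financial
    "CSV":          (1303, 688, 408),
    "Markdown":     (1430, 792, 505),
    "YAML":         (1916, 1185, 840),
    "Zen Grid":     (1653, 968, 516),
}
baseline = tokens["JSON Compact"]


def average_change(counts):
    """Mean of per-dataset percentage change relative to JSON compact."""
    changes = [(count - base) / base * 100 for count, base in zip(counts, baseline)]
    return sum(changes) / len(changes)


for name, counts in tokens.items():
    print(f"{name:13s} {average_change(counts):+.1f}%")
```

Looking at the per-dataset terms is informative: for Zen Grid the Twitter sample drops by about 55%, the Financial sample by about 20%, and the GitHub sample is unchanged, which is why savings depend so strongly on data shape.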


Features

✅ Implemented

  • Full JSON Compatibility: parse any valid RFC 8259 JSON
  • Zen Grid Tables: [: header; row1; row2 ] with auto-detection and round-trip
  • Unquoted Keys: {name: "value"} instead of {"name": "value"}
  • Comments: // single-line and /* */ block comments
  • Special Numbers: Infinity, -Infinity, NaN
  • dumps() Serializer: compact JSON + Zen Grid output
  • Pydantic Support: BaseModel (v1 dict() + v2 model_dump()) serialization
  • Python Dataclasses: @dataclass instances via dataclasses.asdict()
  • encode / decode Aliases: drop-in for orjson/msgspec users
  • SIMD Scanner: AVX2 + AVX-512 structural character indexing
  • Strict Number Parsing: rejects -01, 1., 0.e1, -.5, 1+2
  • Enhanced Errors: 40-character context window with ^ markers
  • Schema-guided Parsing: optional schema parameter for 2–3× speedup on homogeneous data
  • Type Stubs: __init__.pyi + py.typed for IDE/mypy support
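
To make the strict number rule in the list above concrete, the RFC 8259 number grammar can be written as a single regular expression. This is a standalone sketch using Python's `re` module, not JTON's parser; `is_json_number` is a hypothetical helper.

```python
import re

# RFC 8259 section 6: optional minus, integer part without leading zeros,
# optional fraction with at least one digit, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_json_number(token: str) -> bool:
    """Return True if token is a valid JSON number literal."""
    return JSON_NUMBER.fullmatch(token) is not None


for token in ["-01", "1.", "0.e1", "-.5", "1+2", "-0", "10", "1.5e-3"]:
    print(f"{token!r:10} {is_json_number(token)}")
```

The first five tokens are the rejected examples from the list; the last three are accepted.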

🚧 Planned

  • Parallel Parsing — multi-core processing for very large files
  • loads(type=Model) — automatic Pydantic model deserialization

Examples

LLM Prompt Optimization

import json
import jton

# Large tabular dataset to send to an LLM
employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000, "years": 3},
    {"id": 2, "name": "Bob",   "dept": "Marketing",   "salary": 72000, "years": 5},
    # ... thousands more rows
]

# Standard JSON: every key repeated per row → high token cost
json_str = json.dumps(employees)   # "id", "name", "dept" repeated once per row

# JTON Zen Grid: headers written once
jton_str = jton.dumps(employees)
# → '[N: id, name, dept, salary, years; 1, "Alice", "Engineering", 95000, 3; ... ]'

# Up to 50% fewer tokens for large tabular datasets

Configuration Files

config = jton.loads('''
{
    // Server settings
    host: "0.0.0.0",
    port: 8080,

    // Database configuration
    database: {
        host: "db.example.com",
        port: 5432,
        name: "production"
    },

    workers: 4,    // CPU cores
    timeout: 30    // seconds
}
''')

API Response Processing

# JTON parses both standard JSON and JTON extensions
response = jton.loads('{"status": "ok", "users": [{id: 1, name: "Alice"}]}')

# Serialize back with token savings
payload = jton.dumps(response)

Testing

# All tests
pytest tests/ -v

# JSON spec compliance
pytest tests/test_json_compatibility.py -v

# Zen Grid round-trip tests
pytest tests/test_zen_grid.py -v

# Reference vector suite (JSONTestSuite corpus)
pytest tests/test_reference_vectors.py -v

Test results: 622 passed, 54 skipped, 58 xfailed, 0 failed

Test Coverage

Suite                       Tests              Coverage
test_json_compatibility.py  39                 JSON primitives, nesting, escapes, errors
test_reference_vectors.py   644+ parametrized  JSONTestSuite corpus (valid/invalid JSON)
test_zen_grid.py            45                 Zen Grid serialization, parsing, round-trips, Pydantic, dataclass
Other suites                ~94                Token reduction, SIMD, number parsing

Benchmark References

JTON performance benchmarks use the same standardized test vectors as the wider JSON ecosystem.

Compliance Testing

  • JSONTestSuite (Nicolas Seriot) — 400+ JSON conformance tests for parsers; used by orjson, simdjson, and JTON
  • RFC 8259 — The IETF JSON specification (December 2017)
  • JSON_checker — Classic pass/fail fixtures (fail01–fail33)

Performance Benchmark Files

The canonical benchmark corpus from nativejson-benchmark (Milo Yip), used by orjson, simdjson, yyjson, and JTON:

File               Size     Dataset                       Characteristics
canada.json        2.15 MB  GeoJSON coordinates           Number-heavy (float arrays)
twitter.json       0.60 MB  Twitter API timeline          Unicode strings, nested objects
citm_catalog.json  1.65 MB  Cinema IT Management catalog  Mixed content, real-world API
large.json         7.88 MB  Custom: 100K rows tabular     JTON primary benchmark

Competing Libraries Referenced

Speed-Focused JSON Libraries

Library     GitHub  Speed           Notes
orjson      GitHub  586 MB/s dumps  Rust-based; JSON only
ujson       GitHub  ~300 MB/s       C-based
pysimdjson  GitHub  1–2 GB/s parse  Python bindings for simdjson
simdjson    GitHub  2–3 GB/s        C++; architecture inspiration for JTON
yapic.json  GitHub  ~2–3× stdlib    Python/C extension

Token Efficiency Formats

Format         GitHub     Token Savings  Approach             JSON-Compatible
JTON Zen Grid  This repo  11–50%         Column headers once  ✅ Yes
TOON           GitHub     ~19%           Table-oriented       ❌ No

Development

Build from Source

# Install prerequisites
pip install maturin

# Debug build (fast compilation)
maturin develop

# Release build (optimized, recommended for benchmarking)
maturin develop --release

Windows + Python 3.13: Set $env:PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 before building.

Project Structure

src/
├── jton/                        # Python package (pip install jton)
│   ├── __init__.py              # Public API: loads, dumps, encode, decode, token_count
│   ├── __init__.pyi             # Type stubs (mypy/pyright)
│   ├── cli.py                   # jton CLI entry point
│   └── py.typed                 # PEP 561 marker
└── jton_core/                   # Rust implementation
    └── src/
        ├── lib.rs               # PyO3 module: loads(), dumps(), format_hint()
        ├── serializer.rs        # Zen Grid + JSON serializer, AVX-512 escape path
        ├── types/               # StructuralIndex, FieldDescriptor
        ├── simd/                # AVX2 / AVX-512 structural scanners
        └── parser/              # SIMD indexed parser, string_cache, number parsing

tests/
├── test_json_compatibility.py   # JSON spec conformance
├── test_zen_grid.py             # Zen Grid encoding/decoding + CLI
└── test_reference_vectors.py    # JSONTestSuite corpus (600+ vectors)

benchmarks/
├── run_all_benchmarks.py        # Token efficiency benchmark (8 formats × 6 datasets)
└── results/token_efficiency.md  # Latest benchmark results

Language Support

JTON officially supports Python only.

The SIMD-accelerated parser (AVX2/AVX-512 structural scanning, VPSHUFB nibble classifier, thread-local string cache) is a PyO3 native extension. The performance advantage is inseparable from the Python binding. The format spec is in SPEC.md for anyone who wants to implement JTON in another language.

Language Status Install
Python 3.11+ ✅ Official pip install jton
All others Implement from SPEC.md

Requirements

Requirement     Version
Python          3.11+
Rust            1.70+
CPU             AVX2 (2013+ Intel/AMD)
CPU (optional)  AVX-512 for 2× SIMD throughput

Safety

  • Depth guard: MAX_NESTING_DEPTH = 100 — prevents stack overflow from deeply nested input
  • Arity tolerance: Extra table columns are silently dropped; missing columns are filled with null
  • Memory safety: All unsafe Rust code is in clearly marked blocks using PyO3 FFI patterns
  • No allocation on GIL drop: JTON never releases the GIL mid-parse, avoiding data races
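
The first two guarantees above can be illustrated in pure Python. This is a behavioral sketch only: `fit_row` and `check_depth` are hypothetical helpers, the real guards run inside the Rust parser, and the exact way JTON counts nesting levels is an assumption here.

```python
MAX_NESTING_DEPTH = 100  # value quoted in the list above


def fit_row(cells, n_columns):
    """Arity tolerance: drop extra cells, pad missing ones with None (null)."""
    cells = list(cells[:n_columns])
    return cells + [None] * (n_columns - len(cells))


def check_depth(value, depth=0):
    """Raise ValueError if containers nest deeper than MAX_NESTING_DEPTH.

    Assumption: only dicts and lists count as nesting levels.
    """
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return
    depth += 1
    if depth > MAX_NESTING_DEPTH:
        raise ValueError(f"nesting deeper than {MAX_NESTING_DEPTH} levels")
    for child in children:
        check_depth(child, depth)


print(fit_row([1, "Alice", True], 2))   # → [1, 'Alice']
print(fit_row([1], 3))                  # → [1, None, None]
```

Note that silently dropping extra columns trades strictness for robustness, so callers who need to detect malformed rows may want their own length check before decoding.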

CI / Publishing

Three GitHub Actions workflows are included:

Workflow File Trigger Purpose
CI .github/workflows/ci.yml Push / PR Build + test on Linux, Windows, macOS × Python 3.10–3.13
Release .github/workflows/release.yml git push --tags v* Build manylinux/macOS/Windows wheels, publish to PyPI via OIDC, draft GitHub Release
Security Audit .github/workflows/audit.yml Weekly (Mon 08:00 UTC) cargo audit against RustSec advisory database

Publishing to PyPI

The release workflow uses PyPI Trusted Publishing (OIDC) — no API token needed:

  1. Go to pypi.org/manage/account/publishing
  2. Add a new publisher:
    • Owner: your GitHub username
    • Repository: JTON
    • Workflow: release.yml
    • Environment: pypi
  3. Push a version tag to trigger the release:
    git tag v0.2.0
    git push --tags
    
  4. The workflow builds wheels for all platforms, publishes to PyPI, and creates a draft GitHub Release with wheel files attached.

License

MIT — see NOTICE for full text.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jton-1.0.1.tar.gz (3.1 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

jton-1.0.1-cp313-cp313-win_amd64.whl (399.9 kB view details)

Uploaded CPython 3.13Windows x86-64

jton-1.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (516.6 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

jton-1.0.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.3 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ ARM64

jton-1.0.1-cp313-cp313-macosx_11_0_arm64.whl (486.4 kB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

jton-1.0.1-cp312-cp312-win_amd64.whl (399.9 kB view details)

Uploaded CPython 3.12Windows x86-64

jton-1.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (516.6 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

jton-1.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.3 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ ARM64

jton-1.0.1-cp312-cp312-macosx_11_0_arm64.whl (486.4 kB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

jton-1.0.1-cp311-cp311-win_amd64.whl (400.2 kB view details)

Uploaded CPython 3.11Windows x86-64

jton-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (517.7 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

jton-1.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.2 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ ARM64

jton-1.0.1-cp311-cp311-macosx_11_0_arm64.whl (487.4 kB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

jton-1.0.1-cp310-cp310-win_amd64.whl (400.2 kB view details)

Uploaded CPython 3.10Windows x86-64

jton-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (517.7 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

jton-1.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.2 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ ARM64

jton-1.0.1-cp310-cp310-macosx_11_0_arm64.whl (486.8 kB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

File details

Details for the file jton-1.0.1.tar.gz.

File metadata

  • Download URL: jton-1.0.1.tar.gz
  • Upload date:
  • Size: 3.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1.tar.gz
Algorithm Hash digest
SHA256 aad1c00e9bff708475777450e0dc0434a346471c3eb67bd3affe1460340908c0
MD5 faea087f4da3b0c07aebba87fff92d63
BLAKE2b-256 807a909185c6658be2e49f0efc32eacb3c200c3e2fcba5552a1493eff83c9b6b

See more details on using hashes here.

File details

Details for the file jton-1.0.1-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.1-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 399.9 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 a49ab34e24efd7e9fd3bef78f6f0cac692adca23a4e0a3e3c6ab876ee7ec7126
MD5 c4c03d782cd71a64b0084862e00dca43
BLAKE2b-256 7ed7388a418a4fc2d61d723dcf9514d65dda9c8751c38914f8b9096332d05a79

See more details on using hashes here.

File details

Details for the file jton-1.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for jton-1.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fb1a86810e5bdfe2a975b1326678f1b90969ffd3b44caff6309eabd5f092604f
MD5 f8a1d204b941e6f9b4935556aa6e6eb8
BLAKE2b-256 d2dc205bf95d9621edfa7c3da32ced74c10da2956f579c0109d43be28a49113f

See more details on using hashes here.

File details

Details for the file jton-1.0.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for jton-1.0.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 047b5b74d9dfeb065ddadd0ed687e786f9d9afa2c32ed88787df73fbeec2a97b
MD5 92d0ef6f63366595316ab626514bc97d
BLAKE2b-256 d683c04463c9cae84d9fe1b87e33a32fd2a9a93713db388d8b983a4f36033675

See more details on using hashes here.

File details

Details for the file jton-1.0.1-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

  • Download URL: jton-1.0.1-cp313-cp313-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 486.4 kB
  • Tags: CPython 3.13, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 85f71f97f47eec4f2179896b3c8e3d55d4915a821a700682b357d14ef8c8816e
MD5 a14834c0cbcd50fbbb0e008f3e8698ed
BLAKE2b-256 9d46386af1c53263b4c25d2c81d44d5a3f6652efb5295793d83f5e5090e5f102

See more details on using hashes here.

File details

Details for the file jton-1.0.1-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.1-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 399.9 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 962376cf7b915f0ac9170b3a270b71d1e5ac1134c5f1023f7464974afe2b3325
MD5 a23b8431120b603750e65be21315ebea
BLAKE2b-256 d1df39bb16c11c331681961f7b65586265c2057959ec313bf7bd99da22837569

File details

Details for the file jton-1.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for jton-1.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3e9716a45727042a9124567516966414dbb2fecfb8d1c0fb939279598a8a8b15
MD5 bec8f01f0319f25acb435034dad7363a
BLAKE2b-256 5a204fadf6d03de39dbb54bce4cccb7732ba4d2c9bb97dc9620b63e22ad292d1

File details

Details for the file jton-1.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for jton-1.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5209de32ef7f30f1c490d1a1db826c0cb15cfdc47dbdccbcf385bfda71a98a40
MD5 ae2727fce52b6c5b82619afd5bcf358a
BLAKE2b-256 2fa7bd160902b958d5948d929bdb9a1bc479f6bdc2e5bd8c78af6fc467a43119

File details

Details for the file jton-1.0.1-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

  • Download URL: jton-1.0.1-cp312-cp312-macosx_11_0_arm64.whl
  • Size: 486.4 kB
  • Tags: CPython 3.12, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 614a55f4a16df847e3d37315f6e7075c9761d7d4709372f2fb7bc8e2cc2909e0
MD5 cf09f344cc1d4ad2905e9713569c76b9
BLAKE2b-256 b6f91b8d558c80db3dd0ffb65eab8715f4c0b7e2fec2ae597ec4ea8b444e53f0

File details

Details for the file jton-1.0.1-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.1-cp311-cp311-win_amd64.whl
  • Size: 400.2 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 cf331d006a81991cdd61798ad832a12340de83f0d16b61a9aadbf858a1e342b2
MD5 eb5f5ed00a9bbd3d88eec6908b2ef9dc
BLAKE2b-256 b81a93177d510b9e8c14c0adecd23b82f7f070edea50cdde520368126dee599e

File details

Details for the file jton-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for jton-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 203b26263cdf5991c47d01a5c6a111d7751e996a05dea85fd40847ed5be76b4c
MD5 35deb6e9d9a8b637cc5d37064d7f63fb
BLAKE2b-256 c7e39a37f272551bd5b9ec12bd33432ebdc61da463a9b1d6c5f1e3789b6b9d56

File details

Details for the file jton-1.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for jton-1.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 7af236ba711687fb3bdc531563c18cc589c709025466bf6cbc28d9c0e973cba9
MD5 b9e3dc20dbd1bd69ea236bcf282c977d
BLAKE2b-256 90fcfadcc724a6171b35405d3288734676e5351682b0887b03556d720f88b098

File details

Details for the file jton-1.0.1-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

  • Download URL: jton-1.0.1-cp311-cp311-macosx_11_0_arm64.whl
  • Size: 487.4 kB
  • Tags: CPython 3.11, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7636885a3f6ba1538f02659b29335dfcd9b49dcf9fecbd8c1111c3fe4a1524b4
MD5 23d7229def8771acaefebc1c6125d5b6
BLAKE2b-256 e1cd5bc9689b3aaea72d85ebc8f21f29647d39fc50cb3e45164b4491858172d5

File details

Details for the file jton-1.0.1-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.1-cp310-cp310-win_amd64.whl
  • Size: 400.2 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 21f545855d855bf4b59ff91c3913bd50d2c5bbe352ce7a06b3ef0dc04d492ade
MD5 e0f4b51a74f97c46af7c96775d561527
BLAKE2b-256 218b2a6214b16923ee715bb6e04e47a4c5a53c75c6934bef01091202e5f5c10b

File details

Details for the file jton-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for jton-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6857e0f3b4867c69e863a6b6b23ac24408055069451bbba52751d8497ac73c09
MD5 b3cf05be0cb3622fa5e55bca7b3fd8d9
BLAKE2b-256 bcf492dc58c7037f1dbb7390aed2bfeb6fc0119977f04a4c75d68aadce07b835

File details

Details for the file jton-1.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for jton-1.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 a4d40a88c1e21c4a31527616724bb39e75b90f291fb79c88690cfd6cc45fc581
MD5 1f69dc170fc0d96da108ea3ce6eea97c
BLAKE2b-256 94246a5120685a48670eec23c49038d0b80afd41c29812accaf6d03d72857069

File details

Details for the file jton-1.0.1-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

  • Download URL: jton-1.0.1-cp310-cp310-macosx_11_0_arm64.whl
  • Size: 486.8 kB
  • Tags: CPython 3.10, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2ff6749dff4b9f3f5bbcfc8d79cd8abe42dcf828420778c1a9c9b44f4b502cb4
MD5 2b8ee882dbe91785145793736d0842c2
BLAKE2b-256 c9c8955adff372c87c225cb5a789f215ddcb5f4c0ed16f76c6217cff2f985100
