JTON (JSON Tabular Object Notation) — A Tabular JSON-Superset Encoding for Efficient LLM Data Processing
JTON
JTON (JSON Tabular Object Notation) — A high-performance, token-efficient JSON superset built in Rust with PyO3 bindings for Python. Home of Zen Grid, a token-aware tabular encoding that reduces LLM token costs by 15–60%.
Try the JTON Playground: an interactive encoder/decoder with live token counts and speed benchmarks.
Overview
JTON is a JSON superset designed for LLM applications and high-throughput data processing:
- Zen Grid: Tabular encoding reduces token count by 15–60% on benchmarked data vs JSON compact (23% average across 6 datasets, ~60% on highly-tabular Twitter-style rows)
- LLM-Validated: 10 models tested for comprehension, 12 models tested for generation -- all achieve 100% generation validity (100% few-shot, 100% zero-shot)
- SIMD Acceleration: AVX2 (32-byte) and AVX-512 (64-byte) structural scanning
- JSON-compatible: supports load()/loads()/dump()/dumps() for common JSON workflows; all valid JSON is valid JTON
- Serialization: dumps() with Zen Grid table output, Pydantic v1/v2 and dataclass support
- JSON Extensions: Unquoted keys, // and /* */ comments, Infinity/NaN special numbers
- Strict correctness: Rejects invalid JSON numbers (-01, 1., 0.e1) that many parsers accept silently
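To make the strict-number rule concrete, here is a minimal sketch of the RFC 8259 number grammar as a regular expression. It only illustrates which literals the grammar allows; it is not JTON's Rust implementation, and the helper name is_strict_json_number is hypothetical.

```python
import re

# RFC 8259 number grammar: optional minus, integer part without leading
# zeros, optional fraction with at least one digit, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_strict_json_number(text: str) -> bool:
    """Return True if `text` is a valid RFC 8259 number literal (hypothetical helper)."""
    return JSON_NUMBER.fullmatch(text) is not None


for literal in ["0", "-1", "1.5", "1e10", "-01", "1.", "0.e1", "-.5", "1+2"]:
    print(f"{literal!r}: {is_strict_json_number(literal)}")
```

The first four literals are accepted and the last five are rejected, which matches the examples listed above.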
Quickstart
Installation
# From PyPI (once published)
pip install jton
# From source (requires Rust 1.70+ — https://rustup.rs/)
git clone https://github.com/gowthamkumar-nandakishore/jton.git
cd jton
pip install maturin
maturin develop --release
Basic Usage
import jton
# Standard JSON parsing
data = jton.loads('{"name": "Alice", "age": 30}')
# JTON extensions — unquoted keys
data = jton.loads('{name: "Alice", age: 30}')
# Comments for configuration files
config = jton.loads('''
{
  host: "localhost", // server address
  port: 8080, /* default port */
  timeout: 30 // seconds
}
''')
# Special numbers (Python compatibility)
data = jton.loads('{x: Infinity, y: -Infinity, z: NaN}')
# Serialize to compact JSON
jton.dumps({"name": "Alice", "age": 30})
# → '{"name":"Alice","age":30}'
# encode/decode aliases (familiar for orjson/msgspec users)
jton.encode(data) # same as dumps()
jton.decode(text) # same as loads()
Using JTON with existing json-based code
JTON works well with existing JSON-heavy codebases — with one important rule:
- Parsing is the easy win: jton.load()/jton.loads() accept normal JSON unchanged
- Serialization is JSON-compatible for common usage, but JTON's serializer has an extra default: zen_grid=True
- That means if you simply replace import json with import jton as json, some list[dict] payloads may serialize as Zen Grid instead of strict RFC 8259 JSON
What happens if you just replace json with jton?
import jton as json
# Existing JSON input still works
obj = json.loads('{"name":"Alice","age":30}')
# Existing file APIs still work
with open("data.json") as f:
    obj = json.load(f)
For output:
import jton as json
json.dumps({"name": "Alice", "age": 30})
# -> '{"name":"Alice","age":30}' # still normal JSON
json.dumps([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])
# -> '[2: id, name; 1, "Alice"; 2, "Bob" ]' # JTON Zen Grid, not strict JSON
So the practical behavior is:
- load()/loads(): safe replacement for existing JSON parsing
- dump()/dumps() on ordinary objects: usually still emits JSON
- dump()/dumps() on homogeneous arrays of objects: may emit Zen Grid by default
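When auditing an existing codebase, it can help to know which payloads are likely to switch to Zen Grid. The check below is a rough sketch that assumes "homogeneous" means a non-empty list of dicts sharing the same keys in the same order. JTON's actual eligibility rules are not spelled out here and may differ, and looks_tabular is a hypothetical name.

```python
def looks_tabular(value) -> bool:
    """Heuristic guess at Zen Grid eligibility (an assumption, not JTON's rule)."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, dict) for row in value):
        return False
    header = list(value[0])
    # Every row must expose the same keys in the same order as the first row.
    return all(list(row) == header for row in value)


print(looks_tabular({"name": "Alice", "age": 30}))
print(looks_tabular([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
```

Call sites where this returns True are the ones worth reviewing before swapping the import.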
Recommended migration path
1. Parsing-only replacement
If you want a no-risk first step, replace only parsing:
import jton
data = jton.loads(existing_json_text)
This gives you JTON extensions on input, and faster parsing on most of the benchmark files in the Performance section, without changing output format anywhere.
2. Full json-module replacement, but keep strict JSON output
If you want import jton as json, but still need normal JSON output:
import jton as json
text = json.dumps(obj, zen_grid=False)
with open("out.json", "w") as f:
    json.dump(obj, f, zen_grid=False)
This is the safest pattern for APIs, files, and systems that expect standard JSON.
3. Enable JTON only where it helps
Use strict JSON for machines, and JTON for LLM-facing payloads:
api_payload = jton.dumps(data, zen_grid=False) # strict JSON
llm_payload = jton.dumps(data) # JTON / Zen Grid when eligible
JSON-in / JSON-out compatibility cheat sheet
If you want output that stays valid JSON, use:
jton.dumps(
    obj,
    zen_grid=False,
    unquoted_keys=False,
    bare_strings=False,
    implicit_null=False,
    multiline_zen=False,
)
Notes:
- Only zen_grid=False is required for strict JSON output in normal use
- row_count and delimiter matter only when zen_grid=True
- indent=2 or indent=4 still gives valid pretty JSON
- default= works like json.dumps(default=...)
When output stops being strict JSON
These options move you out of plain JSON output:
- zen_grid=True on homogeneous list[dict] values
- unquoted_keys=True
- multiline_zen=True
- bare_strings=True inside Zen Grid cells
- implicit_null=True inside Zen Grid cells
So the short rule is:
- Use jton.loads() everywhere
- Use jton.dumps(..., zen_grid=False) anywhere strict JSON is required
- Use default jton.dumps() for LLM/token-optimized payloads
Current compatibility scope
JTON supports the common Python JSON workflow:
- load
- loads
- dump
- dumps
- default=...
- file objects
It is not a byte-for-byte clone of every stdlib json keyword argument yet. Think of it as:
- fully compatible parser for JSON input
- mostly compatible serializer for common usage
- plus JTON/Zen Grid output, which is on by default and can be disabled with zen_grid=False
Zen Grid: Token-Efficient Table Format
When you pass a list of dicts to dumps(), JTON automatically detects the tabular structure and encodes it as a Zen Grid — one header row followed by semicolon-delimited data rows, all inline.
The format
[N: col1, col2, col3; val1, val2, val3; val4, val5, val6 ]
 ↑  ↑                 ↑
 │  │                 one record per semicolon segment
 │  headers
 row count
- N: total row count (helps LLMs understand the data size at a glance)
- First segment after [N: = comma-separated field names
- Each subsequent segment = one record, values in the same order
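The layout rules above can be sketched in a few lines of pure Python. This is an illustrative reference encoder, not JTON's Rust serializer. It assumes identifier-like keys (so headers can be written unquoted), the default comma delimiter, and the row-count prefix; encode_zen_grid is a hypothetical name.

```python
import json


def encode_zen_grid(rows: list[dict]) -> str:
    """Encode a homogeneous list of dicts in the inline `[N: ...]` layout (sketch)."""
    if not rows:
        raise ValueError("need at least one row")
    headers = list(rows[0])
    if any(list(row) != headers for row in rows):
        raise ValueError("rows must share the same keys in the same order")
    # First segment: the field names, written once.
    segments = [", ".join(headers)]
    for row in rows:
        # json.dumps yields JSON-quoted strings and JSON literals for other values.
        segments.append(", ".join(json.dumps(row[h]) for h in headers))
    return f"[{len(rows)}: " + "; ".join(segments) + " ]"


print(encode_zen_grid([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# → [2: id, name; 1, "Alice"; 2, "Bob" ]
```

Because each key appears once in the header segment instead of once per row, the saving grows with the number of rows.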
Example
import jton
users = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]
# Standard JSON compact (104 chars, ~32 tokens):
# [{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87},{"id":3,"name":"Carol","score":92}]
# JTON Zen Grid (67 chars, ~22 tokens, 31% fewer tokens):
print(jton.dumps(users))
# → '[3: id, name, score; 1, "Alice", 95; 2, "Bob", 87; 3, "Carol", 92 ]'
# Disable Zen Grid for standard JSON output
print(jton.dumps(users, zen_grid=False))
# → '[{"id":1,"name":"Alice","score":95},...]'
Round-trip correctness
Zen Grid is valid JTON — jton.loads() parses it back to the original data:
original = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
encoded = jton.dumps(original) # → '[2: id, name; 1, "Alice"; 2, "Bob" ]'
decoded = jton.loads(encoded) # → [{"id": 1, "name": "Alice"}, ...]
assert decoded == original # ✅ perfect round-trip
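For readers implementing the format elsewhere, the decoding direction can be sketched with the standard library alone. The function below handles only the inline, comma-delimited form whose cells are JSON values (quoted strings, numbers, true/false/null). It does not cover bare strings, implicit nulls, or other delimiters, and it is not JTON's parser; decode_zen_grid is a hypothetical name.

```python
import json

_decoder = json.JSONDecoder()


def decode_zen_grid(text: str) -> list[dict]:
    """Decode the `[N: h1, h2; v1, v2; ... ]` subset into a list of dicts (sketch)."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("not a Zen Grid literal")
    count_text, _, rest = body[1:-1].partition(":")
    header_text, _, data = rest.partition(";")
    headers = [h.strip() for h in header_text.split(",")]

    rows, cells, pos, end = [], [], 0, len(data)
    while pos < end:
        if data[pos].isspace():
            pos += 1
            continue
        # raw_decode parses one JSON value and returns the offset just past it,
        # so commas and semicolons inside quoted strings are handled correctly.
        value, pos = _decoder.raw_decode(data, pos)
        cells.append(value)
        while pos < end and data[pos].isspace():
            pos += 1
        if pos == end or data[pos] == ";":
            rows.append(dict(zip(headers, cells)))
            cells = []
        elif data[pos] != ",":
            raise ValueError(f"unexpected character {data[pos]!r} at offset {pos}")
        pos += 1
    if count_text.strip() and int(count_text) != len(rows):
        raise ValueError("row count prefix does not match the number of rows")
    return rows


print(decode_zen_grid('[2: id, name; 1, "Alice"; 2, "Bob" ]'))
# → [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```

The row-count check at the end shows one use of the N prefix: a truncated payload is detected instead of being silently accepted.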
Token count analysis
import jton
data = [{"id": i, "name": f"User{i}", "score": i*5} for i in range(100)]
counts = jton.token_count(data) # requires: pip install tiktoken
# {
# 'json_compact': {'tokens': 2843, 'savings_vs_compact': '+0.0%'},
# 'zen_grid': {'tokens': 1820, 'savings_vs_compact': '-36.0%'},
# }
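The savings_vs_compact strings in the sample output are consistent with a plain relative change against the JSON-compact token count. The helper below reproduces the arithmetic. It is a sketch of the calculation, not the library's code, and the function name is made up for illustration.

```python
def savings_vs_compact(tokens: int, compact_tokens: int) -> str:
    """Format the relative token change against the JSON-compact baseline."""
    change = (tokens - compact_tokens) / compact_tokens * 100
    return f"{change:+.1f}%"


print(savings_vs_compact(2843, 2843))  # → +0.0%
print(savings_vs_compact(1820, 2843))  # → -36.0%
```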
LLM integration
Add a short format hint to your system prompt before sending Zen Grid data:
import jton
system_prompt = jton.format_hint() + "\n\n" + jton.dumps(my_data)
Data is in JTON Zen Grid format.
Format: [N: col1, col2, col3; row1val1, row1val2, row1val3; ... ]
N = total row count. First semicolon-segment = headers.
Each subsequent segment = one record in header order.
Example: [3: id, name, score; 1, Alice, 95; 2, Bob, 87; 3, Carol, 92 ]
Pydantic and Dataclass Support
from pydantic import BaseModel
from dataclasses import dataclass
import jton
# Pydantic v2 (model_dump)
class User(BaseModel):
    id: int
    name: str
    email: str
users = [User(id=1, name="Alice", email="a@example.com"),
         User(id=2, name="Bob", email="b@example.com")]
print(jton.dumps(users))
# → '[2: id, name, email; 1, "Alice", "a@example.com"; 2, "Bob", "b@example.com" ]'
# Python dataclasses
@dataclass
class Point:
    x: float
    y: float
print(jton.dumps(Point(x=1.5, y=2.5)))
# → '{"x":1.5,"y":2.5}'
# Parse directly into a Pydantic model
user_data = jton.loads('{"id":1,"name":"Alice","email":"a@ex.com"}')
user = User(**user_data)
# → User(id=1, name='Alice', email='a@ex.com')
# Parse into a dataclass
pt_data = jton.loads('{"x":1.5,"y":2.5}')
pt = Point(**pt_data)
# → Point(x=1.5, y=2.5)
API Reference
JTON provides the core load, dump, loads, and dumps APIs used in common JSON workflows. The main behavioral difference is that dumps() defaults to zen_grid=True, which may emit JTON Zen Grid for homogeneous arrays of objects.
jton.loads(data, schema=None)
Parse JTON or JSON data into Python objects.
jton.loads('{"a": 1}') # → {"a": 1}
jton.loads(b'{"a": 1}') # bytes input OK
jton.loads('{a: 1}') # unquoted keys OK
jton.loads('// comment\n{a:1}') # comments OK
jton.load(fp)
Parse JTON/JSON from a file object — compatible with normal json.load() usage.
with open("data.json") as f:
    data = jton.load(f)
jton.dumps(data, *, zen_grid=True, ..., default=None)
Serialize Python objects to JTON/JSON string — compatible with common json.dumps() usage.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | Any | required | Python object to serialize |
| zen_grid | bool | True | Auto-convert lists of dicts to Zen Grid table format |
| unquoted_keys | bool | False | Write dict keys without quotes |
| indent | int or None | None | Pretty-print with given indent width |
| bare_strings | bool | False | Write identifier string values without quotes in cells |
| implicit_null | bool | False | Write null cells as empty (saves ~1 token per cell) |
| row_count | bool | True | Prefix Zen Grid header with [N: ...] row count |
| delimiter | str | "comma" | "comma" (readable), "tab" (max savings), "pipe" |
| default | callable or None | None | For non-serializable objects, same as json.dumps(default=...) |
If you want strict JSON output, set zen_grid=False.
# Standard usage
jton.dumps({"a": 1}, zen_grid=False) # → '{"a":1}'
# Custom types with default=
from datetime import date
jton.dumps({"d": date(2025,1,1)}, default=str)
# → '{"d":"2025-01-01"}'
# Works with Zen Grid too
jton.dumps([{"id":1,"d":date(2025,1,1)}], default=str)
# → '[1: id, d; 1, "2025-01-01" ]'
Supported types natively: dict, list, tuple, str, int, float, bool, None, Pydantic BaseModel (v1+v2), @dataclass
jton.dump(obj, fp, **kwargs)
Serialize to a file object — compatible with common json.dump() usage.
with open("out.jton", "w") as f:
    jton.dump(data, f)
jton.format_hint(style="zen_grid")
Return a format description for pasting into LLM system prompts.
| style | Description |
|---|---|
| "zen_grid" | Default inline format (mentions both [: and [N: forms) |
| "zen_grid_rowcount" | Inline with explicit [N] row count |
| "multiline" | Multi-line format |
| "tab" | Tab-delimited |
jton.token_count(data, tokenizer="o200k_base")
Compare token costs across all output modes. Requires pip install tiktoken.
Returns a dict mapping mode names to {"tokens": int, "chars": int, "savings_vs_compact": str}.
jton.encode / jton.decode
Aliases for dumps / loads — familiar for users of orjson or msgspec.
CLI Tool
JTON ships a command-line tool for JSON ↔ Zen Grid conversion:
# After pip install or maturin develop
jton input.json # encode JSON → Zen Grid (stdout)
jton input.json -o output.jton # encode to file
jton input.jton -o output.json # decode Zen Grid → JSON (auto-detected)
jton input.json --stats # show token savings
echo '{"x":1}' | jton # pipe stdin
jton input.json --tab # tab-delimited Zen Grid
jton input.json --no-zen-grid # plain compact JSON
jton input.json --indent 2 # pretty-print JSON
jton --hint # print LLM system-prompt template
jton --version # show version
Or run directly without installation:
python -m jton.cli input.json --stats
Playground
Run the interactive playground locally to explore all JTON features:
# From the repo root (after maturin develop)
python playground/server.py
# Opens at http://127.0.0.1:7700
# Optional: pip install tiktoken (enables live token count bars)
The playground provides:
- Live JSON → JTON conversion with all encoding options as toggles
- Token comparison bars (JSON pretty / JSON compact / JTON current)
- Char savings % vs JSON compact
- Round-trip indicator — shows if decode(encode(x)) == x
- Format hint copier — paste into LLM system prompts
- Sample datasets — employees, orders, analytics, deep config, GitHub repos
- Decode mode — paste JTON output, get back pretty JSON
Performance
Speed Comparison (real-world files: canada.json 2.25 MB, citm_catalog.json 1.78 MB, twitter.json 0.65 MB)
| Library | loads | dumps (JSON mode) | Notes |
|---|---|---|---|
| stdlib json | 63–184 MB/s | 46–268 MB/s | Pure Python/C |
| JTON | 132–346 MB/s | 197–276 MB/s | Rust/SIMD, JSON mode |
| JTON Zen Grid | — | 81–240 MB/s | Rust, table output |
| orjson | 235–458 MB/s | 440–533 MB/s | Rust, JSON only |
- JTON loads is 1.5–2.1× faster than stdlib (json.loads)
- JTON dumps JSON mode is 1.0–4.3× faster than stdlib
- JTON Zen Grid dumps saves 14–60% tokens (depending on data shape) while maintaining competitive throughput
- orjson is faster on raw JSON; JTON's advantage is Zen Grid token reduction, which orjson cannot provide
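To reproduce MB/s figures of this kind on your own data, a best-of-N timing loop is usually enough. The harness below is a sketch that uses only the standard library and stdlib json as the function under test. The methodology behind the table above is not documented here, so the choices made in this sketch (best of five runs, 1 MB = 10^6 bytes, size measured on the compact JSON text) are assumptions.

```python
import json
import time


def throughput_mb_s(func, payload, size_bytes: int, repeats: int = 5) -> float:
    """Best-of-N throughput in MB/s for func(payload)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    return size_bytes / best / 1e6


# Synthetic tabular payload; swap in canada.json or twitter.json for real runs.
rows = [{"id": i, "name": f"User{i}", "score": i * 5} for i in range(10_000)]
text = json.dumps(rows, separators=(",", ":"))
size = len(text.encode("utf-8"))

print(f"loads: {throughput_mb_s(json.loads, text, size):.0f} MB/s")
print(f"dumps: {throughput_mb_s(json.dumps, rows, size):.0f} MB/s")
```

Passing jton.loads or jton.dumps as func gives a like-for-like comparison on the same payload.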
Large-file static benchmark: akbe_doc_classifier.json (338.1 MB)
Measured on this machine using the repository's akbe_doc_classifier.json payload in JSON-compatible mode (zen_grid=False for JTON dump):
| Operation | stdlib json | JTON | Result |
|---|---|---|---|
| Parse / decode | 1.75 s (193.5 MB/s) | 2.43 s (138.9 MB/s) | stdlib faster on this file |
| Dump / encode | 1.78 s (57.3 MB/s) | 0.81 s (126.5 MB/s) | JTON 2.2× faster |
Notes:
- This file is a large, object-heavy classifier payload rather than a tabular Zen Grid sweet spot
- On this benchmark, JTON wins strongly on dump/encode
- Stdlib json wins on parse/decode for this specific file shape
- Output benchmarking used JSON-compatible serialization (jton.dumps(..., zen_grid=False))
SIMD Acceleration
JTON uses a two-pass SIMD parsing strategy modeled after simdjson:
- Structural scan (AVX2/AVX-512): Build index of {}[],:;" positions in a single pass
- Index-jumping parse: Navigate the pre-built index without byte-by-byte scanning
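The first pass is easier to picture in scalar form. The sketch below walks the input one byte at a time and records the offsets of structural characters that sit outside string literals. A SIMD scanner produces the same kind of index but classifies 32 or 64 bytes per step. This is an illustration of the idea only: it ignores comments, and it is not JTON's scanner.

```python
STRUCTURAL = frozenset(b'{}[],:;"')


def structural_index(data: bytes) -> list[int]:
    """Scalar stand-in for the SIMD pass: offsets of structural bytes outside strings."""
    positions = []
    in_string = False
    escaped = False
    for offset, byte in enumerate(data):
        if in_string:
            if escaped:
                escaped = False          # this byte was escaped; skip it
            elif byte == 0x5C:           # backslash starts an escape
                escaped = True
            elif byte == 0x22:           # closing quote ends the string
                in_string = False
                positions.append(offset)
        elif byte == 0x22:               # opening quote starts a string
            in_string = True
            positions.append(offset)
        elif byte in STRUCTURAL:
            positions.append(offset)
    return positions


print(structural_index(b'{"a":[1,2]}'))
# → [0, 1, 3, 4, 5, 7, 9, 10]
```

The second pass can then jump from one recorded offset to the next instead of inspecting every byte again.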
| Feature | Details |
|---|---|
| AVX2 | 32-byte chunks, 2013+ Intel/AMD CPUs |
| AVX-512 | 64-byte chunks, 2017+ Intel CPUs |
| Runtime detection | Automatically selects best available ISA |
| Float parsing | lexical-core (same algorithm as orjson) |
| Int serialization | itoa crate (fastest known) |
| Float serialization | ryu crate (shortest round-trip, same as orjson) |
| String cache | Thread-local UnsafeCell — zero mutex overhead under Python GIL |
Token Efficiency
JTON vs Competing Formats
Benchmarked on 6 real-world datasets using tiktoken o200k_base encoder:
| Format | Total Tokens | vs JSON compact | JSON-Compatible |
|---|---|---|---|
| JTON Zen Grid | 144,159 | −20.2% | ✅ Yes (JTON superset) |
| TOON | 146,113 | −19.2% | ❌ No (new syntax) |
| JSON compact | 180,725 | -- | ✅ Yes |
JTON is the most token-efficient JSON-compatible format -- TOON requires a custom parser.
Real-world LLM Token Savings
| Dataset | JSON compact | JTON Zen Grid | Savings |
|---|---|---|---|
| 👥 2,000 employees (7 cols) | 97,407 | 77,226 | −20.7% |
| 📈 365 days analytics | 14,240 | 10,604 | −25.5% |
| ⭐ 100 GitHub repos | 11,729 | 9,626 | −17.9% |
| 🛒 500 orders (nested) | 46,381 | 39,565 | −14.7% |
| 🧾 300 event logs (semi-uniform) | 10,745 | 6,915 | −35.6% |
LLM Comprehension Evaluation
We evaluated whether LLMs can correctly interpret Zen Grid data across 10 models from six providers, using 7 real-world datasets × 5 question types × 2 formats (700 total API calls). Uses JTON 1.0 [N: ...] syntax with bare_strings=True.
Per-Model Results
| Model | Family | JSON | Zen Grid | Delta | n |
|---|---|---|---|---|---|
| GPT-5.1-codex | OpenAI | 74.3% | 71.4% | −2.9 pp | 35 |
| GPT-5.1 | OpenAI | 71.4% | 62.9% | −8.6 pp | 35 |
| GPT-5-mini | OpenAI | 71.4% | 71.4% | 0.0 pp | 35 |
| Gemini 3 Pro Preview | Google | 68.6% | 68.6% | 0.0 pp | 35 |
| Kimi K2 | Moonshot | 62.9% | 68.6% | +5.7 pp | 35 |
| Qwen3 32B | Alibaba | 60.0% | 57.1% | −2.9 pp | 35 |
| Llama 3.3 70B | Meta | 54.3% | 54.3% | 0.0 pp | 35 |
| Llama 3.1 8B | Meta | 45.7% | 48.6% | +2.9 pp | 35 |
| GPT-OSS 120B | Open-src | 42.9% | 45.7% | +2.9 pp | 35 |
| Llama 4 Scout 17B | Meta | 40.0% | 45.7% | +5.7 pp | 35 |
| Overall | | 59.1% | 59.4% | +0.3 pp | 350 |
By Question Type
| Question Type | JSON | Zen Grid | Delta |
|---|---|---|---|
| Lookup | 95.7% | 95.7% | 0.0 pp |
| Filtering | 52.9% | 51.4% | −1.4 pp |
| Count | 51.4% | 48.6% | −2.9 pp |
| Aggregation | 47.1% | 51.4% | +4.3 pp |
| Comparison | 48.6% | 50.0% | +1.4 pp |
Key Findings
Four of ten models improve with Zen Grid (Kimi K2 +5.7 pp, Llama 4 Scout +5.7 pp, Llama 3.1 8B +2.9 pp, GPT-OSS 120B +2.9 pp), three are neutral (GPT-5-mini, Gemini 3 Pro, Llama 3.3 70B), and three regress (GPT-5.1-codex −2.9 pp, Qwen3 32B −2.9 pp, and GPT-5.1 −8.6 pp, the largest drop). Overall, Zen Grid scores 59.4% against 59.1% for JSON, a +0.3 pp difference that is too small to distinguish from noise at n = 350 per format, while using ~20% fewer tokens. In other words, accuracy holds steady at a lower cost per correct answer. Lookup accuracy is identical across formats (95.7%).
LLM Generation Results
Can LLMs produce valid Zen Grid output? We tested 12 models from 6 providers with few-shot and zero-shot prompting on the JTON 1.0 [N: ...] syntax:
| Model | Few-shot Valid | Zero-shot Valid |
|---|---|---|
| GPT-5-mini | 100% | 100% |
| GPT-5.1 | 100% | 100% |
| GPT-4o | 100% | 100% |
| Claude Sonnet 4 | 100% | 100% |
| Claude 3.5 Haiku | 100% | 100% |
| Claude 3 Haiku | 100% | 100% |
| Gemini 2.5 Flash | 100% | 100% |
| Gemini 2.5 Pro | 100% | 100% |
| Gemini 3 Flash Preview | 100% | 100% |
| Llama 3.3 70B | 100% | 100% |
| Llama 4 Scout 17B | 100% | 100% |
| Kimi K2 | 100% | 100% |
| Overall | 100% | 100% |
All 12 models achieve 100% validity in both modes. Zen Grid works for bidirectional LLM pipelines -- both input and output.
Format Comparison
Token counts on real-world data (o200k_base tokenizer):
| Format | GitHub | Financial | Avg Savings vs JSON | |
|---|---|---|---|---|
| JSON Compact | 3,673 | 968 | 643 | baseline |
| CSV | 1,303 | 688 | 408 | −43.3% (no types) |
| Markdown | 1,430 | 792 | 505 | −33.6% (no types) |
| YAML | 1,916 | 1,185 | 840 | +1.7% |
| Zen Grid | 1,653 | 968 | 516 | −24.9% (full types) |
Zen Grid is the only JSON-compatible format that achieves significant token savings while preserving JSON's full type system.
Features
✅ Implemented
- Full JSON Compatibility: parse any valid RFC 8259 JSON
- Zen Grid Tables: [: header; row1; row2 ] with auto-detection and round-trip
- Unquoted Keys: {name: "value"} instead of {"name": "value"}
- Comments: // single-line and /* */ block comments
- Special Numbers: Infinity, -Infinity, NaN
- dumps() Serializer: compact JSON + Zen Grid output
- Pydantic Support: BaseModel (v1 dict() + v2 model_dump()) serialization
- Python Dataclasses: @dataclass instances via dataclasses.asdict()
- encode/decode Aliases: drop-in for orjson/msgspec users
- SIMD Scanner: AVX2 + AVX-512 structural character indexing
- Strict Number Parsing: rejects -01, 1., 0.e1, -.5, 1+2
- Enhanced Errors: 40-character context window with ^ markers
- Schema-guided Parsing: optional schema parameter for 2–3× speedup on homogeneous data
- Type Stubs: __init__.pyi + py.typed for IDE/mypy support
🚧 Planned
- Parallel Parsing: multi-core processing for very large files
- loads(type=Model): automatic Pydantic model deserialization
Examples
LLM Prompt Optimization
import json
import jton
# Large tabular dataset to send to an LLM
employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000, "years": 3},
    {"id": 2, "name": "Bob", "dept": "Marketing", "salary": 72000, "years": 5},
    # ... thousands more rows
]
# Standard JSON: every key repeated per row → high token cost
json_str = json.dumps(employees) # "id", "name", "dept" repeated once per row
# JTON Zen Grid: headers written once
jton_str = jton.dumps(employees)
# → '[N: id, name, dept, salary, years; 1, "Alice", "Engineering", 95000, 3; ... ]'
# Up to 50% fewer tokens for large tabular datasets
Configuration Files
config = jton.loads('''
{
  // Server settings
  host: "0.0.0.0",
  port: 8080,
  // Database configuration
  database: {
    host: "db.example.com",
    port: 5432,
    name: "production"
  },
  workers: 4, // CPU cores
  timeout: 30 // seconds
}
''')
API Response Processing
# JTON parses both standard JSON and JTON extensions
response = jton.loads('{"status": "ok", "users": [{id: 1, name: "Alice"}]}')
# Serialize back with token savings
payload = jton.dumps(response)
Testing
# All tests
pytest tests/ -v
# JSON spec compliance
pytest tests/test_json_compatibility.py -v
# Zen Grid round-trip tests
pytest tests/test_zen_grid.py -v
# Reference vector suite (JSONTestSuite corpus)
pytest tests/test_reference_vectors.py -v
Test results: 622 passed, 54 skipped, 58 xfailed, 0 failed
Test Coverage
| Suite | Tests | Coverage |
|---|---|---|
| test_json_compatibility.py | 39 | JSON primitives, nesting, escapes, errors |
| test_reference_vectors.py | 644+ parametrized | JSONTestSuite corpus (valid/invalid JSON) |
| test_zen_grid.py | 45 | Zen Grid serialization, parsing, round-trips, Pydantic, dataclass |
| Other suites | ~94 | Token reduction, SIMD, number parsing |
Benchmark References
JTON performance benchmarks use the same standardized test vectors as the wider JSON ecosystem.
Compliance Testing
- JSONTestSuite (Nicolas Seriot) — 400+ JSON conformance tests for parsers; used by orjson, simdjson, and JTON
- RFC 8259 — The IETF JSON specification (December 2017)
- JSON_checker — Classic pass/fail fixtures (fail01–fail33)
Performance Benchmark Files
The canonical benchmark corpus from nativejson-benchmark (Milo Yip), used by orjson, simdjson, yyjson, and JTON:
| File | Size | Dataset | Characteristics |
|---|---|---|---|
| canada.json | 2.15 MB | GeoJSON coordinates | Number-heavy (float arrays) |
| twitter.json | 0.60 MB | Twitter API timeline | Unicode strings, nested objects |
| citm_catalog.json | 1.65 MB | Cinema IT Management catalog | Mixed content, real-world API |
| large.json | 7.88 MB | Custom: 100K rows tabular | JTON primary benchmark |
Competing Libraries Referenced
Speed-Focused JSON Libraries
| Library | Speed | Notes |
|---|---|---|
| orjson | 586 MB/s dumps | Rust-based; JSON only |
| ujson | ~300 MB/s | C-based |
| pysimdjson | 1–2 GB/s parse | Python bindings for simdjson |
| simdjson | 2–3 GB/s | C++; architecture inspiration for JTON |
| yapic.json | ~2–3× stdlib | Python/C extension |
Token Efficiency Formats
| Format | GitHub | Token Savings | Approach | JSON-Compatible |
|---|---|---|---|---|
| JTON Zen Grid | This repo | 11–50% | Column headers once | ✅ Yes |
| TOON | | ~19% | Table-oriented | ❌ No |
Development
Build from Source
# Install prerequisites
pip install maturin
# Debug build (fast compilation)
maturin develop
# Release build (optimized, recommended for benchmarking)
maturin develop --release
Windows + Python 3.13: Set $env:PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 before building.
Project Structure
src/
├── jton/ # Python package (pip install jton)
│ ├── __init__.py # Public API: loads, dumps, encode, decode, token_count
│ ├── __init__.pyi # Type stubs (mypy/pyright)
│ ├── cli.py # jton CLI entry point
│ └── py.typed # PEP 561 marker
└── jton_core/ # Rust implementation
└── src/
├── lib.rs # PyO3 module: loads(), dumps(), format_hint()
├── serializer.rs # Zen Grid + JSON serializer, AVX-512 escape path
├── types/ # StructuralIndex, FieldDescriptor
├── simd/ # AVX2 / AVX-512 structural scanners
└── parser/ # SIMD indexed parser, string_cache, number parsing
tests/
├── test_json_compatibility.py # JSON spec conformance
├── test_zen_grid.py # Zen Grid encoding/decoding + CLI
└── test_reference_vectors.py # JSONTestSuite corpus (600+ vectors)
benchmarks/
├── run_all_benchmarks.py # Token efficiency benchmark (8 formats × 6 datasets)
└── results/token_efficiency.md # Latest benchmark results
Language Support
JTON officially supports Python only.
The SIMD-accelerated parser (AVX2/AVX-512 structural scanning, VPSHUFB nibble classifier, thread-local string cache) is a PyO3 native extension. The performance advantage is inseparable from the Python binding. The format spec is in SPEC.md for anyone who wants to implement JTON in another language.
| Language | Status | Install |
|---|---|---|
| Python 3.11+ | ✅ Official | pip install jton |
| All others | — | Implement from SPEC.md |
Requirements
| Requirement | Version |
|---|---|
| Python | 3.11+ |
| Rust | 1.70+ |
| CPU | AVX2 (2013+ Intel/AMD) |
| CPU (optional) | AVX-512 for 2× SIMD throughput |
Safety
- Depth guard: MAX_NESTING_DEPTH = 100 prevents stack overflow from deeply nested input
- Arity tolerance: Extra table columns are silently dropped; missing columns are filled with null
- Memory safety: All unsafe Rust code is in clearly marked blocks using PyO3 FFI patterns
- No allocation on GIL drop: JTON never releases the GIL mid-parse, avoiding data races
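The arity rule is simple to state in code. The helper below is a sketch of the behavior described above for one table row (extra cells dropped, missing cells filled with None, Python's counterpart of null). It illustrates the rule rather than JTON's parser, and fit_row is a hypothetical name.

```python
def fit_row(headers: list[str], cells: list) -> dict:
    """Apply the arity rule: drop extra cells, fill missing ones with None."""
    fitted = list(cells[: len(headers)])                # truncate extras
    fitted += [None] * (len(headers) - len(fitted))    # pad what is missing
    return dict(zip(headers, fitted))


print(fit_row(["id", "name"], [1, "Alice", "extra"]))
# → {'id': 1, 'name': 'Alice'}
print(fit_row(["id", "name"], [2]))
# → {'id': 2, 'name': None}
```

Since extras are dropped silently, callers that need to detect malformed rows should compare the cell count with the header count themselves.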
CI / Publishing
Three GitHub Actions workflows are included:
| Workflow | File | Trigger | Purpose |
|---|---|---|---|
| CI | .github/workflows/ci.yml | Push / PR | Build + test on Linux, Windows, macOS × Python 3.10–3.13 |
| Release | .github/workflows/release.yml | git push --tags v* | Build manylinux/macOS/Windows wheels, publish to PyPI via OIDC, draft GitHub Release |
| Security Audit | .github/workflows/audit.yml | Weekly (Mon 08:00 UTC) | cargo audit against RustSec advisory database |
Publishing to PyPI
The release workflow uses PyPI Trusted Publishing (OIDC) — no API token needed:
- Go to pypi.org/manage/account/publishing
- Add a new publisher:
  - Owner: your GitHub username
  - Repository: JTON
  - Workflow: release.yml
  - Environment: pypi
- Push a version tag to trigger the release:
  git tag v0.2.0
  git push --tags
- The workflow builds wheels for all platforms, publishes to PyPI, and creates a draft GitHub Release with wheel files attached.
License
MIT — see NOTICE for full text.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file jton-1.0.1.tar.gz.
File metadata
- Download URL: jton-1.0.1.tar.gz
- Upload date:
- Size: 3.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aad1c00e9bff708475777450e0dc0434a346471c3eb67bd3affe1460340908c0
|
|
| MD5 |
faea087f4da3b0c07aebba87fff92d63
|
|
| BLAKE2b-256 |
807a909185c6658be2e49f0efc32eacb3c200c3e2fcba5552a1493eff83c9b6b
|
File details
Details for the file jton-1.0.1-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: jton-1.0.1-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 399.9 kB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a49ab34e24efd7e9fd3bef78f6f0cac692adca23a4e0a3e3c6ab876ee7ec7126
|
|
| MD5 |
c4c03d782cd71a64b0084862e00dca43
|
|
| BLAKE2b-256 |
7ed7388a418a4fc2d61d723dcf9514d65dda9c8751c38914f8b9096332d05a79
|
File details
Details for the file jton-1.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: jton-1.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 516.6 kB
- Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fb1a86810e5bdfe2a975b1326678f1b90969ffd3b44caff6309eabd5f092604f
|
|
| MD5 |
f8a1d204b941e6f9b4935556aa6e6eb8
|
|
| BLAKE2b-256 |
d2dc205bf95d9621edfa7c3da32ced74c10da2956f579c0109d43be28a49113f
|
File details
Details for the file jton-1.0.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: jton-1.0.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 512.3 kB
- Tags: CPython 3.13, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
047b5b74d9dfeb065ddadd0ed687e786f9d9afa2c32ed88787df73fbeec2a97b
|
|
| MD5 |
92d0ef6f63366595316ab626514bc97d
|
|
| BLAKE2b-256 |
d683c04463c9cae84d9fe1b87e33a32fd2a9a93713db388d8b983a4f36033675
|
File details
Details for the file jton-1.0.1-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: jton-1.0.1-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 486.4 kB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
85f71f97f47eec4f2179896b3c8e3d55d4915a821a700682b357d14ef8c8816e
|
|
| MD5 |
a14834c0cbcd50fbbb0e008f3e8698ed
|
|
| BLAKE2b-256 |
9d46386af1c53263b4c25d2c81d44d5a3f6652efb5295793d83f5e5090e5f102
|
File details
Details for the file jton-1.0.1-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: jton-1.0.1-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 399.9 kB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
962376cf7b915f0ac9170b3a270b71d1e5ac1134c5f1023f7464974afe2b3325
|
|
| MD5 |
a23b8431120b603750e65be21315ebea
|
|
| BLAKE2b-256 |
d1df39bb16c11c331681961f7b65586265c2057959ec313bf7bd99da22837569
|
File details
Details for the file jton-1.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: jton-1.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 516.6 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e9716a45727042a9124567516966414dbb2fecfb8d1c0fb939279598a8a8b15
|
|
| MD5 |
bec8f01f0319f25acb435034dad7363a
|
|
| BLAKE2b-256 |
5a204fadf6d03de39dbb54bce4cccb7732ba4d2c9bb97dc9620b63e22ad292d1
|
File details
Details for the file jton-1.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: jton-1.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 512.3 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5209de32ef7f30f1c490d1a1db826c0cb15cfdc47dbdccbcf385bfda71a98a40
|
|
| MD5 |
ae2727fce52b6c5b82619afd5bcf358a
|
|
| BLAKE2b-256 |
2fa7bd160902b958d5948d929bdb9a1bc479f6bdc2e5bd8c78af6fc467a43119
|
File details
Details for the file jton-1.0.1-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: jton-1.0.1-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 486.4 kB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 614a55f4a16df847e3d37315f6e7075c9761d7d4709372f2fb7bc8e2cc2909e0 |
| MD5 | cf09f344cc1d4ad2905e9713569c76b9 |
| BLAKE2b-256 | b6f91b8d558c80db3dd0ffb65eab8715f4c0b7e2fec2ae597ec4ea8b444e53f0 |
File details
Details for the file jton-1.0.1-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: jton-1.0.1-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 400.2 kB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf331d006a81991cdd61798ad832a12340de83f0d16b61a9aadbf858a1e342b2 |
| MD5 | eb5f5ed00a9bbd3d88eec6908b2ef9dc |
| BLAKE2b-256 | b81a93177d510b9e8c14c0adecd23b82f7f070edea50cdde520368126dee599e |
File details
Details for the file jton-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: jton-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 517.7 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 203b26263cdf5991c47d01a5c6a111d7751e996a05dea85fd40847ed5be76b4c |
| MD5 | 35deb6e9d9a8b637cc5d37064d7f63fb |
| BLAKE2b-256 | c7e39a37f272551bd5b9ec12bd33432ebdc61da463a9b1d6c5f1e3789b6b9d56 |
File details
Details for the file jton-1.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: jton-1.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 512.2 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7af236ba711687fb3bdc531563c18cc589c709025466bf6cbc28d9c0e973cba9 |
| MD5 | b9e3dc20dbd1bd69ea236bcf282c977d |
| BLAKE2b-256 | 90fcfadcc724a6171b35405d3288734676e5351682b0887b03556d720f88b098 |
File details
Details for the file jton-1.0.1-cp311-cp311-macosx_11_0_arm64.whl.
File metadata
- Download URL: jton-1.0.1-cp311-cp311-macosx_11_0_arm64.whl
- Upload date:
- Size: 487.4 kB
- Tags: CPython 3.11, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7636885a3f6ba1538f02659b29335dfcd9b49dcf9fecbd8c1111c3fe4a1524b4 |
| MD5 | 23d7229def8771acaefebc1c6125d5b6 |
| BLAKE2b-256 | e1cd5bc9689b3aaea72d85ebc8f21f29647d39fc50cb3e45164b4491858172d5 |
File details
Details for the file jton-1.0.1-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: jton-1.0.1-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 400.2 kB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21f545855d855bf4b59ff91c3913bd50d2c5bbe352ce7a06b3ef0dc04d492ade |
| MD5 | e0f4b51a74f97c46af7c96775d561527 |
| BLAKE2b-256 | 218b2a6214b16923ee715bb6e04e47a4c5a53c75c6934bef01091202e5f5c10b |
File details
Details for the file jton-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: jton-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 517.7 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6857e0f3b4867c69e863a6b6b23ac24408055069451bbba52751d8497ac73c09 |
| MD5 | b3cf05be0cb3622fa5e55bca7b3fd8d9 |
| BLAKE2b-256 | bcf492dc58c7037f1dbb7390aed2bfeb6fc0119977f04a4c75d68aadce07b835 |
File details
Details for the file jton-1.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: jton-1.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 512.2 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4d40a88c1e21c4a31527616724bb39e75b90f291fb79c88690cfd6cc45fc581 |
| MD5 | 1f69dc170fc0d96da108ea3ce6eea97c |
| BLAKE2b-256 | 94246a5120685a48670eec23c49038d0b80afd41c29812accaf6d03d72857069 |
File details
Details for the file jton-1.0.1-cp310-cp310-macosx_11_0_arm64.whl.
File metadata
- Download URL: jton-1.0.1-cp310-cp310-macosx_11_0_arm64.whl
- Upload date:
- Size: 486.8 kB
- Tags: CPython 3.10, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ff6749dff4b9f3f5bbcfc8d79cd8abe42dcf828420778c1a9c9b44f4b502cb4 |
| MD5 | 2b8ee882dbe91785145793736d0842c2 |
| BLAKE2b-256 | c9c8955adff372c87c225cb5a789f215ddcb5f4c0ed16f76c6217cff2f985100 |