JTON (JSON Tabular Object Notation) — A Tabular JSON-Superset Encoding for Efficient LLM Data Processing

JTON

JTON (JSON Tabular Object Notation) — A high-performance, token-efficient JSON superset built in Rust with PyO3 bindings for Python. Home of Zen Grid, a token-aware tabular encoding that reduces LLM token costs by 15–60%.

Try the JTON Playground: an interactive encoder/decoder with live token counts and speed benchmarks.

Overview

JTON is a JSON superset designed for LLM applications and high-throughput data processing:

  • Zen Grid: Tabular encoding reduces token count by 15–60% on benchmarked data vs JSON compact (23% average across 6 datasets, ~60% on highly-tabular Twitter-style rows)
  • LLM-Validated: 10 models tested for comprehension, 12 models tested for generation -- all achieve 100% generation validity (100% few-shot, 100% zero-shot)
  • SIMD Acceleration: AVX2 (32-byte) and AVX-512 (64-byte) structural scanning
  • JSON-compatible: supports load() / loads() / dump() / dumps() for common JSON workflows — all valid JSON is valid JTON
  • Serialization: dumps() with Zen Grid table output, Pydantic v1/v2 and dataclass support
  • JSON Extensions: Unquoted keys, // and /* */ comments, Infinity/NaN special numbers
  • Strict correctness: Rejects invalid JSON numbers (-01, 1., 0.e1) that many parsers accept silently
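The strict-number rule in the last bullet corresponds to the RFC 8259 number grammar. As a rough illustration only (this is not JTON's Rust parser, and the helper name is_strict_json_number is hypothetical), the grammar can be expressed as a regular expression. The sketch covers plain JSON numbers and does not model the Infinity/NaN extensions.

```python
import re

# RFC 8259 number grammar: optional minus, integer part without leading
# zeros, optional fraction with at least one digit, optional exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_strict_json_number(text: str) -> bool:
    """Return True if text is a valid RFC 8259 number literal (hypothetical helper)."""
    return _JSON_NUMBER.fullmatch(text) is not None


# The first three literals are the invalid forms named in the list above.
for literal in ["-01", "1.", "0.e1", "-0", "1.5e-3", "10"]:
    print(f"{literal!r}: {is_strict_json_number(literal)}")
```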

Quickstart

Installation

# From PyPI
pip install jton

# From source (requires Rust 1.70+ — https://rustup.rs/)
git clone https://github.com/gowthamkumar-nandakishore/jton.git
cd jton
pip install maturin
maturin develop --release

Basic Usage

import jton

# Standard JSON parsing
data = jton.loads('{"name": "Alice", "age": 30}')

# JTON extensions — unquoted keys
data = jton.loads('{name: "Alice", age: 30}')

# Comments for configuration files
config = jton.loads('''
{
    host: "localhost",   // server address
    port: 8080,         /* default port */
    timeout: 30         // seconds
}
''')

# Special numbers (Python compatibility)
data = jton.loads('{x: Infinity, y: -Infinity, z: NaN}')

# Serialize to compact JSON
jton.dumps({"name": "Alice", "age": 30})
# → '{"name":"Alice","age":30}'

# encode/decode aliases (familiar for orjson/msgspec users)
jton.encode(data)   # same as dumps()
jton.decode('{"a": 1}')   # same as loads()

Using JTON with existing json-based code

JTON works well with existing JSON-heavy codebases — with one important rule:

  • Parsing is the easy win: jton.load() / jton.loads() accept normal JSON unchanged
  • Serialization is JSON-compatible for common usage, but JTON's serializer has an extra default: zen_grid=True
  • That means if you simply replace import json with import jton as json, some list[dict] payloads may serialize as Zen Grid instead of strict RFC 8259 JSON

What happens if you just replace json with jton?

import jton as json

# Existing JSON input still works
obj = json.loads('{"name":"Alice","age":30}')

# Existing file APIs still work
with open("data.json") as f:
    obj = json.load(f)

For output:

import jton as json

json.dumps({"name": "Alice", "age": 30})
# -> '{"name":"Alice","age":30}'   # still normal JSON

json.dumps([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])
# -> '[2: id, name; 1, "Alice"; 2, "Bob" ]'   # JTON Zen Grid, not strict JSON

So the practical behavior is:

  • load() / loads(): safe replacement for existing JSON parsing
  • dump() / dumps() on ordinary objects: usually still emits JSON
  • dump() / dumps() on homogeneous arrays of objects: may emit Zen Grid by default

Recommended migration path

1. Parsing-only replacement

If you want a no-risk first step, replace only parsing:

import jton

data = jton.loads(existing_json_text)

This gives you faster parsing plus JTON extensions on input, without changing output format anywhere.

2. Full json-module replacement, but keep strict JSON output

If you want import jton as json, but still need normal JSON output:

import jton as json

text = json.dumps(obj, zen_grid=False)
with open("out.json", "w") as f:
    json.dump(obj, f, zen_grid=False)

This is the safest pattern for APIs, files, and systems that expect standard JSON.
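One way to make the strict-JSON choice hard to forget is to route every machine-facing write through a single helper. This is a minimal sketch under stated assumptions: dumps_strict is a hypothetical name, and the fallback to the standard library when jton is not installed is an illustration choice rather than documented JTON behavior.

```python
import json

try:
    import jton  # optional: the package described in this README
except ImportError:  # fall back to the standard library if jton is absent
    jton = None


def dumps_strict(obj):
    """Serialize obj to strict JSON, never Zen Grid (hypothetical helper)."""
    if jton is not None:
        # zen_grid=False is the documented switch for strict JSON output.
        return jton.dumps(obj, zen_grid=False)
    return json.dumps(obj, separators=(",", ":"))


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
text = dumps_strict(rows)
# Whatever backend produced it, the text must parse with the stdlib parser.
assert json.loads(text) == rows
```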

3. Enable JTON only where it helps

Use strict JSON for machines, and JTON for LLM-facing payloads:

api_payload = jton.dumps(data, zen_grid=False)   # strict JSON
llm_payload = jton.dumps(data)                   # JTON / Zen Grid when eligible

JSON-in / JSON-out compatibility cheat sheet

If you want output that stays valid JSON, use:

jton.dumps(
    obj,
    zen_grid=False,
    unquoted_keys=False,
    bare_strings=False,
    implicit_null=False,
    multiline_zen=False,
)

Notes:

  • Only zen_grid=False is required for strict JSON output in normal use
  • row_count and delimiter matter only when zen_grid=True
  • indent=2 or indent=4 still gives valid pretty JSON
  • default= works like json.dumps(default=...)

When output stops being strict JSON

These options move you out of plain JSON output:

  • zen_grid=True on homogeneous list[dict] values
  • unquoted_keys=True
  • multiline_zen=True
  • bare_strings=True inside Zen Grid cells
  • implicit_null=True inside Zen Grid cells

So the short rule is:

  • Use jton.loads() everywhere
  • Use jton.dumps(..., zen_grid=False) anywhere strict JSON is required
  • Use default jton.dumps() for LLM/token-optimized payloads

Current compatibility scope

JTON supports the common Python JSON workflow:

  • load
  • loads
  • dump
  • dumps
  • default=...
  • file objects

It is not a byte-for-byte clone of every stdlib json keyword argument yet. Think of it as:

  • fully compatible parser for JSON input
  • mostly compatible serializer for common usage
  • plus opt-in JTON/Zen Grid output when you want it

Zen Grid: Token-Efficient Table Format

When you pass a list of dicts to dumps(), JTON automatically detects the tabular structure and encodes it as a Zen Grid — one header row followed by semicolon-delimited data rows, all inline.

The format

[N: col1, col2, col3; val1, val2, val3; val4, val5, val6 ]
 ↑         ↑                  ↑
row count  headers           one record per semicolon segment
  • N — total row count (helps LLMs understand the data size at a glance)
  • First segment after [N: = comma-separated field names
  • Each subsequent segment = one record, values in the same order
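To make the layout concrete, here is a small pure-Python sketch that writes the inline form described above. It is not JTON's serializer: encode_zen_grid is a hypothetical name, it assumes flat rows that share the same keys in the same order, and it relies on json.dumps to render each cell.

```python
import json


def encode_zen_grid(rows):
    """Encode a list of flat, same-keyed dicts as '[N: headers; row; row ]'."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        raise ValueError("expected a non-empty list of dicts")
    headers = list(rows[0])
    if any(list(row) != headers for row in rows):
        raise ValueError("all rows must share the same keys in the same order")
    # First segment: field names. Later segments: one record each.
    segments = [", ".join(headers)]
    for row in rows:
        # json.dumps quotes strings and renders numbers, booleans, and null.
        segments.append(", ".join(json.dumps(row[name]) for name in headers))
    return f"[{len(rows)}: " + "; ".join(segments) + " ]"


users = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]
print(encode_zen_grid(users))
# → [3: id, name, score; 1, "Alice", 95; 2, "Bob", 87; 3, "Carol", 92 ]
```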

Example

import jton

users = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob",   "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]

# Standard JSON compact: 104 chars, ~32 tokens:
# [{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87},{"id":3,"name":"Carol","score":92}]

# JTON Zen Grid: 67 chars, ~22 tokens (31% fewer tokens):
print(jton.dumps(users))
# → '[3: id, name, score; 1, "Alice", 95; 2, "Bob", 87; 3, "Carol", 92 ]'

# Disable Zen Grid for standard JSON output
print(jton.dumps(users, zen_grid=False))
# → '[{"id":1,"name":"Alice","score":95},...]'

Round-trip correctness

Zen Grid is valid JTON — jton.loads() parses it back to the original data:

original = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
encoded  = jton.dumps(original)           # → '[2: id, name; 1, "Alice"; 2, "Bob" ]'
decoded  = jton.loads(encoded)            # → [{"id": 1, "name": "Alice"}, ...]
assert decoded == original                # ✅ perfect round-trip
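The reverse direction can be sketched the same way. The decoder below is an illustration, not JTON's parser: decode_zen_grid and split_outside_strings are hypothetical names, and it assumes the comma-delimited inline form with quoted strings and JSON scalars in every cell (no bare strings, implicit nulls, or nested values).

```python
import json


def split_outside_strings(text, sep):
    """Split text on sep, ignoring separators inside double-quoted strings."""
    parts, buf = [], []
    in_string = escaped = False
    for ch in text:
        if in_string:
            buf.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            buf.append(ch)
        elif ch == sep:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def decode_zen_grid(text):
    """Parse '[N: h1, h2; v1, v2; ... ]' into a list of dicts."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("not a Zen Grid literal")
    count_text, _, rest = body[1:-1].partition(":")
    segments = split_outside_strings(rest, ";")
    headers = split_outside_strings(segments[0], ",")
    rows = [
        dict(zip(headers, (json.loads(cell) for cell in split_outside_strings(seg, ","))))
        for seg in segments[1:]
    ]
    # The optional N prefix doubles as a cheap integrity check.
    if count_text.strip() and int(count_text) != len(rows):
        raise ValueError("row count does not match the N prefix")
    return rows


print(decode_zen_grid('[2: id, name; 1, "Alice"; 2, "Bob" ]'))
# → [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```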

Token count analysis

import jton

data = [{"id": i, "name": f"User{i}", "score": i*5} for i in range(100)]
counts = jton.token_count(data)  # requires: pip install tiktoken
# {
#   'json_compact': {'tokens': 2843, 'savings_vs_compact': '+0.0%'},
#   'zen_grid':     {'tokens': 1820, 'savings_vs_compact': '-36.0%'},
# }
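The savings_vs_compact strings in that output are plain relative changes against the compact-JSON count. A short sketch of the arithmetic (the helper name is hypothetical and only mirrors the field name shown above):

```python
def savings_vs_compact(tokens: int, compact_tokens: int) -> str:
    """Format the signed percentage change relative to compact JSON."""
    change = (tokens - compact_tokens) / compact_tokens * 100
    return f"{change:+.1f}%"


print(savings_vs_compact(2843, 2843))  # → +0.0%
print(savings_vs_compact(1820, 2843))  # → -36.0%
```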

LLM integration

Add a one-line format hint to your system prompt before sending Zen Grid data:

import jton

system_prompt = jton.format_hint() + "\n\n" + jton.dumps(my_data)

The hint text returned by jton.format_hint() reads:

Data is in JTON Zen Grid format.
Format: [N: col1, col2, col3; row1val1, row1val2, row1val3; ... ]
N = total row count. First semicolon-segment = headers.
Each subsequent segment = one record in header order.
Example: [3: id, name, score; 1, Alice, 95; 2, Bob, 87; 3, Carol, 92 ]

Pydantic and Dataclass Support

from pydantic import BaseModel
from dataclasses import dataclass
import jton

# Pydantic v2 (model_dump)
class User(BaseModel):
    id: int
    name: str
    email: str

users = [User(id=1, name="Alice", email="a@example.com"),
         User(id=2, name="Bob",   email="b@example.com")]

print(jton.dumps(users))
# → '[2: id, name, email; 1, "Alice", "a@example.com"; 2, "Bob", "b@example.com" ]'

# Python dataclasses
@dataclass
class Point:
    x: float
    y: float

print(jton.dumps(Point(x=1.5, y=2.5)))
# → '{"x":1.5,"y":2.5}'

# Parse directly into a Pydantic model
user_data = jton.loads('{"id":1,"name":"Alice","email":"a@ex.com"}')
user = User(**user_data)
# → User(id=1, name='Alice', email='a@ex.com')

# Parse into a dataclass
pt_data = jton.loads('{"x":1.5,"y":2.5}')
pt = Point(**pt_data)
# → Point(x=1.5, y=2.5)

API Reference

JTON provides the core load, dump, loads, and dumps APIs used in common JSON workflows. The main behavioral difference is that dumps() defaults to zen_grid=True, which may emit JTON Zen Grid for homogeneous arrays of objects.

jton.loads(data, schema=None)

Parse JTON or JSON data into Python objects.

jton.loads('{"a": 1}')          # → {"a": 1}
jton.loads(b'{"a": 1}')         # bytes input OK
jton.loads('{a: 1}')            # unquoted keys OK
jton.loads('// comment\n{a:1}') # comments OK

jton.load(fp)

Parse JTON/JSON from a file object — compatible with normal json.load() usage.

with open("data.json") as f:
    data = jton.load(f)

jton.dumps(data, *, zen_grid=True, ..., default=None)

Serialize Python objects to JTON/JSON string — compatible with common json.dumps() usage.

Parameters (name, type, default, description):

  • data (Any, required): Python object to serialize
  • zen_grid (bool, default True): Auto-convert lists of dicts to Zen Grid table format
  • unquoted_keys (bool, default False): Write dict keys without quotes
  • indent (int | None, default None): Pretty-print with given indent width
  • bare_strings (bool, default False): Write identifier string values without quotes in cells
  • implicit_null (bool, default False): Write null cells as empty (saves ~1 token per cell)
  • row_count (bool, default True): Prefix Zen Grid header with [N: ...] row count
  • delimiter (str, default "comma"): "comma" (readable), "tab" (max savings), or "pipe"
  • default (callable | None, default None): For non-serializable objects, same as json.dumps(default=...)

If you want strict JSON output, set zen_grid=False.

# Standard usage
jton.dumps({"a": 1}, zen_grid=False)         # → '{"a":1}'

# Custom types with default=
from datetime import date
jton.dumps({"d": date(2025,1,1)}, default=str)
# → '{"d":"2025-01-01"}'

# Works with Zen Grid too
jton.dumps([{"id":1,"d":date(2025,1,1)}], default=str)
# → '[1: id, d; 1, "2025-01-01" ]'

Supported types natively: dict, list, tuple, str, int, float, bool, None, Pydantic BaseModel (v1+v2), @dataclass

jton.dump(obj, fp, **kwargs)

Serialize to a file object — compatible with common json.dump() usage.

with open("out.jton", "w") as f:
    jton.dump(data, f)

jton.format_hint(style="zen_grid")

Return a format description for pasting into LLM system prompts.

style | Description
"zen_grid" | Default inline format (mentions both [: and [N: forms)
"zen_grid_rowcount" | Inline with explicit [N] row count
"multiline" | Multi-line format
"tab" | Tab-delimited

jton.token_count(data, tokenizer="o200k_base")

Compare token costs across all output modes. Requires pip install tiktoken.

Returns a dict mapping mode names to {"tokens": int, "chars": int, "savings_vs_compact": str}.

jton.encode / jton.decode

Aliases for dumps / loads — familiar for users of orjson or msgspec.


CLI Tool

JTON ships a command-line tool for JSON ↔ Zen Grid conversion:

# After pip install or maturin develop
jton input.json                    # encode JSON → Zen Grid (stdout)
jton input.json -o output.jton     # encode to file
jton input.jton -o output.json     # decode Zen Grid → JSON (auto-detected)
jton input.json --stats            # show token savings
echo '{"x":1}' | jton             # pipe stdin
jton input.json --tab              # tab-delimited Zen Grid
jton input.json --no-zen-grid      # plain compact JSON
jton input.json --indent 2         # pretty-print JSON
jton --hint                        # print LLM system-prompt template
jton --version                     # show version

Or run directly without installation:

python -m jton.cli input.json --stats

Playground

Run the interactive playground locally to explore all JTON features:

# From the repo root (after maturin develop)
python playground/server.py

# Opens at http://127.0.0.1:7700
# Optional: pip install tiktoken  (enables live token count bars)

The playground provides:

  • Live JSON → JTON conversion with all encoding options as toggles
  • Token comparison bars (JSON pretty / JSON compact / JTON current)
  • Char savings % vs JSON compact
  • Round-trip indicator — shows if decode(encode(x)) == x
  • Format hint copier — paste into LLM system prompts
  • Sample datasets — employees, orders, analytics, deep config, GitHub repos
  • Decode mode — paste JTON output, get back pretty JSON

Performance

Speed Comparison (real-world files: canada.json 2.25 MB, citm_catalog.json 1.78 MB, twitter.json 0.65 MB)

Library | loads | dumps (JSON mode) | Notes
stdlib json | 63–184 MB/s | 46–268 MB/s | Pure Python/C
JTON | 132–346 MB/s | 197–276 MB/s | Rust/SIMD, JSON mode
JTON Zen Grid | - | 81–240 MB/s | Rust, table output
orjson | 235–458 MB/s | 440–533 MB/s | Rust, JSON only

  • JTON loads is 1.5–2.1× faster than stdlib (json.loads)
  • JTON dumps JSON mode is 1.0–4.3× faster than stdlib
  • JTON Zen Grid dumps saves 14–60% tokens (depending on data shape) while maintaining competitive throughput
  • orjson is faster on raw JSON; JTON's advantage is Zen Grid token reduction which orjson cannot provide

Large-file static benchmark: akbe_doc_classifier.json (338.1 MB)

Measured on this machine using the repository's akbe_doc_classifier.json payload in JSON-compatible mode (zen_grid=False for JTON dump):

Operation | stdlib json | JTON | Result
Parse / decode | 1.75 s (193.5 MB/s) | 2.43 s (138.9 MB/s) | stdlib faster on this file
Dump / encode | 1.78 s (57.3 MB/s) | 0.81 s (126.5 MB/s) | JTON 2.2× faster

Notes:

  • This file is a large, object-heavy classifier payload rather than a tabular Zen Grid sweet spot
  • On this benchmark, JTON wins strongly on dump/encode
  • Stdlib json wins on parse/decode for this specific file shape
  • Output benchmarking used JSON-compatible serialization (jton.dumps(..., zen_grid=False))

SIMD Acceleration

JTON uses a two-pass SIMD parsing strategy modeled after simdjson:

  1. Structural scan (AVX2/AVX-512): Build index of {}[],:;" positions in a single pass
  2. Index-jumping parse: Navigate the pre-built index without byte-by-byte scanning
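The two passes can be illustrated in scalar Python. This is a conceptual sketch only: it processes one character at a time rather than 32- or 64-byte SIMD chunks, the function names are hypothetical, and it ignores JTON extensions such as comments. Pass one records where structural characters sit outside strings; pass two jumps between recorded positions instead of rescanning the text.

```python
STRUCTURAL = set('{}[],:;"')


def structural_index(text):
    """Pass 1: positions of structural characters outside string contents."""
    positions = []
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                positions.append(i)      # closing quote
            continue
        if ch == '"':
            in_string = True
            positions.append(i)          # opening quote
        elif ch in STRUCTURAL:
            positions.append(i)
    return positions


def quoted_strings(text, index):
    """Pass 2: slice out string bodies by jumping between quote positions."""
    quotes = [pos for pos in index if text[pos] == '"']
    # Recorded quotes alternate open/close because escaped quotes are skipped.
    return [text[start + 1:end] for start, end in zip(quotes[0::2], quotes[1::2])]


doc = '{"a:b": [1, "x,y"]}'
index = structural_index(doc)
print(index)                       # → [0, 1, 5, 6, 8, 10, 12, 16, 17, 18]
print(quoted_strings(doc, index))  # → ['a:b', 'x,y']
```

Note that the colon and comma inside the two strings never reach the index, which is what lets the second pass treat every recorded position as structural.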

Feature | Details
AVX2 | 32-byte chunks, 2013+ Intel/AMD CPUs
AVX-512 | 64-byte chunks, 2017+ Intel CPUs
Runtime detection | Automatically selects best available ISA
Float parsing | lexical-core (same algorithm as orjson)
Int serialization | itoa crate (fastest known)
Float serialization | ryu crate (shortest round-trip, same as orjson)
String cache | Thread-local UnsafeCell, zero mutex overhead under Python GIL

Token Efficiency

JTON vs Competing Formats

Benchmarked on 6 real-world datasets using tiktoken o200k_base encoder:

Format | Total Tokens | vs JSON compact | JSON-Compatible
JTON Zen Grid | 144,159 | −20.2% | ✅ Yes (JTON superset)
TOON | 146,113 | −19.2% | ❌ No (new syntax)
JSON compact | 180,725 | -- | ✅ Yes

JTON is the most token-efficient JSON-compatible format -- TOON requires a custom parser.

Real-world LLM Token Savings

Dataset | JSON compact | JTON Zen Grid | Savings
👥 2,000 employees (7 cols) | 97,407 | 77,226 | −20.7%
📈 365 days analytics | 14,240 | 10,604 | −25.5%
⭐ 100 GitHub repos | 11,729 | 9,626 | −17.9%
🛒 500 orders (nested) | 46,381 | 39,565 | −14.7%
🧾 300 event logs (semi-uniform) | 10,745 | 6,915 | −35.6%

LLM Comprehension Evaluation

We evaluated whether LLMs can correctly interpret Zen Grid data across 10 models from six providers, using 7 real-world datasets × 5 question types × 2 formats (700 total API calls). Uses JTON 1.0 [N: ...] syntax with bare_strings=True.

Per-Model Results

Model | Family | JSON | Zen Grid | Delta | n
GPT-5.1-codex | OpenAI | 74.3% | 71.4% | −2.9 pp | 35
GPT-5.1 | OpenAI | 71.4% | 62.9% | −8.6 pp | 35
GPT-5-mini | OpenAI | 71.4% | 71.4% | 0.0 pp | 35
Gemini 3 Pro Preview | Google | 68.6% | 68.6% | 0.0 pp | 35
Kimi K2 | Moonshot | 62.9% | 68.6% | +5.7 pp | 35
Qwen3 32B | Alibaba | 60.0% | 57.1% | −2.9 pp | 35
Llama 3.3 70B | Meta | 54.3% | 54.3% | 0.0 pp | 35
Llama 3.1 8B | Meta | 45.7% | 48.6% | +2.9 pp | 35
GPT-OSS 120B | Open-src | 42.9% | 45.7% | +2.9 pp | 35
Llama 4 Scout 17B | Meta | 40.0% | 45.7% | +5.7 pp | 35
Overall | | 59.1% | 59.4% | +0.3 pp | 350

By Question Type

Question Type | JSON | Zen Grid | Delta
Lookup | 95.7% | 95.7% | 0.0 pp
Filtering | 52.9% | 51.4% | −1.4 pp
Count | 51.4% | 48.6% | −2.9 pp
Aggregation | 47.1% | 51.4% | +4.3 pp
Comparison | 48.6% | 50.0% | +1.4 pp

Key Findings

Four of ten models improve with Zen Grid (Kimi K2 +5.7 pp, Llama 4 Scout +5.7 pp, Llama 3.1 8B +2.9 pp, GPT-OSS 120B +2.9 pp), three are neutral (GPT-5-mini, Gemini 3 Pro, Llama 3.3 70B), and three regress (GPT-5.1-codex −2.9 pp, Qwen3 32B −2.9 pp, and GPT-5.1 −8.6 pp, the largest drop). Overall, Zen Grid is +0.3 pp ahead of JSON, a gap that is small relative to the sample size (n = 350 per format), while using ~20% fewer tokens. At roughly equal accuracy, that lowers the cost per correct answer. Lookup accuracy (95.7%) is identical across the two formats.
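The cost-per-correct-answer claim can be checked with a line of arithmetic. The sketch below uses only figures quoted above (59.1% and 59.4% overall accuracy, ~20% fewer tokens). Treating the ~20% token reduction as applying to the evaluation prompts is an assumption, and the function name is hypothetical.

```python
def relative_cost_per_correct(token_ratio, accuracy, baseline_accuracy):
    """Tokens spent per correct answer, relative to the JSON baseline.

    token_ratio is the format's token count divided by the baseline's.
    """
    return (token_ratio / accuracy) / (1.0 / baseline_accuracy)


ratio = relative_cost_per_correct(token_ratio=0.80, accuracy=0.594, baseline_accuracy=0.591)
print(f"{ratio:.3f}")  # → 0.796, i.e. roughly 20% fewer tokens per correct answer
```

Almost all of the gain comes from the token reduction; the +0.3 pp accuracy difference moves the ratio by well under one percentage point.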

LLM Generation Results

Can LLMs produce valid Zen Grid output? We tested 12 models from 6 providers with few-shot and zero-shot prompting on the JTON 1.0 [N: ...] syntax:

Model | Few-shot Valid | Zero-shot Valid
GPT-5-mini | 100% | 100%
GPT-5.1 | 100% | 100%
GPT-4o | 100% | 100%
Claude Sonnet 4 | 100% | 100%
Claude 3.5 Haiku | 100% | 100%
Claude 3 Haiku | 100% | 100%
Gemini 2.5 Flash | 100% | 100%
Gemini 2.5 Pro | 100% | 100%
Gemini 3 Flash Preview | 100% | 100%
Llama 3.3 70B | 100% | 100%
Llama 4 Scout 17B | 100% | 100%
Kimi K2 | 100% | 100%
Overall | 100% | 100%

All 12 models achieve 100% validity in both modes. Zen Grid works for bidirectional LLM pipelines -- both input and output.

Format Comparison

Token counts on real-world data (o200k_base tokenizer):

Format | Twitter | GitHub | Financial | Avg Savings vs JSON
JSON Compact | 3,673 | 968 | 643 | baseline
CSV | 1,303 | 688 | 408 | −43.3% (no types)
Markdown | 1,430 | 792 | 505 | −33.6% (no types)
YAML | 1,916 | 1,185 | 840 | +1.7%
Zen Grid | 1,653 | 968 | 516 | −24.9% (full types)

Zen Grid is the only JSON-compatible format that achieves significant token savings while preserving JSON's full type system.
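The "Avg Savings" column is the mean of the three per-dataset changes, not the change in the pooled token total. The following sketch recomputes it from the counts in the table; the variable and function names are illustrative.

```python
JSON_COMPACT = {"Twitter": 3673, "GitHub": 968, "Financial": 643}
FORMATS = {
    "CSV": {"Twitter": 1303, "GitHub": 688, "Financial": 408},
    "Markdown": {"Twitter": 1430, "GitHub": 792, "Financial": 505},
    "YAML": {"Twitter": 1916, "GitHub": 1185, "Financial": 840},
    "Zen Grid": {"Twitter": 1653, "GitHub": 968, "Financial": 516},
}


def average_savings(tokens):
    """Mean of the per-dataset percentage changes vs JSON compact."""
    changes = [
        (tokens[name] - base) / base * 100 for name, base in JSON_COMPACT.items()
    ]
    return sum(changes) / len(changes)


for fmt, tokens in FORMATS.items():
    print(f"{fmt}: {average_savings(tokens):+.1f}%")
# → CSV: -43.3%
# → Markdown: -33.6%
# → YAML: +1.7%
# → Zen Grid: -24.9%
```

Because each dataset is weighted equally, the GitHub sample (where Zen Grid matches JSON compact at 968 tokens) pulls the Zen Grid average down even though the Twitter sample alone shows a reduction of about 55%.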


Features

✅ Implemented

  • Full JSON Compatibility: parse any valid RFC 8259 JSON
  • Zen Grid Tables: [: header; row1; row2 ] with auto-detection and round-trip
  • Unquoted Keys: {name: "value"} instead of {"name": "value"}
  • Comments: // single-line and /* */ block comments
  • Special Numbers: Infinity, -Infinity, NaN
  • dumps() Serializer: compact JSON + Zen Grid output
  • Pydantic Support: BaseModel (v1 dict() + v2 model_dump()) serialization
  • Python Dataclasses: @dataclass instances via dataclasses.asdict()
  • encode / decode Aliases: drop-in for orjson/msgspec users
  • SIMD Scanner: AVX2 + AVX-512 structural character indexing
  • Strict Number Parsing: rejects -01, 1., 0.e1, -.5, 1+2
  • Enhanced Errors: 40-character context window with ^ markers
  • Schema-guided Parsing: optional schema parameter for 2–3× speedup on homogeneous data
  • Type Stubs: __init__.pyi + py.typed for IDE/mypy support

🚧 Planned

  • Parallel Parsing — multi-core processing for very large files
  • loads(type=Model) — automatic Pydantic model deserialization

Examples

LLM Prompt Optimization

import json  # stdlib, used for the baseline below
import jton

# Large tabular dataset to send to an LLM
employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000, "years": 3},
    {"id": 2, "name": "Bob",   "dept": "Marketing",   "salary": 72000, "years": 5},
    # ... thousands more rows
]

# Standard JSON: every key repeated per row → high token cost
json_str = json.dumps(employees)   # "id", "name", "dept" repeated 1000× each

# JTON Zen Grid: headers written once
jton_str = jton.dumps(employees)
# → '[N: id, name, dept, salary, years; 1, "Alice", "Engineering", 95000, 3; ... ]'  (N = row count)

# Up to 50% fewer tokens for large tabular datasets

Configuration Files

config = jton.loads('''
{
    // Server settings
    host: "0.0.0.0",
    port: 8080,

    // Database configuration
    database: {
        host: "db.example.com",
        port: 5432,
        name: "production"
    },

    workers: 4,    // CPU cores
    timeout: 30    // seconds
}
''')

API Response Processing

# JTON parses both standard JSON and JTON extensions
response = jton.loads('{"status": "ok", "users": [{id: 1, name: "Alice"}]}')

# Serialize back with token savings
payload = jton.dumps(response)

Testing

# All tests
pytest tests/ -v

# JSON spec compliance
pytest tests/test_json_compatibility.py -v

# Zen Grid round-trip tests
pytest tests/test_zen_grid.py -v

# Reference vector suite (JSONTestSuite corpus)
pytest tests/test_reference_vectors.py -v

Test results: 622 passed, 54 skipped, 58 xfailed, 0 failed

Test Coverage

Suite | Tests | Coverage
test_json_compatibility.py | 39 | JSON primitives, nesting, escapes, errors
test_reference_vectors.py | 644+ parametrized | JSONTestSuite corpus (valid/invalid JSON)
test_zen_grid.py | 45 | Zen Grid serialization, parsing, round-trips, Pydantic, dataclass
Other suites | ~94 | Token reduction, SIMD, number parsing

Benchmark References

JTON performance benchmarks use the same standardized test vectors as the wider JSON ecosystem.

Compliance Testing

  • JSONTestSuite (Nicolas Seriot) — 400+ JSON conformance tests for parsers; used by orjson, simdjson, and JTON
  • RFC 8259 — The IETF JSON specification (December 2017)
  • JSON_checker — Classic pass/fail fixtures (fail01–fail33)

Performance Benchmark Files

The canonical benchmark corpus from nativejson-benchmark (Milo Yip), used by orjson, simdjson, yyjson, and JTON:

File | Size | Dataset | Characteristics
canada.json | 2.15 MB | GeoJSON coordinates | Number-heavy (float arrays)
twitter.json | 0.60 MB | Twitter API timeline | Unicode strings, nested objects
citm_catalog.json | 1.65 MB | Cinema IT Management catalog | Mixed content, real-world API
large.json | 7.88 MB | Custom: 100K rows tabular | JTON primary benchmark

Competing Libraries Referenced

Speed-Focused JSON Libraries

Library | Speed | Notes
orjson | 586 MB/s dumps | Rust-based; JSON only
ujson | ~300 MB/s | C-based
pysimdjson | 1–2 GB/s parse | Python bindings for simdjson
simdjson | 2–3 GB/s | C++; architecture inspiration for JTON
yapic.json | ~2–3× stdlib | Python/C extension

Token Efficiency Formats

Format | Token Savings | Approach | JSON-Compatible
JTON Zen Grid (this repo) | 11–50% | Column headers once | ✅ Yes
TOON | ~19% | Table-oriented | ❌ No

Development

Build from Source

# Install prerequisites
pip install maturin

# Debug build (fast compilation)
maturin develop

# Release build (optimized, recommended for benchmarking)
maturin develop --release

Windows + Python 3.13: Set $env:PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 before building.

Project Structure

src/
├── jton/                        # Python package (pip install jton)
│   ├── __init__.py              # Public API: loads, dumps, encode, decode, token_count
│   ├── __init__.pyi             # Type stubs (mypy/pyright)
│   ├── cli.py                   # jton CLI entry point
│   └── py.typed                 # PEP 561 marker
└── jton_core/                   # Rust implementation
    └── src/
        ├── lib.rs               # PyO3 module: loads(), dumps(), format_hint()
        ├── serializer.rs        # Zen Grid + JSON serializer, AVX-512 escape path
        ├── types/               # StructuralIndex, FieldDescriptor
        ├── simd/                # AVX2 / AVX-512 structural scanners
        └── parser/              # SIMD indexed parser, string_cache, number parsing

tests/
├── test_json_compatibility.py   # JSON spec conformance
├── test_zen_grid.py             # Zen Grid encoding/decoding + CLI
└── test_reference_vectors.py    # JSONTestSuite corpus (600+ vectors)

benchmarks/
├── run_all_benchmarks.py        # Token efficiency benchmark (8 formats × 6 datasets)
└── results/token_efficiency.md  # Latest benchmark results

Language Support

JTON officially supports Python only.

The SIMD-accelerated parser (AVX2/AVX-512 structural scanning, VPSHUFB nibble classifier, thread-local string cache) is a PyO3 native extension. The performance advantage is inseparable from the Python binding. The format spec is in SPEC.md for anyone who wants to implement JTON in another language.

Language | Status | Install
Python 3.11+ | ✅ Official | pip install jton
All others | - | Implement from SPEC.md

Requirements

Requirement | Version
Python | 3.11+
Rust | 1.70+
CPU | AVX2 (2013+ Intel/AMD)
CPU (optional) | AVX-512 for 2× SIMD throughput

Safety

  • Depth guard: MAX_NESTING_DEPTH = 100 — prevents stack overflow from deeply nested input
  • Arity tolerance: Extra table columns are silently dropped; missing columns are filled with null
  • Memory safety: All unsafe Rust code is in clearly marked blocks using PyO3 FFI patterns
  • No allocation on GIL drop: JTON never releases the GIL mid-parse, avoiding data races
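The depth guard and arity tolerance from the list above can be sketched in a few lines of Python. This illustrates the stated behavior, not the Rust implementation: check_depth and fit_row are hypothetical names, and whether a depth of exactly 100 is accepted is an assumption (the sketch allows it and rejects 101).

```python
MAX_NESTING_DEPTH = 100  # value quoted in the Safety list


def check_depth(text, limit=MAX_NESTING_DEPTH):
    """Return the maximum bracket depth; raise ValueError past the limit.

    Brackets inside double-quoted strings are ignored.
    """
    depth = max_depth = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > limit:
                raise ValueError(f"nesting deeper than {limit}")
            max_depth = max(max_depth, depth)
        elif ch in "]}":
            depth -= 1
    return max_depth


def fit_row(headers, cells):
    """Arity tolerance: drop extra cells, pad missing ones with None."""
    kept = list(cells[:len(headers)])
    padded = kept + [None] * (len(headers) - len(kept))
    return dict(zip(headers, padded))


print(check_depth('[[{"a": "]["}]]'))                  # → 3
print(fit_row(["id", "name"], [1, "Alice", "extra"]))  # → {'id': 1, 'name': 'Alice'}
print(fit_row(["id", "name"], [1]))                    # → {'id': 1, 'name': None}
```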

CI / Publishing

Three GitHub Actions workflows are included:

Workflow | File | Trigger | Purpose
CI | .github/workflows/ci.yml | Push / PR | Build + test on Linux, Windows, macOS × Python 3.10–3.13
Release | .github/workflows/release.yml | git push --tags v* | Build manylinux/macOS/Windows wheels, publish to PyPI via OIDC, draft GitHub Release
Security Audit | .github/workflows/audit.yml | Weekly (Mon 08:00 UTC) | cargo audit against RustSec advisory database

Publishing to PyPI

The release workflow uses PyPI Trusted Publishing (OIDC) — no API token needed:

  1. Go to pypi.org/manage/account/publishing
  2. Add a new publisher:
    • Owner: your GitHub username
    • Repository: JTON
    • Workflow: release.yml
    • Environment: pypi
  3. Push a version tag to trigger the release:
    git tag v0.2.0
    git push --tags
    
  4. The workflow builds wheels for all platforms, publishes to PyPI, and creates a draft GitHub Release with wheel files attached.

License

MIT — see NOTICE for full text.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jton-1.0.3.tar.gz (3.1 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

jton-1.0.3-cp313-cp313-win_amd64.whl (400.0 kB view details)

Uploaded CPython 3.13Windows x86-64

jton-1.0.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (516.6 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

jton-1.0.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.4 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ ARM64

jton-1.0.3-cp313-cp313-macosx_11_0_arm64.whl (486.4 kB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

jton-1.0.3-cp313-cp313-macosx_10_12_x86_64.whl (501.5 kB view details)

Uploaded CPython 3.13macOS 10.12+ x86-64

jton-1.0.3-cp312-cp312-win_amd64.whl (400.0 kB view details)

Uploaded CPython 3.12Windows x86-64

jton-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (516.6 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

jton-1.0.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.4 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ ARM64

jton-1.0.3-cp312-cp312-macosx_11_0_arm64.whl (486.4 kB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

jton-1.0.3-cp312-cp312-macosx_10_12_x86_64.whl (501.5 kB view details)

Uploaded CPython 3.12macOS 10.12+ x86-64

jton-1.0.3-cp311-cp311-win_amd64.whl (400.2 kB view details)

Uploaded CPython 3.11Windows x86-64

jton-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (517.7 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

jton-1.0.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.2 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ ARM64

jton-1.0.3-cp311-cp311-macosx_11_0_arm64.whl (487.5 kB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

jton-1.0.3-cp311-cp311-macosx_10_12_x86_64.whl (502.6 kB view details)

Uploaded CPython 3.11macOS 10.12+ x86-64

jton-1.0.3-cp310-cp310-win_amd64.whl (400.2 kB view details)

Uploaded CPython 3.10Windows x86-64

jton-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (517.7 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

jton-1.0.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (512.2 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ ARM64

jton-1.0.3-cp310-cp310-macosx_11_0_arm64.whl (486.9 kB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

jton-1.0.3-cp310-cp310-macosx_10_12_x86_64.whl (502.1 kB view details)

Uploaded CPython 3.10macOS 10.12+ x86-64

File details

Details for the file jton-1.0.3.tar.gz.

File metadata

  • Download URL: jton-1.0.3.tar.gz
  • Upload date:
  • Size: 3.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.3.tar.gz
Algorithm Hash digest
SHA256 81012046ef4f6bc41cd732a4561eff2c998e387f22119ee77b9e6cf58697dcd2
MD5 41312872bbf2ac1f244b40daad335fd9
BLAKE2b-256 52d0f081ee47680d2b2fb9f29f3b40455fef79e874482a7ce506eeba5002b02f

See more details on using hashes here.

File details

Details for the file jton-1.0.3-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.3-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 400.0 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.3-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 23082701a056e5b7cd51a6b75b84f397f4744746a6883e008a51c0190de6dd05
MD5 d2c0a42870a71c441d97a586c62bf0dd
BLAKE2b-256 74d12c6eda6680f5cd443fa32cc05728691779a75ea5d84750fc3cdf26621d4d

See more details on using hashes here.

File details

Details for the file jton-1.0.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for jton-1.0.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6e91656de0826584cf2027a6c0c307bbf92dbb21e2523af54c9c0450056b099a
MD5 3e23fed5cde6778033abf580e0aeb590
BLAKE2b-256 b19619a1ffd1ad9157ec2203884f01e8e632315ca14ddbd15b5bb89245077387

See more details on using hashes here.

File details

Details for the file jton-1.0.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for jton-1.0.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 763010fea02ee2f86c94a0ee691146022e1581cb3f984b4eda46a6c110065f08
MD5 396dd978dda8367a30376a8b04230d80
BLAKE2b-256 44c439ccee5cbeae11c787355239a2ed2eec2f69c1c0bfd6b84ef8a3044f261c

See more details on using hashes here.

File details

Details for the file jton-1.0.3-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for jton-1.0.3-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6d7be42c9fcbe38e19afefb1416d4772fa52a3b40e6973c696f250d8145bb46e
MD5 fc3b20b5afcd5fff4cab1f1cc489a967
BLAKE2b-256 4c6373697c39a392432b24e922457e792e913c4b01afa16d52b2e8341bc3020d

See more details on using hashes here.

File details

Details for the file jton-1.0.3-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for jton-1.0.3-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9319f1d4518b8f72b27212ab99fdf13e6d80fe3e161873620dec12110b0ecefa
MD5 70a3078e894dabb34749ab1fe18957bc
BLAKE2b-256 79d358b70f6e5c80856d3daabaf87a591ec3ef8a10ef3d3d51088fc4e84a3c73

File details

Details for the file jton-1.0.3-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.3-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 400.0 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 eeb83c1f4b71fa25aa15178907ea38f807ab8b750a92e873376d8a3fc0ff0b72
MD5 ad4e36032bd7759cf5beb24816dc4230
BLAKE2b-256 072f79c3196fecc06feb8f90db862a9aa69ab6292c4e70f7225aa768f712b18e

File details

Details for the file jton-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for jton-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7ebe34f03b90f35c9ba9834560dabc197509a1cf4d0fce46bb1c12dbf5094531
MD5 9d38524d35b8a823de673e98b979f8da
BLAKE2b-256 12097f4d7975452c61cc97f3bf10843be974718bf4b865966983e2e7b0fc3977

File details

Details for the file jton-1.0.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for jton-1.0.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 faa8b86125459aa7b878ac0192c2a7e54ff4a4f6c450acbb16120322ebbf05b3
MD5 8088a3707f1842bd6fdfe72e4e0ae551
BLAKE2b-256 5c55959892aefa66845f0edc5e061d5aa5a0f987204f4a0f8051d695e6854cc8

File details

Details for the file jton-1.0.3-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for jton-1.0.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 810d67186063bfda15183e383373385d7b5afd82e95af431297cba7155414ae0
MD5 21767fafbc3c1891b7782d7f3cf64b3b
BLAKE2b-256 3785282abb73f84e9185a4f03428dabd6ec9055ffd621af8f657ed8d54fa09ac

File details

Details for the file jton-1.0.3-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for jton-1.0.3-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 11a0fe7eb07f3504d6ed029408e5841f4b7e0e6d5082f9f8d52a7ac755c15259
MD5 08b77a9f3c0fb9bb8ecd517891c8d58b
BLAKE2b-256 86aa9646f384ec712af9e8c9297bbb545ea4644c531839b84ef431dadb0429fd

File details

Details for the file jton-1.0.3-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.3-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 400.2 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 39f05bfd322653900631fd6d5d6eb223b7a8609aa55a06b6e050f7f0d4cc30a0
MD5 e0e9afa953e1f007d49891b32afea557
BLAKE2b-256 e92052a0549f0bdc2a01295d07667f8fb3356fca8adbeaf0956243f8753031e9

File details

Details for the file jton-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for jton-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c928e774263e0009832f85f1c3954e0bae47a950282e81ab5fb510797c52aca9
MD5 5449e0460092bd6032d63cf7f2eba4c0
BLAKE2b-256 9fdb4597211b3729399f276492c2ce71a6cc18c5f0e15bdbb1130f5b78b14779

File details

Details for the file jton-1.0.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for jton-1.0.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 802a34980943b043f57f014aef1bee1c963cc5af620573efac0cc6d0538c1bfc
MD5 f46f5c4bb17ef26b4a7f7a28bdbba37e
BLAKE2b-256 b2ab5de07d0d7ff0a2425e3b978b49e9caf1bfb16e7d0b1303aeb5328df45482

File details

Details for the file jton-1.0.3-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for jton-1.0.3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ae0648f65f218e580e0f35e065817b742849288154850187b376c7a32d3244cb
MD5 eb1e333d921d24b16f3a789e950e86df
BLAKE2b-256 e4aa71ad51271190fc7c46d011774103d08744fe37e8942c93dd0f234665e1aa

File details

Details for the file jton-1.0.3-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for jton-1.0.3-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ae00b9727ca0078016d7e94f655d0eb71b1e018955bd1279c95fcae929540830
MD5 2535400ef76e777a607bc6d0fd99d1ae
BLAKE2b-256 24f89dd3f83897bb35d85c962f3505539b70b2514c360e75066b728237adeb74

File details

Details for the file jton-1.0.3-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.3-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 400.2 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jton-1.0.3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 22e4504c85b665721e2ee912eeba772b7309c487cdda158b789d41207186406d
MD5 f306de10bbce18c7aa9fb38b41f91bce
BLAKE2b-256 82738da1e98e5031bc94afaf7e09918d1fdc1b9fe6de30955c180ee943940ae0

File details

Details for the file jton-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for jton-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a5ed24521b84dce6f0b1f4631a43d7575ab5e18991fc83af76befc15414a32a5
MD5 15bacbf906dfeb9f3bf0d8bed244d3e4
BLAKE2b-256 39c857a8d93aa54e52136f411062d4e8c8bf886e93e4b4ca241b15fdff4e6331

File details

Details for the file jton-1.0.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for jton-1.0.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 270bebe582f8ab5bd190a9572b4259d849a876b2a6b51ef156a8a6b5b4686358
MD5 913a2036a02d3fbec8839401a3d3f880
BLAKE2b-256 b1a0188db22bb6b02e5496e9e87d7cc2f7e7fee13ee100f72f2199697442809a

File details

Details for the file jton-1.0.3-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for jton-1.0.3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f4beba6dddd39a8e3221aff7388250304a876712c5dc2a8ca2a9c03ecb14b5ef
MD5 8a6f6e88855be888c15ca5aa48c7a52c
BLAKE2b-256 e4b50d320a904cadb7f586b698e1afb1968713d109f19bd4bc965a8996dd53ac

File details

Details for the file jton-1.0.3-cp310-cp310-macosx_10_12_x86_64.whl.

File hashes

Hashes for jton-1.0.3-cp310-cp310-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 3b744a55736767d38b80a5aac682be8afbcd4e00d7ecb6ee876b4b38758c2257
MD5 640aadac04d0b22feb3c6f540d038cab
BLAKE2b-256 869526b61769629c2b2899adfe96532a8cf4a84e3cee01a5ba8b2df1aa4b2740
