Skip to main content

JTON (JSON Tabular Object Notation) — A Tabular JSON-Superset Encoding for Efficient LLM Data Processing

Project description

JTON

JTON (JSON Tabular Object Notation) — A high-performance, token-efficient JSON superset built in Rust with PyO3 bindings for Python. Home of Zen Grid, a token-aware tabular encoding that reduces LLM token costs by 15–60%.

Tests Performance SIMD License


Overview

JTON is a JSON superset designed for LLM applications and high-throughput data processing:

  • Zen Grid: Tabular encoding reduces token count by 15–60% on benchmarked data vs JSON compact (23% average across 6 datasets, ~60% on highly-tabular Twitter-style rows)
  • LLM-Validated: 10 models tested for comprehension, 13 models tested for generation -- all achieve 100% generation validity (100% few-shot, 100% zero-shot)
  • SIMD Acceleration: AVX2 (32-byte) and AVX-512 (64-byte) structural scanning
  • JSON-compatible: supports load() / loads() / dump() / dumps() for common JSON workflows — all valid JSON is valid JTON
  • Serialization: dumps() with Zen Grid table output, Pydantic v1/v2 and dataclass support
  • JSON Extensions: Unquoted keys, // and /* */ comments, Infinity/NaN special numbers
  • Strict correctness: Rejects invalid JSON numbers (-01, 1., 0.e1) that many parsers accept silently

Quickstart

Installation

# From PyPI (once published)
pip install jton

# From source (requires Rust 1.70+ — https://rustup.rs/)
git clone https://github.com/gowthamkumar-nandakishore/jton.git
cd JTON
pip install maturin
maturin develop --release

Basic Usage

import jton

# Standard JSON parsing
data = jton.loads('{"name": "Alice", "age": 30}')

# JTON extensions — unquoted keys
data = jton.loads('{name: "Alice", age: 30}')

# Comments for configuration files
config = jton.loads('''
{
    host: "localhost",   // server address
    port: 8080,         /* default port */
    timeout: 30         // seconds
}
''')

# Special numbers (Python compatibility)
data = jton.loads('{x: Infinity, y: -Infinity, z: NaN}')

# Serialize to compact JSON
jton.dumps({"name": "Alice", "age": 30})
# → '{"name":"Alice","age":30}'

# encode/decode aliases (familiar for orjson/msgspec users)
jton.encode(data)   # same as dumps()
jton.decode(text)   # same as loads()

Using JTON with existing json-based code

JTON works well with existing JSON-heavy codebases — with one important rule:

  • Parsing is the easy win: jton.load() / jton.loads() accept normal JSON unchanged
  • Serialization is JSON-compatible for common usage, but JTON's serializer has an extra default: zen_grid=True
  • That means if you simply replace import json with import jton as json, some list[dict] payloads may serialize as Zen Grid instead of strict RFC 8259 JSON

What happens if you just replace json with jton?

import jton as json

# Existing JSON input still works
obj = json.loads('{"name":"Alice","age":30}')

# Existing file APIs still work
with open("data.json") as f:
    obj = json.load(f)

For output:

import jton as json

json.dumps({"name": "Alice", "age": 30})
# -> '{"name":"Alice","age":30}'   # still normal JSON

json.dumps([
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
])
# -> '[2: id, name; 1, "Alice"; 2, "Bob" ]'   # JTON Zen Grid, not strict JSON

So the practical behavior is:

  • load() / loads(): safe replacement for existing JSON parsing
  • dump() / dumps() on ordinary objects: usually still emits JSON
  • dump() / dumps() on homogeneous arrays of objects: may emit Zen Grid by default

Recommended migration path

1. Parsing-only replacement

If you want a no-risk first step, replace only parsing:

import jton

data = jton.loads(existing_json_text)

This gives you faster parsing plus JTON extensions on input, without changing output format anywhere.

2. Full json-module replacement, but keep strict JSON output

If you want import jton as json, but still need normal JSON output:

import jton as json

text = json.dumps(obj, zen_grid=False)
with open("out.json", "w") as f:
    json.dump(obj, f, zen_grid=False)

This is the safest pattern for APIs, files, and systems that expect standard JSON.

3. Enable JTON only where it helps

Use strict JSON for machines, and JTON for LLM-facing payloads:

api_payload = jton.dumps(data, zen_grid=False)   # strict JSON
llm_payload = jton.dumps(data)                   # JTON / Zen Grid when eligible

JSON-in / JSON-out compatibility cheat sheet

If you want output that stays valid JSON, use:

jton.dumps(
    obj,
    zen_grid=False,
    unquoted_keys=False,
    bare_strings=False,
    implicit_null=False,
    multiline_zen=False,
)

Notes:

  • Only zen_grid=False is required for strict JSON output in normal use
  • row_count and delimiter matter only when zen_grid=True
  • indent=2 or indent=4 still gives valid pretty JSON
  • default= works like json.dumps(default=...)

When output stops being strict JSON

These options move you out of plain JSON output:

  • zen_grid=True on homogeneous list[dict] values
  • unquoted_keys=True
  • multiline_zen=True
  • bare_strings=True inside Zen Grid cells
  • implicit_null=True inside Zen Grid cells

So the short rule is:

  • Use jton.loads() everywhere
  • Use jton.dumps(..., zen_grid=False) anywhere strict JSON is required
  • Use default jton.dumps() for LLM/token-optimized payloads

Current compatibility scope

JTON supports the common Python JSON workflow:

  • load
  • loads
  • dump
  • dumps
  • default=...
  • file objects

It is not a byte-for-byte clone of every stdlib json keyword argument yet. Think of it as:

  • fully compatible parser for JSON input
  • mostly compatible serializer for common usage
  • plus opt-in JTON/Zen Grid output when you want it

Zen Grid: Token-Efficient Table Format

When you pass a list of dicts to dumps(), JTON automatically detects the tabular structure and encodes it as a Zen Grid — one header row followed by semicolon-delimited data rows, all inline.

The format

[N: col1, col2, col3; val1, val2, val3; val4, val5, val6 ]
 ↑         ↑                  ↑
row count  headers           one record per semicolon segment
  • N — total row count (helps LLMs understand the data size at a glance)
  • First segment after [N: = comma-separated field names
  • Each subsequent segment = one record, values in the same order

Example

import jton

users = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob",   "score": 87},
    {"id": 3, "name": "Carol", "score": 92},
]

# Standard JSON compact — 116 chars, ~32 tokens:
# [{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87},{"id":3,"name":"Carol","score":92}]

# JTON Zen Grid — 72 chars, ~22 tokens (31% fewer tokens):
print(jton.dumps(users))
# → '[3: id, name, score; 1, "Alice", 95; 2, "Bob", 87; 3, "Carol", 92 ]'

# Disable Zen Grid for standard JSON output
print(jton.dumps(users, zen_grid=False))
# → '[{"id":1,"name":"Alice","score":95},...]'

Round-trip correctness

Zen Grid is valid JTON — jton.loads() parses it back to the original data:

original = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
encoded  = jton.dumps(original)           # → '[2: id, name; 1, "Alice"; 2, "Bob" ]'
decoded  = jton.loads(encoded)            # → [{"id": 1, "name": "Alice"}, ...]
assert decoded == original                # ✅ perfect round-trip

Token count analysis

import jton

data = [{"id": i, "name": f"User{i}", "score": i*5} for i in range(100)]
counts = jton.token_count(data)  # requires: pip install tiktoken
# {
#   'json_compact': {'tokens': 2843, 'savings_vs_compact': '+0.0%'},
#   'zen_grid':     {'tokens': 1820, 'savings_vs_compact': '-36.0%'},
# }

LLM integration

Add a one-line format hint to your system prompt before sending Zen Grid data:

import jton

system_prompt = jton.format_hint() + "\n\n" + jton.dumps(my_data)
Data is in JTON Zen Grid format.
Format: [N: col1, col2, col3; row1val1, row1val2, row1val3; ... ]
N = total row count. First semicolon-segment = headers.
Each subsequent segment = one record in header order.
Example: [3: id, name, score; 1, Alice, 95; 2, Bob, 87; 3, Carol, 92 ]
original = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
encoded = jton.dumps(original)                # → '[2: id, name; 1, "Alice"; 2, "Bob" ]'
decoded = jton.loads(encoded)                 # → [{"id": 1, "name": "Alice"}, ...]
assert decoded == original                    # ✅ perfect round-trip

Pydantic and Dataclass Support

from pydantic import BaseModel
from dataclasses import dataclass
import jton

# Pydantic v2 (model_dump)
class User(BaseModel):
    id: int
    name: str
    email: str

users = [User(id=1, name="Alice", email="a@example.com"),
         User(id=2, name="Bob",   email="b@example.com")]

print(jton.dumps(users))
# → '[2: id, name, email; 1, "Alice", "a@example.com"; 2, "Bob", "b@example.com" ]'

# Python dataclasses
@dataclass
class Point:
    x: float
    y: float

print(jton.dumps(Point(x=1.5, y=2.5)))
# → '{"x":1.5,"y":2.5}'

# Parse directly into a Pydantic model
user_data = jton.loads('{"id":1,"name":"Alice","email":"a@ex.com"}')
user = User(**user_data)
# → User(id=1, name='Alice', email='a@ex.com')

# Parse into a dataclass
pt_data = jton.loads('{"x":1.5,"y":2.5}')
pt = Point(**pt_data)
# → Point(x=1.5, y=2.5)

API Reference

JTON provides the core load, dump, loads, and dumps APIs used in common JSON workflows. The main behavioral difference is that dumps() defaults to zen_grid=True, which may emit JTON Zen Grid for homogeneous arrays of objects.

jton.loads(data, schema=None)

Parse JTON or JSON data into Python objects.

jton.loads('{"a": 1}')          # → {"a": 1}
jton.loads(b'{"a": 1}')         # bytes input OK
jton.loads('{a: 1}')            # unquoted keys OK
jton.loads('// comment\n{a:1}') # comments OK

jton.load(fp)

Parse JTON/JSON from a file object — compatible with normal json.load() usage.

with open("data.json") as f:
    data = jton.load(f)

jton.dumps(data, *, zen_grid=True, ..., default=None)

Serialize Python objects to JTON/JSON string — compatible with common json.dumps() usage.

Parameter Type Default Description
data Any required Python object to serialize
zen_grid bool True Auto-convert lists of dicts to Zen Grid table format
unquoted_keys bool False Write dict keys without quotes
indent int | None None Pretty-print with given indent width
bare_strings bool False Write identifier string values without quotes in cells
implicit_null bool False Write null cells as empty (saves ~1 token per cell)
row_count bool True Prefix Zen Grid header with [N: ...] row count
delimiter str "comma" "comma" (readable), "tab" (max savings), "pipe"
default callable | None None For non-serializable objects, same as json.dumps(default=...)

If you want strict JSON output, set zen_grid=False.

# Standard usage
jton.dumps({"a": 1}, zen_grid=False)         # → '{"a":1}'

# Custom types with default=
from datetime import date
jton.dumps({"d": date(2025,1,1)}, default=str)
# → '{"d":"2025-01-01"}'

# Works with Zen Grid too
jton.dumps([{"id":1,"d":date(2025,1,1)}], default=str)
# → '[1: id, d; 1, "2025-01-01" ]'

Supported types natively: dict, list, tuple, str, int, float, bool, None, Pydantic BaseModel (v1+v2), @dataclass

jton.dump(obj, fp, **kwargs)

Serialize to a file object — compatible with common json.dump() usage.

with open("out.jton", "w") as f:
    jton.dump(data, f)

jton.format_hint(style="zen_grid")

Return a format description for pasting into LLM system prompts.

style Description
"zen_grid" Default inline format (mentions both [: and [N: forms)
"zen_grid_rowcount" Inline with explicit [N] row count
"multiline" Multi-line format
"tab" Tab-delimited

jton.token_count(data, tokenizer="o200k_base")

Compare token costs across all output modes. Requires pip install tiktoken.

Returns a dict mapping mode names to {"tokens": int, "chars": int, "savings_vs_compact": str}.

jton.encode / jton.decode

Aliases for dumps / loads — familiar for users of orjson or msgspec.


CLI Tool

JTON ships a command-line tool for JSON ↔ Zen Grid conversion:

# After pip install or maturin develop
JTON input.json                    # encode JSON → Zen Grid (stdout)
JTON input.json -o output.JTON     # encode to file
JTON input.JTON -o output.json     # decode Zen Grid → JSON (auto-detected)
JTON input.json --stats            # show token savings
echo '{"x":1}' | JTON             # pipe stdin
JTON input.json --tab              # tab-delimited Zen Grid
JTON input.json --no-zen-grid      # plain compact JSON
JTON input.json --indent 2         # pretty-print JSON
JTON --hint                        # print LLM system-prompt template
JTON --version                     # show version

Or run directly without installation:

python -m jton.cli input.json --stats

Playground

Run the interactive playground locally to explore all JTON features:

# From the repo root (after maturin develop)
python playground/server.py

# Opens at http://127.0.0.1:7700
# Optional: pip install tiktoken  (enables live token count bars)

The playground provides:

  • Live JSON → JTON conversion with all encoding options as toggles
  • Token comparison bars (JSON pretty / JSON compact / JTON current)
  • Char savings % vs JSON compact
  • Round-trip indicator — shows if decode(encode(x)) == x
  • Format hint copier — paste into LLM system prompts
  • Sample datasets — employees, orders, analytics, deep config, GitHub repos
  • Decode mode — paste JTON output, get back pretty JSON

Performance

Speed Comparison (real-world files: canada.json 2.25 MB, citm_catalog.json 1.78 MB, twitter.json 0.65 MB)

Library loads dumps (JSON mode) Notes
stdlib json 63–184 MB/s 46–268 MB/s Pure Python/C
JTON 132–346 MB/s 197–276 MB/s Rust/SIMD, JSON mode
JTON Zen Grid 81–240 MB/s Rust, table output
orjson 235–458 MB/s 440–533 MB/s Rust, JSON only
  • JTON loads is 1.5–2.1× faster than stdlib (json.loads)
  • JTON dumps JSON mode is 1.0–4.3× faster than stdlib
  • JTON Zen Grid dumps saves 14–60% tokens (depending on data shape) while maintaining competitive throughput
  • orjson is faster on raw JSON; JTON's advantage is Zen Grid token reduction which orjson cannot provide

Large-file static benchmark: akbe_doc_classifier.json (338.1 MB)

Measured on this machine using the repository's akbe_doc_classifier.json payload in JSON-compatible mode (zen_grid=False for JTON dump):

Operation stdlib json JTON Result
Parse / decode 1.75 s (193.5 MB/s) 2.43 s (138.9 MB/s) stdlib faster on this file
Dump / encode 1.78 s (57.3 MB/s) 0.81 s (126.5 MB/s) JTON 2.2× faster

Notes:

  • This file is a large, object-heavy classifier payload rather than a tabular Zen Grid sweet spot
  • On this benchmark, JTON wins strongly on dump/encode
  • Stdlib json wins on parse/decode for this specific file shape
  • Output benchmarking used JSON-compatible serialization (jton.dumps(..., zen_grid=False))

SIMD Acceleration

JTON uses a two-pass SIMD parsing strategy modeled after simdjson:

  1. Structural scan (AVX2/AVX-512): Build index of {}[],:;" positions in a single pass
  2. Index-jumping parse: Navigate the pre-built index without byte-by-byte scanning
Feature Details
AVX2 32-byte chunks, 2013+ Intel/AMD CPUs
AVX-512 64-byte chunks, 2017+ Intel CPUs
Runtime detection Automatically selects best available ISA
Float parsing lexical-core (same algorithm as orjson)
Int serialization itoa crate (fastest known)
Float serialization ryu crate (shortest round-trip, same as orjson)
String cache Thread-local UnsafeCell — zero mutex overhead under Python GIL

Token Efficiency

JTON vs Competing Formats

Benchmarked on 6 real-world datasets using tiktoken o200k_base encoder:

Format Total Tokens vs JSON compact JSON-Compatible
TRON 122,097 −32.4% ❌ No (new syntax)
JTON Zen Grid 144,159 −20.2% ✅ Yes (JTON superset)
TOON 146,113 −19.2% ❌ No (new syntax)
JSON compact 180,725 ✅ Yes

JTON is #2 most token-efficient and the only JSON-superset format in the top 3 — TRON and TOON require custom parsers.

Real-world LLM Token Savings

Dataset JSON compact JTON Zen Grid Savings
👥 2,000 employees (7 cols) 97,407 77,226 −20.7%
📈 365 days analytics 14,240 10,604 −25.5%
⭐ 100 GitHub repos 11,729 9,626 −17.9%
🛒 500 orders (nested) 46,381 39,565 −14.7%
🧾 300 event logs (semi-uniform) 10,745 6,915 −35.6%

LLM Comprehension Evaluation

We evaluated whether LLMs can correctly interpret Zen Grid data across 10 models from six providers, using 7 real-world datasets × 5 question types × 2 formats (700 total API calls). Uses JTON 1.0 [N: ...] syntax with bare_strings=True.

Per-Model Results

Model Family JSON Zen Grid Delta n
GPT-5.1-codex OpenAI 74.3% 71.4% −2.9 pp 35
GPT-5.1 OpenAI 71.4% 62.9% −8.6 pp 35
GPT-5-mini OpenAI 71.4% 71.4% 0.0 pp 35
Gemini 3 Pro Preview Google 68.6% 68.6% 0.0 pp 35
Kimi K2 Moonshot 62.9% 68.6% +5.7 pp 35
Qwen3 32B Alibaba 60.0% 57.1% −2.9 pp 35
Llama 3.3 70B Meta 54.3% 54.3% 0.0 pp 35
Llama 3.1 8B Meta 45.7% 48.6% +2.9 pp 35
GPT-OSS 120B Open-src 42.9% 45.7% +2.9 pp 35
Llama 4 Scout 17B Meta 40.0% 45.7% +5.7 pp 35
Overall 59.1% 59.4% +0.3 pp 350

By Question Type

Question Type JSON Zen Grid Delta
Lookup 95.7% 95.7% 0.0 pp
Filtering 52.9% 51.4% −1.4 pp
Count 51.4% 48.6% −2.9 pp
Aggregation 47.1% 51.4% +4.3 pp
Comparison 48.6% 50.0% +1.4 pp

Key Findings

Four of ten models improve with Zen Grid (Kimi K2 +5.7pp, Llama 4 Scout +5.7pp, Llama 3.1 8B +2.9pp, GPT-OSS 120B +2.9pp), three are neutral (GPT-5-mini, Gemini 3 Pro, Llama 3.3 70B), and three regress (GPT-5.1 −8.6pp being the worst). Overall, Zen Grid is +0.3 pp ahead of JSON for ~20% fewer tokens — a clear win on cost-per-correct-answer. Lookup tasks (95.7%) are perfectly preserved across formats.

LLM Generation Results

Can LLMs produce valid Zen Grid output? We tested 13 models from 7 providers with few-shot and zero-shot prompting on the JTON 1.0 [N: ...] syntax:

Model Few-shot Valid Zero-shot Valid
GPT-5-mini (WTG) 100% 100%
GPT-5-mini (Azure) 100% 100%
GPT-5.1 100% 100%
GPT-4o 100% 100%
Claude Sonnet 4 100% 100%
Claude 3.5 Haiku 100% 100%
Claude 3 Haiku 100% 100%
Gemini 2.5 Flash 100% 100%
Gemini 2.5 Pro 100% 100%
Gemini 3 Flash Preview 100% 100%
Llama 3.3 70B 100% 100%
Llama 4 Scout 17B 100% 100%
Kimi K2 100% 100%
Overall 100% 100%

All 13 models achieve 100% validity in both modes. Zen Grid works for bidirectional LLM pipelines — both input and output.

Format Comparison

Token counts on real-world data (o200k_base tokenizer):

Format Twitter GitHub Financial Avg Savings vs JSON
JSON Compact 3,673 968 643 baseline
CSV 1,303 688 408 −43.3% (no types)
Markdown 1,430 792 505 −33.6% (no types)
YAML 1,916 1,185 840 +1.7%
Zen Grid 1,653 968 516 −24.9% (full types)

Zen Grid is the only JSON-compatible format that achieves significant token savings while preserving JSON's full type system.


Features

✅ Implemented

  • Full JSON Compatibility — parse any valid RFC 8259 JSON
  • Zen Grid Tables[: header; row1; row2 ] with auto-detection and round-trip
  • Unquoted Keys{name: "value"} instead of {"name": "value"}
  • Comments// single-line and /* */ block comments
  • Special NumbersInfinity, -Infinity, NaN
  • dumps() Serializer — compact JSON + Zen Grid output
  • Pydantic SupportBaseModel (v1 dict() + v2 model_dump()) serialization
  • Python Dataclasses@dataclass instances via dataclasses.asdict()
  • encode / decode Aliases — drop-in for orjson/msgspec users
  • SIMD Scanner — AVX2 + AVX-512 structural character indexing
  • Strict Number Parsing — rejects -01, 1., 0.e1, -.5, 1+2
  • Enhanced Errors — 40-character context window with ^ markers
  • Schema-guided Parsing — optional schema parameter for 2–3× speedup on homogeneous data
  • Type Stubs__init__.pyi + py.typed for IDE/mypy support

🚧 Planned

  • Parallel Parsing — multi-core processing for very large files
  • loads(type=Model) — automatic Pydantic model deserialization

Examples

LLM Prompt Optimization

import jton

# Large tabular dataset to send to an LLM
employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000, "years": 3},
    {"id": 2, "name": "Bob",   "dept": "Marketing",   "salary": 72000, "years": 5},
    # ... thousands more rows
]

# Standard JSON: every key repeated per row → high token cost
json_str = json.dumps(employees)   # "id", "name", "dept" repeated 1000× each

# JTON Zen Grid: headers written once
JTON_str = jton.dumps(employees)
# → '[: id, name, dept, salary, years; 1, "Alice", "Engineering", 95000, 3; ... ]'

# Up to 50% fewer tokens for large tabular datasets

Configuration Files

config = jton.loads('''
{
    // Server settings
    host: "0.0.0.0",
    port: 8080,

    // Database configuration
    database: {
        host: "db.example.com",
        port: 5432,
        name: "production"
    },

    workers: 4,    // CPU cores
    timeout: 30    // seconds
}
''')

API Response Processing

# JTON parses both standard JSON and JTON extensions
response = jton.loads('{"status": "ok", "users": [{id: 1, name: "Alice"}]}')

# Serialize back with token savings
payload = jton.dumps(response)

Testing

# All tests
pytest tests/ -v

# JSON spec compliance
pytest tests/test_json_compatibility.py -v

# Zen Grid round-trip tests
pytest tests/test_zen_grid.py -v

# Reference vector suite (JSONTestSuite corpus)
pytest tests/test_reference_vectors.py -v

Test results: 622 passed, 54 skipped, 58 xfailed, 0 failed

Test Coverage

Suite Tests Coverage
test_json_compatibility.py 39 JSON primitives, nesting, escapes, errors
test_reference_vectors.py 644+ parametrized JSONTestSuite corpus (valid/invalid JSON)
test_zen_grid.py 45 Zen Grid serialization, parsing, round-trips, Pydantic, dataclass
Other suites ~94 Token reduction, SIMD, number parsing

Benchmark References

JTON performance benchmarks use the same standardized test vectors as the wider JSON ecosystem.

Compliance Testing

  • JSONTestSuite (Nicolas Seriot) — 400+ JSON conformance tests for parsers; used by orjson, simdjson, and JTON
  • RFC 8259 — The IETF JSON specification (December 2017)
  • JSON_checker — Classic pass/fail fixtures (fail01–fail33)

Performance Benchmark Files

The canonical benchmark corpus from nativejson-benchmark (Milo Yip), used by orjson, simdjson, yyjson, and JTON:

File Size Dataset Characteristics
canada.json 2.15 MB GeoJSON coordinates Number-heavy (float arrays)
twitter.json 0.60 MB Twitter API timeline Unicode strings, nested objects
citm_catalog.json 1.65 MB Cinema IT Management catalog Mixed content, real-world API
large.json 7.88 MB Custom: 100K rows tabular JTON primary benchmark

Competing Libraries Referenced

Speed-Focused JSON Libraries

Library GitHub Speed Notes
orjson GitHub 586 MB/s dumps Rust-based; JSON only
ujson GitHub ~300 MB/s C-based
pysimdjson GitHub 1–2 GB/s parse Python bindings for simdjson
simdjson GitHub 2–3 GB/s C++; architecture inspiration for JTON
yapic.json GitHub ~2–3× stdlib Python/C extension

Token Efficiency Formats

Format GitHub Token Savings Approach JSON-Compatible
JTON Zen Grid This repo 11–50% Column headers once ✅ Yes
TRON GitHub ~32% Class-based aliases ❌ No
TOON GitHub ~19% Table-oriented ❌ No

Development

Build from Source

# Install prerequisites
pip install maturin

# Debug build (fast compilation)
maturin develop

# Release build (optimized, recommended for benchmarking)
maturin develop --release

Windows + Python 3.13: Set $env:PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 before building.

Project Structure

src/
├── jton/                        # Python package (pip install jton)
│   ├── __init__.py              # Public API: loads, dumps, encode, decode, token_count
│   ├── __init__.pyi             # Type stubs (mypy/pyright)
│   ├── cli.py                   # jton CLI entry point
│   └── py.typed                 # PEP 561 marker
└── jton_core/                   # Rust implementation
    └── src/
        ├── lib.rs               # PyO3 module: loads(), dumps(), format_hint()
        ├── serializer.rs        # Zen Grid + JSON serializer, AVX-512 escape path
        ├── types/               # StructuralIndex, FieldDescriptor
        ├── simd/                # AVX2 / AVX-512 structural scanners
        └── parser/              # SIMD indexed parser, string_cache, number parsing

tests/
├── test_json_compatibility.py   # JSON spec conformance
├── test_zen_grid.py             # Zen Grid encoding/decoding + CLI
└── test_reference_vectors.py    # JSONTestSuite corpus (600+ vectors)

benchmarks/
├── run_all_benchmarks.py        # Token efficiency benchmark (8 formats × 6 datasets)
└── results/token_efficiency.md  # Latest benchmark results

Language Support

JTON officially supports Python only.

The SIMD-accelerated parser (AVX2/AVX-512 structural scanning, VPSHUFB nibble classifier, thread-local string cache) is a PyO3 native extension. The performance advantage is inseparable from the Python binding. The format spec is in SPEC.md for anyone who wants to implement JTON in another language.

Language Status Install
Python 3.11+ ✅ Official pip install jton
All others Implement from SPEC.md

Requirements

Requirement Version
Python 3.11+
Rust 1.70+
CPU AVX2 (2013+ Intel/AMD)
CPU (optional) AVX-512 for 2× SIMD throughput

Safety

  • Depth guard: MAX_NESTING_DEPTH = 100 — prevents stack overflow from deeply nested input
  • Arity tolerance: Extra table columns are silently dropped; missing columns are filled with null
  • Memory safety: All unsafe Rust code is in clearly marked blocks using PyO3 FFI patterns
  • No allocation on GIL drop: JTON never releases the GIL mid-parse, avoiding data races

CI / Publishing

Three GitHub Actions workflows are included:

Workflow File Trigger Purpose
CI .github/workflows/ci.yml Push / PR Build + test on Linux, Windows, macOS × Python 3.10–3.13
Release .github/workflows/release.yml git push --tags v* Build manylinux/macOS/Windows wheels, publish to PyPI via OIDC, draft GitHub Release
Security Audit .github/workflows/audit.yml Weekly (Mon 08:00 UTC) cargo audit against RustSec advisory database

Publishing to PyPI

The release workflow uses PyPI Trusted Publishing (OIDC) — no API token needed:

  1. Go to pypi.org/manage/account/publishing
  2. Add a new publisher:
    • Owner: your GitHub username
    • Repository: JTON
    • Workflow: release.yml
    • Environment: pypi
  3. Push a version tag to trigger the release:
    git tag v0.2.0
    git push --tags
    
  4. The workflow builds wheels for all platforms, publishes to PyPI, and creates a draft GitHub Release with wheel files attached.

License

MIT — see NOTICE for full text.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jton-1.0.0.tar.gz (2.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

jton-1.0.0-cp313-cp313-win_amd64.whl (400.0 kB view details)

Uploaded CPython 3.13Windows x86-64

File details

Details for the file jton-1.0.0.tar.gz.

File metadata

  • Download URL: jton-1.0.0.tar.gz
  • Upload date:
  • Size: 2.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for jton-1.0.0.tar.gz
Algorithm Hash digest
SHA256 a43aa13a49e13af81f0edbebb699ba66989caa5f78a122ca11c450189acc7f04
MD5 ad22fa56c842f27383239386e83d04a7
BLAKE2b-256 e98fcdacaaf31abfa313d6a2adb0e1c7be2728f181955027f878849276791336

See more details on using hashes here.

File details

Details for the file jton-1.0.0-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: jton-1.0.0-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 400.0 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for jton-1.0.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 a0df7fdade7d20d425f20c2865912110744a1639f57fd8d116b1591eba5afe86
MD5 88ae37f4b6a83a61931362b245200e29
BLAKE2b-256 0d887a310f9846dc4a01daf81d94d152dd5d7bd836da5e46c1567bb51da9a407

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page