A lightweight, human-readable key-value serialization format
jtoken
Compress JSON for LLM prompts — same data, fewer tokens.
What it does
jtoken strips the syntactic noise from JSON (", {}, ,) and collapses all
null, true, and false fields into one summary line each. Nested dicts
are flattened with dot notation so the same collapse applies at every level.
The result is a compact format intended to be as readable to an LLM as JSON.
JSON (30 tokens):
{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}
jtoken (21 tokens):
name: Alice
age: 30
trues: active
falses: verified
nulls: ref
The round-trip is lossless: decode(encode(data)) == data for all supported types.
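The flatten-and-collapse idea described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not jtoken's actual implementation: the function name `sketch_encode` is hypothetical, and it skips the quoting of ambiguous strings and all key validation.

```python
def sketch_encode(data: dict) -> str:
    """Illustrative only: flatten nested dicts, then collapse True/False/None keys."""
    scalars, trues, falses, nulls = [], [], [], []

    def walk(node: dict, prefix: str = "") -> None:
        for key, value in node.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested dict: recurse with a dotted prefix
                walk(value, prefix=f"{path}.")
            elif value is True:
                trues.append(path)
            elif value is False:
                falses.append(path)
            elif value is None:
                nulls.append(path)
            else:
                scalars.append(f"{path}: {value}")

    walk(data)
    lines = list(scalars)
    # Summary lines go last, and only when they have at least one key
    for label, keys in (("trues", trues), ("falses", falses), ("nulls", nulls)):
        if keys:
            lines.append(f"{label}: {','.join(keys)}")
    return "\n".join(lines)
```

Identity checks (`value is True`) are used rather than equality so that the integer `1` is not mistaken for a boolean.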
Installation
# Core — no external dependencies
pip install jtoken
# With accurate LLM token counting
pip install "jtoken[tiktoken]"
Quick start
import jtoken
data = {
"user": "alice",
"age": 30,
"premium": True,
"verified": True,
"is_remote": False,
"trial": False,
"score": 9.5,
"referral": None,
"last_login": None,
}
text = jtoken.encode(data)
# user: alice
# age: 30
# score: 9.5
# trues: premium,verified
# falses: is_remote,trial
# nulls: referral,last_login
original = jtoken.decode(text)
assert original == data
dumps / loads are available as json-style aliases.
CLI
echo '{"name": "Alice", "active": true}' | jtoken encode
printf 'name: Alice\ntrues: active\n' | jtoken decode
echo '{"name": "Alice", "active": true}' | jtoken stats
echo '{"name": "Alice", "active": true}' | jtoken count
Use -f/--file to read from a file instead of stdin. stats and count accept
--model and --backend (auto, tiktoken, estimate).
Nested documents
Nested dicts are flattened with dot notation. Booleans and nulls at any depth are collapsed into the same summary lines.
data = {
"title": "Engineer",
"metadata": {
"verified": True,
"sponsored": False,
"score": None,
"source": {
"crawled": True,
"enriched": None,
},
},
}
print(jtoken.encode(data))
# title: Engineer
# trues: metadata.verified,metadata.source.crawled
# falses: metadata.sponsored
# nulls: metadata.score,metadata.source.enriched
Decode reconstructs the full nested structure:
assert jtoken.decode(jtoken.encode(data)) == data # ✓
Limitation: keys cannot contain . (reserved for nesting) or ": ".
Arrays are not supported.
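To see why `.` has to be reserved in keys, here is a minimal sketch of how dot-notation keys could be turned back into nested dicts. The helper name `unflatten` is hypothetical and this is not necessarily how jtoken's decoder is written.

```python
def unflatten(flat: dict) -> dict:
    """Rebuild nested dicts from keys such as 'metadata.source.crawled'."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate dicts on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

Because every `.` is treated as a level separator, a literal key like `"a.b"` would be indistinguishable from key `"b"` nested under `"a"`, which is why such keys are rejected at encode time.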
Token savings
import jtoken
stats = jtoken.token_savings(data)
print(stats)
# jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)
n = jtoken.count_tokens(data) # count jtoken tokens only
Savings are compared against json.dumps(data) — the standard representation
you'd paste into a prompt. Savings are highest when a document has many null
or boolean fields.
# Specify model or encoding
stats = jtoken.token_savings(data, model="gpt-4o")
stats = jtoken.token_savings(data, model="o200k_base")
# No tiktoken dependency
stats = jtoken.token_savings(data, backend="estimate")
API
encode(data: dict) -> str
Compresses a dict into jtoken. Supported value types: str, int, float,
bool, None, nested dict.
Summary lines (always at the end):
| line | contains |
|---|---|
| trues: k1,k2,... | all keys whose value is True |
| falses: k1,k2,... | all keys whose value is False |
| nulls: k1,k2,... | all keys whose value is None |
String values that would decode ambiguously (look like a number or boolean) keep their quotes:
jtoken.encode({"zip": "90210"}) # → 'zip: "90210"' (string, quotes kept)
jtoken.encode({"zip": 90210}) # → 'zip: 90210' (int, no quotes)
jtoken.encode({"ok": "true"}) # → 'ok: "true"' (string, quotes kept)
jtoken.encode({"ok": True}) # → 'trues: ok' (bool, collapsed)
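The quoting rule can be approximated as follows. This is a sketch under assumptions: the helper `format_value` and the exact regular expression are hypothetical, and jtoken's real check for "looks like a number or boolean" may differ.

```python
import re

# Assumed definition of "looks like a number": optional sign, digits, optional fraction
_NUMBER = re.compile(r"-?\d+(\.\d+)?")

def format_value(value) -> str:
    """Render a non-boolean, non-None scalar; quote strings that would decode ambiguously."""
    if isinstance(value, str):
        if _NUMBER.fullmatch(value) or value in {"true", "false"}:
            return f'"{value}"'  # keep quotes so the decoder returns a str
        return value
    return str(value)
```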
Raises JPackEncodeError for unsupported types, dots or ": " in keys, or
reserved key names (nulls, trues, falses).
decode(text: str) -> dict
Reconstructs the original dict, including nested structure from dot-notation keys. Type inference for scalar values:
| value | decoded as |
|---|---|
| "quoted" | str (always) |
| key in trues: line | True |
| key in falses: line | False |
| key in nulls: line | None |
| integer literal, e.g. 42 | int |
| float literal, e.g. 3.14 | float |
| anything else | str |
Raises JPackDecodeError for invalid input.
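The scalar rows of the table above (everything except the trues/falses/nulls lines) amount to a short chain of checks. A minimal sketch, with the hypothetical name `infer_scalar`; it is not jtoken's own code:

```python
def infer_scalar(raw: str):
    """Infer the Python type of a raw value string, in the order the table lists."""
    # Quoted values are always strings
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # anything else stays a string
```

Note that Python's `int()` and `float()` accept more than plain literals (for example `"1_000"` or `"inf"`), so a production decoder would likely use a stricter pattern.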
token_savings(data, *, model, backend) -> TokenSavings
Compares jtoken vs json.dumps token usage.
stats.jtoken_tokens # int
stats.json_tokens # int
stats.saved # int
stats.percent # float
str(stats) # "jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)"
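The arithmetic behind these fields can be sketched as below. The class `SavingsSketch` is a hypothetical stand-in, not the library's `TokenSavings`; it assumes the percentage is taken relative to the JSON token count and rounded to one decimal, which matches the example figures (14 / 36 ≈ 38.9%).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SavingsSketch:
    jtoken_tokens: int
    json_tokens: int

    @property
    def saved(self) -> int:
        # Tokens saved relative to the JSON representation
        return self.json_tokens - self.jtoken_tokens

    @property
    def percent(self) -> float:
        if self.json_tokens == 0:
            return 0.0  # avoid division by zero on empty input
        return round(100 * self.saved / self.json_tokens, 1)

    def __str__(self) -> str:
        return (
            f"jtoken: {self.jtoken_tokens} tokens | json: {self.json_tokens} tokens"
            f" | saved: {self.saved} ({self.percent}%)"
        )
```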
count_tokens(data, *, model, backend) -> int
Counts LLM tokens in the jtoken representation. Accepts a dict or an already-encoded jtoken string.
backend options:
| value | behaviour |
|---|---|
| "auto" (default) | tiktoken if installed, otherwise estimates |
| "tiktoken" | requires tiktoken; raises TokenCountError if absent |
| "estimate" | ~4 chars/token heuristic, no extra dependency |
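A characters-per-token heuristic like the one the "estimate" backend describes might look like this. The function name `estimate_tokens` and the choice to round up are assumptions; only the ratio of roughly 4 characters per token comes from the table above.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: text length divided by an average token width."""
    # Rounding up is an assumption; the library may round differently
    return math.ceil(len(text) / chars_per_token)
```

An estimate of this kind is useful for relative comparisons between two encodings of the same data, but it will not match a real tokenizer's counts exactly.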
Exceptions
JPackError
├── JPackEncodeError
├── JPackDecodeError
└── TokenCountError
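If the hierarchy is as drawn, a single `except JPackError` clause handles all three specific errors. The classes below are stand-ins that mirror the tree so the sketch runs without jtoken installed; they are not imported from the library, and `describe` is a hypothetical helper.

```python
class JPackError(Exception):
    """Stand-in base class mirroring the tree above."""

class JPackEncodeError(JPackError):
    pass

class JPackDecodeError(JPackError):
    pass

class TokenCountError(JPackError):
    pass

def describe(exc_type: type) -> str:
    """Raise the given error and show that the base-class handler catches it."""
    try:
        raise exc_type("demo")
    except JPackError as exc:
        return type(exc).__name__
```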
Development
git clone https://github.com/hermannsamimi/jtoken
cd jtoken
pip install -e ".[dev]"
pytest
pytest --cov=jtoken --cov-report=term-missing
License
MIT — © 2026 Hermann Samimi