A lightweight, human-readable key-value serialization format

jtoken

Compress JSON for LLM prompts — same data, fewer tokens.

What it does

jtoken strips the syntactic noise from JSON (", {}, ,) and collapses true, false, and null fields into three summary lines, one per value. Nested dicts are flattened with dot notation so the same collapse applies at every level. The result is a compact format intended to be as easy for an LLM to read as the original JSON.

JSON (30 tokens):

{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}

jtoken (21 tokens):

name: Alice
age: 30
trues: active
falses: verified
nulls: ref

The round-trip is lossless: decode(encode(data)) == data for all supported types.
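
The collapse shown above can be sketched in a few lines. This is a minimal illustration for flat dicts only, not jtoken's actual implementation: `collapse_flat` is a hypothetical helper, and it skips the string quoting and key validation described in the API section.

```python
def collapse_flat(data: dict) -> str:
    """Illustrative only: render a flat dict in the layout shown above."""
    lines, trues, falses, nulls = [], [], [], []
    for key, value in data.items():
        # Identity checks keep 1 and 0 from being mistaken for True and False.
        if value is True:
            trues.append(key)
        elif value is False:
            falses.append(key)
        elif value is None:
            nulls.append(key)
        else:
            lines.append(f"{key}: {value}")
    # Summary lines go last, and only when they have at least one key.
    for label, keys in (("trues", trues), ("falses", falses), ("nulls", nulls)):
        if keys:
            lines.append(f"{label}: {','.join(keys)}")
    return "\n".join(lines)


example = {"name": "Alice", "age": 30, "active": True, "verified": False, "ref": None}
print(collapse_flat(example))
```

Each boolean or null field costs only its key name plus a comma, which is where most of the token reduction comes from.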

Installation

# Core — no external dependencies
pip install jtoken

# With accurate LLM token counting
pip install "jtoken[tiktoken]"

Quick start

import jtoken

data = {
    "user": "alice",
    "age": 30,
    "premium": True,
    "verified": True,
    "is_remote": False,
    "trial": False,
    "score": 9.5,
    "referral": None,
    "last_login": None,
}

text = jtoken.encode(data)
# user: alice
# age: 30
# score: 9.5
# trues: premium,verified
# falses: is_remote,trial
# nulls: referral,last_login

original = jtoken.decode(text)
assert original == data

dumps / loads are available as json-style aliases.

CLI

echo '{"name": "Alice", "active": true}' | jtoken encode
printf 'name: Alice\ntrues: active\n' | jtoken decode
echo '{"name": "Alice", "active": true}' | jtoken stats
echo '{"name": "Alice", "active": true}' | jtoken count

Use -f/--file to read from a file instead of stdin. stats and count accept --model and --backend (auto, tiktoken, estimate).

Nested documents

Nested dicts are flattened with dot notation. Booleans and nulls at any depth are collapsed into the same summary lines.

data = {
    "title": "Engineer",
    "metadata": {
        "verified": True,
        "sponsored": False,
        "score": None,
        "source": {
            "crawled": True,
            "enriched": None,
        },
    },
}

print(jtoken.encode(data))
# title: Engineer
# trues: metadata.verified,metadata.source.crawled
# falses: metadata.sponsored
# nulls: metadata.score,metadata.source.enriched

Decode reconstructs the full nested structure:

assert jtoken.decode(jtoken.encode(data)) == data  # ✓

Limitations: keys cannot contain . (reserved for nesting) or ": " (the key-value separator), and cannot be named trues, falses, or nulls. Arrays are not supported.
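
To make the dot-notation step concrete, here is a small sketch of flattening and its inverse. `flatten` and `unflatten` are hypothetical helpers written for this illustration, not functions exported by jtoken, and empty nested dicts are not handled.

```python
def flatten(data: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a single level keyed by dotted paths."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild the nested structure from dotted paths."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


doc = {"title": "Engineer", "metadata": {"verified": True, "source": {"crawled": True}}}
print(flatten(doc))

# A literal "." in a key is indistinguishable from nesting after flattening.
print(flatten({"a.b": 1}) == flatten({"a": {"b": 1}}))
```

The last line shows why dots in keys are rejected: both inputs flatten to the same path, so the decoder could not tell them apart.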

Token savings

import jtoken

stats = jtoken.token_savings(data)
print(stats)
# jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)

n = jtoken.count_tokens(data)  # count jtoken tokens only

Savings are compared against json.dumps(data) — the standard representation you'd paste into a prompt. Savings are highest when a document has many null or boolean fields.

# Specify model or encoding
stats = jtoken.token_savings(data, model="gpt-4o")
stats = jtoken.token_savings(data, model="o200k_base")

# No tiktoken dependency
stats = jtoken.token_savings(data, backend="estimate")
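
The figures in the sample output follow from simple arithmetic on the two token counts. The sketch below reproduces them; `savings` is a hypothetical helper, and the zero guard is an assumption rather than documented behaviour.

```python
def savings(jtoken_tokens: int, json_tokens: int) -> tuple[int, float]:
    """Return (tokens saved, percent saved relative to the JSON count)."""
    saved = json_tokens - jtoken_tokens
    # Guard against division by zero for empty input (an assumption).
    percent = 100 * saved / json_tokens if json_tokens else 0.0
    return saved, percent


saved, percent = savings(jtoken_tokens=22, json_tokens=36)
print(f"saved: {saved} ({percent:.1f}%)")  # saved: 14 (38.9%)
```

The percentage is relative to the JSON count, so the same absolute saving looks larger on small documents.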

API

encode(data: dict) -> str

Compresses a dict into jtoken. Supported value types: str, int, float, bool, None, nested dict.

Summary lines (always at the end):

line                 contains
trues: k1,k2,...     all keys whose value is True
falses: k1,k2,...    all keys whose value is False
nulls: k1,k2,...     all keys whose value is None

String values that would decode ambiguously (look like a number or boolean) keep their quotes:

jtoken.encode({"zip": "90210"})  # → 'zip: "90210"'   (string, quotes kept)
jtoken.encode({"zip":  90210})   # → 'zip: 90210'      (int, no quotes)
jtoken.encode({"ok": "true"})    # → 'ok: "true"'      (string, quotes kept)
jtoken.encode({"ok": True})      # → 'trues: ok'       (bool, collapsed)

Raises JPackEncodeError for unsupported types, dots or ": " in keys, or reserved key names (nulls, trues, falses).
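
One way to decide when a string must keep its quotes is to ask whether the bare text would parse as something else. The sketch below is an assumption about how such a check could work, not jtoken's actual rule. `looks_ambiguous` and `render_string` are hypothetical names, and treating "null" as ambiguous is a guess.

```python
def looks_ambiguous(value: str) -> bool:
    """True if the unquoted text could be read back as a non-string."""
    if value in ("true", "false", "null"):  # assumed keyword set
        return True
    for parse in (int, float):
        try:
            parse(value)
            return True
        except ValueError:
            pass
    return False


def render_string(key: str, value: str) -> str:
    """Keep quotes only when dropping them would change the decoded type."""
    return f'{key}: "{value}"' if looks_ambiguous(value) else f"{key}: {value}"


print(render_string("zip", "90210"))
print(render_string("name", "Alice"))
```

Python's `int` and `float` accept more than plain digits (for example `"1_000"` or `"inf"`), so this check errs on the side of quoting.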

decode(text: str) -> dict

Reconstructs the original dict, including nested structure from dot-notation keys. Type inference for scalar values:

value                       decoded as
"quoted"                    str (always)
key in trues: line          True
key in falses: line         False
key in nulls: line          None
integer literal, e.g. 42    int
float literal, e.g. 3.14    float
anything else               str

Raises JPackDecodeError for invalid input.
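
The scalar rows of the inference rules above can be read as an ordered series of checks. The following sketch applies them to the text after the ": " separator. `infer_scalar` is a hypothetical helper, it ignores escaped quotes, and the real decoder may accept a narrower set of numeric literals than Python's `int` and `float`.

```python
def infer_scalar(raw: str):
    """Apply the rules in order: quoted string, int, float, then plain string."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]  # quoted values are always strings
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # anything else stays a string


for text in ['"90210"', "42", "3.14", "Alice"]:
    value = infer_scalar(text)
    print(f"{text!r} -> {value!r} ({type(value).__name__})")
```

Trying `int` before `float` matters: `float("42")` would also succeed, and the integer type would be lost.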

token_savings(data, *, model, backend) -> TokenSavings

Compares jtoken vs json.dumps token usage.

stats.jtoken_tokens   # int
stats.json_tokens    # int
stats.saved          # int
stats.percent        # float
str(stats)           # "jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)"

count_tokens(data, *, model, backend) -> int

Counts LLM tokens in the jtoken representation. Accepts a dict or an already-encoded jtoken string.

backend options:

value               behaviour
"auto" (default)    tiktoken if installed, otherwise estimates
"tiktoken"          requires tiktoken; raises TokenCountError if absent
"estimate"          ~4 chars/token heuristic, no extra dependency

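The backend options above amount to an optional import with a fallback. The sketch below shows that pattern under stated assumptions: `estimate_tokens` and `count_with_backend` are hypothetical names, rounding up in the estimate is a guess, the `o200k_base` encoding is chosen only for illustration, and `RuntimeError` stands in for jtoken's `TokenCountError`.

```python
import math


def estimate_tokens(text: str) -> int:
    """Roughly 4 characters per token; rounding up is an assumption."""
    return math.ceil(len(text) / 4)


def count_with_backend(text: str, backend: str = "auto") -> int:
    if backend not in ("auto", "tiktoken", "estimate"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend != "estimate":
        try:
            import tiktoken  # optional dependency
        except ImportError:
            if backend == "tiktoken":
                # jtoken raises TokenCountError here; RuntimeError is a stand-in.
                raise RuntimeError("tiktoken is not installed") from None
        else:
            encoding = tiktoken.get_encoding("o200k_base")
            return len(encoding.encode(text))
    # Reached for "estimate", or for "auto" when tiktoken is missing.
    return estimate_tokens(text)


print(count_with_backend("name: Alice\ntrues: active", backend="estimate"))
```

The estimate is only a rough guide; real tokenizers can differ noticeably on short keys and punctuation.
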
Exceptions

JPackError
├── JPackEncodeError
├── JPackDecodeError
└── TokenCountError
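
Because every error derives from `JPackError`, a single `except` clause can cover encode, decode, and token-count failures. The sketch below uses stand-in classes that mirror the tree above so it runs without the package installed. That the base class derives from `Exception` is an assumption, and `describe` is a hypothetical helper.

```python
# Stand-ins mirroring the documented hierarchy; in real code, import them from jtoken.
class JPackError(Exception):
    pass


class JPackEncodeError(JPackError):
    pass


class JPackDecodeError(JPackError):
    pass


class TokenCountError(JPackError):
    pass


def describe(exc_type: type) -> str:
    """Raise the given error and show that the base class catches it."""
    try:
        raise exc_type("demo failure")
    except JPackError as exc:
        return f"caught {type(exc).__name__}: {exc}"


for exc_type in (JPackEncodeError, JPackDecodeError, TokenCountError):
    print(describe(exc_type))
```

Catch the specific subclass when the handling differs, for example to fall back to plain JSON only on encode errors.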

Development

git clone https://github.com/hermannsamimi/jtoken
cd jtoken
pip install -e ".[dev]"
pytest
pytest --cov=jtoken --cov-report=term-missing

License

MIT — © 2026 Hermann Samimi
