Compress JSON-shaped documents for LLM prompts with normalization, CLI, and token measurement

Project description

jtoken

Author: Hermann Samimi

jtoken compresses JSON-shaped documents for LLM prompts: fewer tokens, readable line-oriented output, and a lossless round-trip. Pass file contents as a string, or pass a dict or list, and jtoken detects the input format itself.

Python 3.8+. No extra runtime dependencies.

Installation

pip install jtoken
pip install "jtoken[tiktoken]"   # for OpenAI-compatible token counting

Quick start

import jtoken

# From a file: read the contents as text and pass the string directly
with open("data.json", encoding="utf-8") as f:
    raw = f.read()
encoded = jtoken.encode(raw)
print(encoded)

# From a Python dict
data = {"user": "alice", "age": 30, "active": True, "ref": None}
encoded = jtoken.encode(data)
decoded = jtoken.decode(encoded)
assert decoded == data

Aliases: jtoken.dumps = encode, jtoken.loads = decode.

Format overview

JSON

{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}

jtoken

name: Alice
age: 30
trues: active
falses: verified
nulls: ref

Encoding rules

Nested dicts flatten with dot notation.
True, False, and None collapse into trues:, falses:, and nulls: summary lines.
Ambiguous strings keep quotes on encode.
Keys containing . are escaped during normalization and restored from context.
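
The rules above can be illustrated with a small self-contained sketch. This is not jtoken's implementation: the names `sketch_encode` and `sketch_decode` are hypothetical, the comma separator inside the summary lines is an assumption, and dotted-key escaping, list handling, and empty nested dicts are left out.

```python
import json


def _text_to_scalar(text):
    """Read a bare value: valid JSON literals are parsed, anything else stays a string."""
    try:
        return json.loads(text)
    except ValueError:
        return text


def _scalar_to_text(value):
    """Write a value; strings are quoted only when the bare form would be ambiguous."""
    if isinstance(value, str):
        ambiguous = (
            _text_to_scalar(value) != value   # would decode as a number, bool, ...
            or value != value.strip()         # leading or trailing whitespace
            or value.splitlines() != [value]  # empty string or embedded line break
        )
        return json.dumps(value) if ambiguous else value
    return json.dumps(value)


def _flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from _flatten(value, path)
        else:
            yield path, value


def sketch_encode(doc):
    lines, trues, falses, nulls = [], [], [], []
    for path, value in _flatten(doc):
        if value is True:
            trues.append(path)
        elif value is False:
            falses.append(path)
        elif value is None:
            nulls.append(path)
        else:
            lines.append(f"{path}: {_scalar_to_text(value)}")
    # Summary lines; joining several paths with ", " is an assumption of this sketch.
    for label, paths in (("trues", trues), ("falses", falses), ("nulls", nulls)):
        if paths:
            lines.append(f"{label}: {', '.join(paths)}")
    return "\n".join(lines)


def _assign(doc, path, value):
    """Rebuild nesting by walking the dotted path."""
    *parents, leaf = path.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def sketch_decode(text):
    constants = {"trues": True, "falses": False, "nulls": None}
    doc = {}
    for line in text.splitlines():
        key, _, raw = line.partition(": ")
        if key in constants:
            for path in raw.split(", "):
                _assign(doc, path, constants[key])
        else:
            _assign(doc, key, _text_to_scalar(raw))
    return doc


example = {"name": "Alice", "age": 30, "active": True, "verified": False, "ref": None}
print(sketch_encode(example))
```

Quoting only the strings that would otherwise decode as something else (for example the string "30") is what keeps the line format compact while still round-tripping in this sketch.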

What jtoken accepts

encode accepts a string (file content) or a dict/list. When given a string, it auto-detects the format:

Standard JSON objects and arrays
Multiple bare JSON objects in a single string (no array wrapper needed)
MongoDB shell format (ObjectId(...), ISODate(...), NumberInt(...))
MongoDB Extended JSON ($oid, $date, $numberInt, …)
Elasticsearch search hits (with _source)

No format flag required — just pass the text.
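
To make the "multiple bare JSON objects" case concrete, here is one way such input could be split using only the standard library. The function name `parse_concatenated_json` is hypothetical and this is a sketch of the general technique, not a description of how jtoken detects or parses its input.

```python
import json


def parse_concatenated_json(text):
    """Parse whitespace-separated JSON values from one string into a list."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while True:
        # Skip whitespace between documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        docs.append(obj)
    return docs


docs = parse_concatenated_json('{"a": 1}\n{"b": [2, 3]}  {"c": null}')
print(docs)
```

`json.loads` would reject this input because of the trailing data after the first object; `raw_decode` reports where each value ends so parsing can resume from there.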

Normalization and denormalization

For lossless round-trips back into MongoDB shell or Elasticsearch hit format, use encode_document / decode_document:

import jtoken

with open("hit.json", encoding="utf-8") as f:
    raw = f.read()
text, context = jtoken.encode_document(raw)
restored = jtoken.decode_document(text, target="mongo_shell", context=context)

The equivalent round trip from the command line:

jtoken encode -f doc.json --context-out doc.ctx.json
jtoken decode --output-format mongo_shell -f doc.jtoken --context-in doc.ctx.json
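
The role of the context can be sketched as follows, using MongoDB Extended JSON wrappers as the example. The functions `sketch_normalize` and `sketch_denormalize` are hypothetical, and the flat path-to-marker mapping only loosely mirrors the `typed_values` idea; jtoken's real context layout may differ.

```python
import json

# Extended JSON wrappers handled by this sketch (a small subset).
EXTENDED_MARKERS = ("$oid", "$date", "$numberInt")


def sketch_normalize(doc, path="", typed_values=None):
    """Strip single-key type wrappers and remember which path carried which marker."""
    typed_values = {} if typed_values is None else typed_values
    out = {}
    for key, value in doc.items():
        child = f"{path}.{key}" if path else key
        if isinstance(value, dict) and len(value) == 1 and next(iter(value)) in EXTENDED_MARKERS:
            marker, inner = next(iter(value.items()))
            typed_values[child] = marker
            out[key] = inner
        elif isinstance(value, dict):
            out[key], _ = sketch_normalize(value, child, typed_values)
        else:
            out[key] = value
    return out, typed_values


def sketch_denormalize(doc, typed_values, path=""):
    """Re-wrap values at the recorded paths with their original markers."""
    out = {}
    for key, value in doc.items():
        child = f"{path}.{key}" if path else key
        if child in typed_values:
            out[key] = {typed_values[child]: value}
        elif isinstance(value, dict):
            out[key] = sketch_denormalize(value, typed_values, child)
        else:
            out[key] = value
    return out


original = {
    "_id": {"$oid": "507f1f77bcf86cd799439011"},
    "meta": {"created": {"$date": "2026-05-11T00:00:00Z"}, "visits": {"$numberInt": "42"}},
    "name": "alice",
}
plain, context = sketch_normalize(original)
# The context is plain data, so it can be stored next to the encoded text.
saved_context = json.loads(json.dumps(context))
restored = sketch_denormalize(plain, saved_context)
```

Keeping type information out of the encoded text and in a side-channel context is what lets the compact form stay small while the original dialect remains recoverable.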

Input and output formats

The default source, auto, detects the input format from the content. Pass source= or target= only when you need to override detection or choose a specific output format.

Input format	Description
`auto`	detect from content (default)
`json`	standard JSON
`mongo_shell`	MongoDB shell (`ObjectId`, `ISODate`, …)
`mongo_extended`	MongoDB Extended JSON
`elastic_hit`	Elasticsearch hit with `_source`
`elastic_source`	`_source` wrapper only

Output format	Description
`json`	pretty-printed JSON (CLI default)
`python`	Python `repr` (Python API default)
`mongo_shell`	MongoDB shell document
`mongo_extended`	MongoDB Extended JSON
`elastic_hit`	full Elasticsearch hit envelope
`elastic_source`	`_source` wrapper
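
The difference between a full hit and its `_source` payload can be pictured with a short sketch. The helpers `unwrap_hit` and `rewrap_hit` are hypothetical and only illustrate the idea of setting envelope metadata aside; they are not part of jtoken's API.

```python
def unwrap_hit(hit):
    """Split an Elasticsearch hit into its _source payload and the remaining envelope."""
    envelope = {key: value for key, value in hit.items() if key != "_source"}
    return hit["_source"], envelope


def rewrap_hit(source, envelope):
    """Rebuild the full hit from a payload and a previously saved envelope."""
    hit = dict(envelope)
    hit["_source"] = source
    return hit


hit = {
    "_index": "users",
    "_id": "1",
    "_score": 1.0,
    "_source": {"user": "alice", "age": 30},
}
source, envelope = unwrap_hit(hit)
rebuilt = rewrap_hit(source, envelope)
```

Only the payload needs to be compressed for a prompt; the envelope is small, regular metadata that can travel separately and be reattached afterwards.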

Public API reference

Core codec

function	description
`encode(data) -> str`	compress string, dict, or list to jtoken
`decode(text: str) -> dict`	reconstruct the nested dict
`dumps` / `loads`	json-style aliases

Normalization

function	description
`encode_document(raw, *, source="auto", context=None)`	return `(jtoken_text, NormalizationContext)`
`decode_document(text, *, target="json", context=None)`	decode and denormalize
`normalize(data, *, source="auto", context=None)`	return `(normalized_dict, NormalizationContext)`
`denormalize(data, *, target="python", context)`	restore lists, typed values, and dialect
`parse_input(text, *, source="auto")`	parse foreign text into Python data
`render_output(value, *, target="python") -> str`	render denormalized data as text

Token measurement

function	description
`count_tokens(data, *, model, backend) -> int`	token count for dict or jtoken string
`count_text_tokens(text, *, model, backend) -> int`	token count for raw text
`token_savings(data, *, model, backend, json_indent=2)`	compare jtoken vs pretty JSON

`TokenSavings` properties

property	type	description
`jtoken_tokens`	`int`	tokens in jtoken representation
`json_tokens`	`int`	tokens in JSON baseline
`saved`	`int`	`json_tokens - jtoken_tokens`
`percent`	`float`	percent saved

`NormalizationContext` fields

field	description
`source_format`	detected input dialect
`target_format`	optional output hint
`typed_values`	BSON-like type markers per path
`lists`	paths that were lists before flattening
`dotted_keys`	paths with escaped `.` keys
`elastic`	Elasticsearch envelope metadata

Methods: to_dict(), from_dict(data).

Exceptions

exception	when raised
`JPackEncodeError`	encoding fails
`JPackDecodeError`	decoding fails
`NormalizationError`	normalization fails
`DenormalizationError`	denormalization fails
`TokenCountError`	token counting fails

Token counting

stats = jtoken.token_savings(data, model="gpt-4o", backend="tiktoken", json_indent=2)
print(stats.jtoken_tokens, stats.json_tokens, stats.saved, stats.percent)

`backend`	behavior
`auto`	use `tiktoken` when installed, otherwise estimate
`tiktoken`	require `tiktoken`
`estimate`	character heuristic
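
The backend behavior in this table could be approximated as below. The function name `sketch_count_text_tokens` is hypothetical, and the four-characters-per-token heuristic is an assumption chosen for illustration; the source does not state which heuristic jtoken's `estimate` backend uses, or which exception it raises when `tiktoken` is missing.

```python
import math


def sketch_count_text_tokens(text, *, model="gpt-4o", backend="auto"):
    """Count tokens with tiktoken when available, otherwise fall back to an estimate."""
    if backend not in ("auto", "tiktoken", "estimate"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend in ("auto", "tiktoken"):
        try:
            import tiktoken
        except ImportError:
            if backend == "tiktoken":
                raise  # the caller explicitly required tiktoken
            # backend == "auto": fall through to the estimate below
        else:
            encoding = tiktoken.encoding_for_model(model)
            return len(encoding.encode(text))
    # Assumed heuristic: roughly four characters per token, rounded up.
    return math.ceil(len(text) / 4)


print(sketch_count_text_tokens("name: Alice\nage: 30", backend="estimate"))
```

An estimate is only useful for relative comparisons; for budgeting against a real model's context window the tokenizer-backed count is the safer choice.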

Representative token counts

Document type	JSON tokens	jtoken tokens
ELK hit	1537	583
Mongo shell	770	508
PostgreSQL structured document	831	685
Standard JSON	617	503
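
The savings implied by these counts can be worked out directly. The class `SavingsSketch` below is a hypothetical stand-in that mirrors the documented `TokenSavings` property names; computing `percent` relative to the JSON baseline is an assumption, since the source only describes it as "percent saved".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavingsSketch:
    jtoken_tokens: int
    json_tokens: int

    @property
    def saved(self):
        return self.json_tokens - self.jtoken_tokens

    @property
    def percent(self):
        # Assumed definition: savings as a share of the JSON baseline.
        if self.json_tokens == 0:
            return 0.0
        return 100.0 * self.saved / self.json_tokens


# (JSON tokens, jtoken tokens) taken from the table above.
ROWS = {
    "ELK hit": (1537, 583),
    "Mongo shell": (770, 508),
    "PostgreSQL structured document": (831, 685),
    "Standard JSON": (617, 503),
}

for name, (json_tokens, jtoken_tokens) in ROWS.items():
    stats = SavingsSketch(jtoken_tokens=jtoken_tokens, json_tokens=json_tokens)
    print(f"{name}: saved {stats.saved} tokens ({stats.percent:.1f}%)")
```

Under that definition the four rows come out at roughly 62.1%, 34.0%, 17.6%, and 18.5% saved.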

Token count by representation

CLI

cat data.json | jtoken encode
cat data.jtoken | jtoken decode
jtoken encode -f data.json
jtoken stats -f data.json --model gpt-4o --backend tiktoken
jtoken count -f data.json
python -m jtoken encode

License

Release history

Version	Release date
0.3.4	May 14, 2026
0.3.3	May 12, 2026 (this version)
0.3.1	May 12, 2026
0.3.0	May 11, 2026
0.2.4	May 11, 2026
0.2.3	May 11, 2026
0.2.2	May 11, 2026
0.2.1	May 11, 2026
0.2.0	May 11, 2026
0.1.0	May 11, 2026

Download files

Source Distribution

jtoken-0.3.3.tar.gz (22.8 kB view details)

Uploaded May 12, 2026 Source

Built Distribution

jtoken-0.3.3-py3-none-any.whl (18.1 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file jtoken-0.3.3.tar.gz.

File metadata

Download URL: jtoken-0.3.3.tar.gz
Upload date: May 12, 2026
Size: 22.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for jtoken-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`7538afcfa4e40fbc4ac40c79c450fd1577545daf14c79ed58485448fb77d48fc`
MD5	`299ebb35e690d66bd1517d086df2432e`
BLAKE2b-256	`e8db577da4c7b1c5a1e80012b25253190094fd78d492c631b8d78eb35d28ddd8`

File details

Details for the file jtoken-0.3.3-py3-none-any.whl.

File metadata

Download URL: jtoken-0.3.3-py3-none-any.whl
Upload date: May 12, 2026
Size: 18.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for jtoken-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7453cce77b6e26d1e7f91f432db948ace07db9f6cc20ef42afe721207cd263a3`
MD5	`cceecf3fa0a5542ca873685021fb65dc`
BLAKE2b-256	`7bc89267e36b1eff0d8f239a6e4b0391d237fa3f0465caf142b27b0b060ff1bd`
