Compress JSON-shaped documents for LLM prompts with normalization, CLI, and token measurement
jtoken
jtoken compresses JSON-shaped documents for LLM prompts: same information, fewer tokens, lossless round-trip for supported scalar dicts.
It is a small Python library and CLI for turning verbose JSON into a compact line-oriented representation, measuring token savings, and working with real-world document dialects such as Elasticsearch hits and MongoDB JSON.
Why use jtoken
- Lower prompt cost: strip JSON punctuation and collapse repeated `true`, `false`, and `null` fields into summary lines.
- Readable for models: the output stays human-readable key-value text instead of dense JSON.
- Lossless for supported data: nested dicts round-trip through `encode()` and `decode()`.
- Production-shaped inputs: normalize Elasticsearch hits, MongoDB Extended JSON, and Mongo shell literals before encoding.
- No required runtime dependencies: the core package is stdlib-only; `tiktoken` is optional for accurate token counts.
Installation
pip install jtoken
pip install "jtoken[tiktoken]"
Python 3.8+ is supported.
Quick start
import jtoken
data = {
"user": "alice",
"age": 30,
"premium": True,
"verified": True,
"is_remote": False,
"trial": False,
"score": 9.5,
"referral": None,
"last_login": None,
}
text = jtoken.encode(data)
restored = jtoken.decode(text)
assert restored == data
dumps and loads are aliases for encode and decode.
Format overview
JSON example
{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}
jtoken example
name: Alice
age: 30
trues: active
falses: verified
nulls: ref
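To make the mapping between the two examples concrete, here is a minimal sketch of how the line format could be read back into a dict. It is not jtoken's implementation: `sketch_decode` and its helpers are hypothetical names, and the `", "` separator between several keys on one summary line is an assumption, since the example shows only one key per line.

```python
import json
import re

# Hypothetical sketch of decoding the line format shown above;
# this is not jtoken's actual implementation.
_SUMMARY_VALUES = {"trues": True, "falses": False, "nulls": None}
_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+")


def _parse_scalar(raw):
    """Turn the text after ': ' back into a Python scalar."""
    if raw.startswith('"'):
        return json.loads(raw)  # quoted text always decodes to a string
    if _INT.fullmatch(raw):
        return int(raw)
    if _FLOAT.fullmatch(raw):
        return float(raw)
    return raw


def _assign(target, dotted_key, value):
    """Rebuild nested dicts from a dot-notation key."""
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = value


def sketch_decode(text):
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, raw = line.partition(": ")
        if key in _SUMMARY_VALUES:
            # Assumption: several keys on one summary line are separated by ", ".
            for dotted_key in raw.split(", "):
                _assign(result, dotted_key, _SUMMARY_VALUES[key])
        else:
            _assign(result, key, _parse_scalar(raw))
    return result


example = "name: Alice\nage: 30\ntrues: active\nfalses: verified\nnulls: ref"
decoded = sketch_decode(example)
```

With the example text above, `decoded` compares equal to the dict parsed from the JSON example. Key order differs, because the summary lines come last.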
Encoding rules
- Nested dicts are flattened with dot notation.
- Boolean `true` values are collected into a `trues:` line.
- Boolean `false` values are collected into a `falses:` line.
- `null` values are collected into a `nulls:` line.
- Ambiguous strings such as `"90210"` or `"true"` keep quotes so types survive decode.
Supported scalar types
str, int, float, bool, None, and nested dict.
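The encoding rules and scalar types above can be sketched in a few lines of stdlib Python. This is an illustration, not jtoken's own code: `sketch_encode` and its helpers are hypothetical, the `", "` separator on summary lines is an assumption, and the test for "ambiguous" strings (number-like text or the words `true`, `false`, `null`) is a guess at what the library treats as ambiguous.

```python
import json
import re

# Hypothetical sketch of the encoding rules; not jtoken's actual implementation.
_NUMBER_LIKE = re.compile(r"-?\d+(\.\d+)?")
_RESERVED_WORDS = {"true", "false", "null"}


def _flatten(data, prefix=""):
    """Yield (dotted_key, scalar) pairs from nested dicts."""
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from _flatten(value, path)
        else:
            yield path, value


def _format_scalar(value):
    """Render str, int, or float; quote strings that would decode as another type."""
    if isinstance(value, str):
        if _NUMBER_LIKE.fullmatch(value) or value.lower() in _RESERVED_WORDS:
            return json.dumps(value)
        return value
    return repr(value)


def sketch_encode(data):
    lines, trues, falses, nulls = [], [], [], []
    for path, value in _flatten(data):
        # Identity checks keep the int 1 from being mistaken for True.
        if value is True:
            trues.append(path)
        elif value is False:
            falses.append(path)
        elif value is None:
            nulls.append(path)
        else:
            lines.append(f"{path}: {_format_scalar(value)}")
    # Assumption: keys on a summary line are joined with ", ".
    for label, keys in (("trues", trues), ("falses", falses), ("nulls", nulls)):
        if keys:
            lines.append(f"{label}: {', '.join(keys)}")
    return "\n".join(lines)


doc = {"name": "Alice", "age": 30, "active": True, "verified": False, "ref": None}
encoded = sketch_encode(doc)
```

The sketch ignores cases a real codec has to handle, such as strings containing newlines, keys that contain `.`, and non-finite floats.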
Current limitations
- Keys cannot contain `.` or the separator `": "`.
- Reserved top-level keys: `nulls`, `trues`, `falses`.
- Lists are not encoded directly by the core codec; they are normalized into nested dicts with numeric keys before encoding.
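As an illustration of the last limitation, the sketch below shows one way lists could be rewritten as dicts with numeric keys while the list positions are recorded separately so they can be restored later. The function names and the shape of the recorded paths are hypothetical; jtoken's own normalization context may look different.

```python
# Hypothetical sketch of list normalization; not jtoken's actual implementation.

def lists_to_dicts(value, path="", list_paths=None):
    """Replace each list with a dict keyed "0", "1", ... and record where lists were."""
    if list_paths is None:
        list_paths = []
    if isinstance(value, list):
        list_paths.append(path)
        items = enumerate(value)
    elif isinstance(value, dict):
        items = value.items()
    else:
        return value, list_paths
    converted = {}
    for key, item in items:
        child_path = f"{path}.{key}" if path else str(key)
        converted[str(key)], _ = lists_to_dicts(item, child_path, list_paths)
    return converted, list_paths


def dicts_to_lists(value, list_paths, path=""):
    """Inverse step: turn dicts back into lists at the recorded paths."""
    if not isinstance(value, dict):
        return value
    rebuilt = {
        key: dicts_to_lists(item, list_paths, f"{path}.{key}" if path else key)
        for key, item in value.items()
    }
    if path in list_paths:
        return [rebuilt[str(index)] for index in range(len(rebuilt))]
    return rebuilt


doc = {"tags": ["a", "b"], "user": {"roles": [{"name": "admin"}]}}
flat, paths = lists_to_dicts(doc)
restored = dicts_to_lists(flat, paths)
```

Here `flat` contains only dicts and scalars, `paths` lists the dotted locations that used to be lists, and `restored` compares equal to `doc`.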
Normalization and denormalization
Use normalization when the source document is not already a plain JSON object of scalar values.
Supported input dialects:
| source | Use when |
|---|---|
| `auto` | Let jtoken detect the input family |
| `json` | Standard JSON |
| `python` | JSON-compatible Python values |
| `mongo_extended` | Extended JSON wrappers such as `$oid` and `$date` |
| `mongo_shell` | Shell literals such as `ObjectId(...)` and `ISODate(...)` |
| `elastic_hit` | Full Elasticsearch hit with `_source` and `fields` |
| `elastic_source` | A document shaped like `_source` only |
Supported output dialects:
| target | Result |
|---|---|
| `python` | Python data structures |
| `json` | Standard JSON text |
| `mongo_extended` | Extended JSON wrappers restored from context |
| `mongo_shell` | Shell-style literals restored from context |
| `elastic_hit` | Elasticsearch hit envelope restored from context |
| `elastic_source` | `_source` document only |
Sidecar context
Mongo shell types, Elasticsearch envelopes, and list positions are stored in a separate normalization context. Keep that sidecar with the encoded text when you need a lossless decode back into the original dialect.
import jtoken
raw_hit = {...}
normalized, context = jtoken.normalize(raw_hit, source="elastic_hit")
text = jtoken.encode(normalized)
restored = jtoken.denormalize(
jtoken.decode(text),
target="elastic_hit",
context=context,
)
Convenience helpers:
text, context = jtoken.encode_document(raw_hit, source="elastic_hit")
restored = jtoken.decode_document(text, target="elastic_hit", context=context)
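To show why the sidecar matters, here is a small sketch of the idea for an Elasticsearch hit: the document body is separated from the envelope, and the envelope is kept as JSON-serializable context so the hit can be rebuilt after decoding. `split_hit`, `join_hit`, and the `"envelope"` key are hypothetical and do not reflect the structure of jtoken's real context object. The sketch also keeps every non-`_source` field in the envelope, which may differ from how jtoken treats `fields`.

```python
import json

# Hypothetical sketch of the sidecar idea; not jtoken's actual context format.

def split_hit(hit):
    """Separate the document body from the envelope metadata."""
    envelope = {key: value for key, value in hit.items() if key != "_source"}
    return dict(hit.get("_source", {})), {"envelope": envelope}


def join_hit(source, context):
    """Rebuild the full hit from a body and its stored envelope."""
    hit = dict(context["envelope"])
    hit["_source"] = source
    return hit


hit = {"_index": "users", "_id": "1", "_score": 1.0,
       "_source": {"user": "alice", "age": 30}}
source, context = split_hit(hit)

# The context survives a trip through a JSON sidecar file.
sidecar_text = json.dumps(context)
rebuilt = join_hit(source, json.loads(sidecar_text))
```

Only `source` would go into the prompt; the envelope travels in the sidecar, so `rebuilt` compares equal to the original `hit`.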
CLI
jtoken encode --input-format elastic_hit -f hit.json --context-out hit.ctx.json
jtoken decode --output-format mongo_shell -f hit.jtoken --context-in hit.ctx.json
jtoken stats --input-format json -f document.json
jtoken count --input-format json -f document.json --backend estimate
Common flags:
- `-f` / `--file`: read from a file instead of stdin
- `--input-format`: document dialect for `encode`, `stats`, and `count`
- `--output-format`: document dialect for `decode`
- `--context-out` / `--context-in`: normalization sidecar files
- `--model` and `--backend`: token counting options for `stats` and `count`
Token measurement
stats = jtoken.token_savings(data)
print(stats)
# jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)
count = jtoken.count_tokens(data, backend="estimate")
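The figures in the sample output follow from simple arithmetic: tokens saved is the JSON count minus the jtoken count, and the percentage is taken relative to the JSON count. The helper below is a hypothetical illustration of that calculation, not part of jtoken's API.

```python
def savings(json_tokens, compact_tokens):
    """Return (tokens saved, percent saved relative to the JSON token count)."""
    saved = json_tokens - compact_tokens
    percent = 100.0 * saved / json_tokens if json_tokens else 0.0
    return saved, round(percent, 1)


saved, percent = savings(json_tokens=36, compact_tokens=22)
print(f"saved: {saved} ({percent}%)")  # saved: 14 (38.9%)
```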
Backends:
| backend | behavior |
|---|---|
| `auto` | use tiktoken when installed, otherwise estimate |
| `tiktoken` | require tiktoken |
| `estimate` | use a simple character heuristic |
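The backend choice in the table can be pictured as the fallback logic below. This is a sketch under stated assumptions: the function names are hypothetical, and the four-characters-per-token ratio is a common rule of thumb rather than the heuristic jtoken actually uses. The `tiktoken` calls (`get_encoding` and `Encoding.encode`) are that library's real API.

```python
import importlib.util
import math

# Hypothetical sketch of backend selection; not jtoken's actual implementation.

def estimate_tokens(text, chars_per_token=4):
    """Rough count, assuming about four characters per token."""
    return math.ceil(len(text) / chars_per_token)


def resolve_backend(backend="auto"):
    """Map 'auto' to a concrete backend and validate explicit choices."""
    if backend not in ("auto", "tiktoken", "estimate"):
        raise ValueError(f"unknown backend: {backend!r}")
    has_tiktoken = importlib.util.find_spec("tiktoken") is not None
    if backend == "auto":
        return "tiktoken" if has_tiktoken else "estimate"
    if backend == "tiktoken" and not has_tiktoken:
        raise RuntimeError("tiktoken backend requested but tiktoken is not installed")
    return backend


def count_text_tokens(text, backend="auto", encoding_name="cl100k_base"):
    """Count tokens in already-encoded text with the chosen backend."""
    if resolve_backend(backend) == "tiktoken":
        import tiktoken  # optional dependency, imported only when needed
        return len(tiktoken.get_encoding(encoding_name).encode(text))
    return estimate_tokens(text)


approx = count_text_tokens("name: Alice\nage: 30", backend="estimate")
```

The estimate is only a ballpark figure; an exact count requires the tokenizer of the model the prompt is sent to.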
Install accurate counting with:
pip install "jtoken[tiktoken]"
API surface
Core codec:
- `encode(data: dict) -> str`
- `decode(text: str) -> dict`
Normalization:
- `parse_input(text, *, source="auto")`
- `normalize(data, *, source="auto", context=None)`
- `denormalize(data, *, target="python", context)`
- `render_output(value, *, target="python")`
- `encode_document(raw, *, source="auto", context=None)`
- `decode_document(text, *, target="python", context)`
Token helpers:
- `count_tokens(data, *, model="cl100k_base", backend="auto")`
- `token_savings(data, *, model="cl100k_base", backend="auto")`
Exceptions
JPackError
├── JPackEncodeError
├── JPackDecodeError
├── NormalizationError
├── DenormalizationError
└── TokenCountError
Links
- Homepage: https://github.com/hermannsamimi/jtoken
- Issues: https://github.com/hermannsamimi/jtoken/issues
License
MIT