Compress JSON-shaped documents for LLM prompts with normalization, CLI, and token measurement

jtoken

jtoken compresses JSON-shaped documents for LLM prompts: same information, fewer tokens, lossless round-trip for supported scalar dicts.

It is a small Python library and CLI for turning verbose JSON into a compact line-oriented representation, measuring token savings, and working with real-world document dialects such as Elasticsearch hits and MongoDB JSON.

Why use jtoken

Lower prompt cost: strip JSON punctuation and collapse repeated true, false, and null fields into summary lines.
Readable for models: the output stays human-readable key-value text instead of dense JSON.
Lossless for supported data: nested dicts round-trip through encode() and decode().
Production-shaped inputs: normalize Elasticsearch hits, MongoDB Extended JSON, and Mongo shell literals before encoding.
No required runtime dependencies: the core package is stdlib-only; tiktoken is optional for accurate token counts.

Installation

pip install jtoken
pip install "jtoken[tiktoken]"

Python 3.8+ is supported.

Quick start

import jtoken

data = {
    "user": "alice",
    "age": 30,
    "premium": True,
    "verified": True,
    "is_remote": False,
    "trial": False,
    "score": 9.5,
    "referral": None,
    "last_login": None,
}

text = jtoken.encode(data)
restored = jtoken.decode(text)
assert restored == data

jtoken.dumps and jtoken.loads are aliases for encode and decode, respectively.

Format overview

JSON example

{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}

jtoken example

name: Alice
age: 30
trues: active
falses: verified
nulls: ref

Encoding rules

Nested dicts are flattened with dot notation.
Boolean true values are collected into a trues: line.
Boolean false values are collected into a falses: line.
null values are collected into a nulls: line.
Ambiguous strings such as "90210" or "true" keep quotes so types survive decode.
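
The rules above can be illustrated with a toy encoder. This is a minimal sketch, not jtoken's implementation: the name sketch_encode, the regular expression used to spot ambiguous strings, and the ", " separator between keys on a summary line are all assumptions made for illustration.

```python
import re

# Assumption: a string is "ambiguous" if it reads as a boolean, null, or number.
_AMBIGUOUS = re.compile(r"^(true|false|null|-?\d+(\.\d+)?)$")


def _walk(data, prefix, out):
    """Visit every leaf, building dotted paths for nested dicts."""
    for key, value in data.items():
        path = prefix + key
        if isinstance(value, dict):
            _walk(value, path + ".", out)
        elif value is True:
            out["trues"].append(path)
        elif value is False:
            out["falses"].append(path)
        elif value is None:
            out["nulls"].append(path)
        elif isinstance(value, str) and _AMBIGUOUS.match(value):
            out["lines"].append(f'{path}: "{value}"')  # keep quotes
        else:
            out["lines"].append(f"{path}: {value}")


def sketch_encode(data):
    """Toy encoder: key-value lines first, then the summary lines."""
    out = {"lines": [], "trues": [], "falses": [], "nulls": []}
    _walk(data, "", out)
    lines = list(out["lines"])
    for name in ("trues", "falses", "nulls"):
        if out[name]:
            # Assumption: keys on a summary line are joined with ", ".
            lines.append(f"{name}: " + ", ".join(out[name]))
    return "\n".join(lines)


print(sketch_encode({"addr": {"zip": "90210"}, "active": True}))
# addr.zip: "90210"
# trues: active
```

Run on the JSON example from the format overview, this sketch produces the same five lines shown there.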

Supported scalar types

Scalar values may be str, int, float, bool, or None. Nested dicts are supported as containers for these values.
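
To see why quoting matters on the way back, here is a hedged sketch of how the text after "key: " might be turned back into a Python value. The function sketch_parse_scalar is hypothetical and not part of jtoken. Booleans and None are left out because they live on the summary lines. Python's int() and float() also accept spellings such as "1_000" and "inf", which a real codec would have to treat more carefully.

```python
def sketch_parse_scalar(token):
    """Toy decoder for one value: quoted text stays a string, numbers are parsed."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # quoted: always a string, even if it looks numeric
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token  # anything else is a bare string


print(sketch_parse_scalar("90210"))    # 90210 (an int)
print(sketch_parse_scalar('"90210"'))  # 90210 (a str)
```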

Current limitations

Keys cannot contain . or the separator ": ".
Reserved top-level keys: nulls, trues, falses.
Lists are not encoded directly by the core codec; they are normalized into nested dicts with numeric keys before encoding.
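
A pre-flight check for the key restrictions might look like the sketch below. It is illustrative only: sketch_check_keys is a hypothetical helper, it assumes string keys, and it raises ValueError because the exception jtoken itself raises in this situation is not stated here.

```python
RESERVED_TOP_LEVEL = {"nulls", "trues", "falses"}


def sketch_check_keys(data, top_level=True):
    """Raise ValueError for keys the line format could not represent."""
    for key, value in data.items():
        if "." in key or ": " in key:
            raise ValueError(f"key {key!r} contains '.' or ': '")
        if top_level and key in RESERVED_TOP_LEVEL:
            raise ValueError(f"key {key!r} is reserved at the top level")
        if isinstance(value, dict):
            # Reserved names are only checked at the top level.
            sketch_check_keys(value, top_level=False)


sketch_check_keys({"user": {"trues": 1}})  # passes: "trues" is nested
```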

Normalization and denormalization

Use normalization when the source document is not already a plain JSON object of scalar values.

Supported input dialects:

`source`	Use when
`auto`	Let jtoken detect the input family
`json`	Standard JSON
`python`	JSON-compatible Python values
`mongo_extended`	Extended JSON wrappers such as `$oid` and `$date`
`mongo_shell`	Shell literals such as `ObjectId(...)` and `ISODate(...)`
`elastic_hit`	Full Elasticsearch hit with `_source` and `fields`
`elastic_source`	A document shaped like `_source` only
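
How `auto` tells the families apart is not documented here. As a rough illustration only, a detector for already-parsed documents could key off the markers named in the table. The function sketch_guess_source and its rules are assumptions, not jtoken's behavior.

```python
def _has_wrapper(value):
    """True if any nested value is a single-key {"$oid": ...} or {"$date": ...}."""
    if isinstance(value, dict):
        if len(value) == 1 and next(iter(value)) in ("$oid", "$date"):
            return True
        return any(_has_wrapper(v) for v in value.values())
    if isinstance(value, list):
        return any(_has_wrapper(v) for v in value)
    return False


def sketch_guess_source(doc):
    """Guess a dialect name for a parsed document (toy heuristic)."""
    if isinstance(doc, dict) and "_source" in doc:
        return "elastic_hit"
    if _has_wrapper(doc):
        return "mongo_extended"
    return "json"


print(sketch_guess_source({"_index": "logs", "_source": {"msg": "hi"}}))
# elastic_hit
```

When the input family is known in advance, passing it explicitly avoids depending on any heuristic.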

Supported output dialects:

`target`	Result
`python`	Python data structures
`json`	Standard JSON text
`mongo_extended`	Extended JSON wrappers restored from context
`mongo_shell`	Shell-style literals restored from context
`elastic_hit`	Elasticsearch hit envelope restored from context
`elastic_source`	`_source` document only
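
"Restored from context" can be pictured with a small sketch for Extended JSON wrappers. It assumes the context is a mapping from dotted path to wrapper name. jtoken's real context format is not described here, and sketch_unwrap and sketch_rewrap are hypothetical names.

```python
WRAPPERS = ("$oid", "$date")


def sketch_unwrap(doc, context=None, prefix=""):
    """Replace {"$oid": x} / {"$date": x} with x and record where they were."""
    context = {} if context is None else context
    plain = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict) and len(value) == 1 and next(iter(value)) in WRAPPERS:
            wrapper = next(iter(value))
            context[path] = wrapper
            plain[key] = value[wrapper]
        elif isinstance(value, dict):
            plain[key], _ = sketch_unwrap(value, context, path + ".")
        else:
            plain[key] = value
    return plain, context


def sketch_rewrap(plain, context, prefix=""):
    """Inverse of sketch_unwrap: put back the wrappers recorded in context."""
    doc = {}
    for key, value in plain.items():
        path = prefix + key
        if path in context:
            doc[key] = {context[path]: value}
        elif isinstance(value, dict):
            doc[key] = sketch_rewrap(value, context, path + ".")
        else:
            doc[key] = value
    return doc


doc = {"_id": {"$oid": "507f1f77bcf86cd799439011"}, "n": 1}
plain, ctx = sketch_unwrap(doc)
print(plain)  # {'_id': '507f1f77bcf86cd799439011', 'n': 1}
print(ctx)    # {'_id': '$oid'}
```

Without the recorded paths, the plain string could not be told apart from an ordinary string field, which is why the wrappers can only be restored when the context is available.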

Sidecar context

Mongo shell types, Elasticsearch envelopes, and list positions are stored in a separate normalization context. Keep that sidecar with the encoded text when you need a lossless decode back into the original dialect.

import jtoken

raw_hit = {...}
normalized, context = jtoken.normalize(raw_hit, source="elastic_hit")
text = jtoken.encode(normalized)
restored = jtoken.denormalize(
    jtoken.decode(text),
    target="elastic_hit",
    context=context,
)

Convenience helpers:

text, context = jtoken.encode_document(raw_hit, source="elastic_hit")
restored = jtoken.decode_document(text, target="elastic_hit", context=context)
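
The role of the sidecar for list positions can be sketched in a few lines. This is an illustration under stated assumptions, not jtoken's code: lists become dicts keyed by the strings "0", "1", and so on, and the context is simply a list of dotted paths that used to be lists. Both function names are hypothetical.

```python
def _join(path, key):
    return f"{path}.{key}" if path else str(key)


def sketch_normalize_lists(value, context, path=""):
    """Turn lists into dicts with numeric string keys; record each list's path."""
    if isinstance(value, list):
        context.append(path)
        return {str(i): sketch_normalize_lists(v, context, _join(path, i))
                for i, v in enumerate(value)}
    if isinstance(value, dict):
        return {k: sketch_normalize_lists(v, context, _join(path, k))
                for k, v in value.items()}
    return value


def sketch_restore_lists(value, context, path=""):
    """Rebuild lists at the recorded paths; leave every other dict alone."""
    if isinstance(value, dict):
        restored = {k: sketch_restore_lists(v, context, _join(path, k))
                    for k, v in value.items()}
        if path in context:
            return [restored[str(i)] for i in range(len(restored))]
        return restored
    return value


ctx = []
doc = {"tags": ["a", "b"], "empty": []}
norm = sketch_normalize_lists(doc, ctx)
print(norm)  # {'tags': {'0': 'a', '1': 'b'}, 'empty': {}}
print(sketch_restore_lists(norm, ctx) == doc)  # True
```

The empty list shows why the context has to travel with the text: after normalization it is indistinguishable from an empty dict.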

CLI

jtoken encode --input-format elastic_hit -f hit.json --context-out hit.ctx.json
jtoken decode --output-format elastic_hit -f hit.jtoken --context-in hit.ctx.json
jtoken stats --input-format json -f document.json
jtoken count --input-format json -f document.json --backend estimate

Common flags:

-f/--file: read from a file instead of stdin
--input-format: document dialect for encode, stats, and count
--output-format: document dialect for decode
--context-out / --context-in: normalization sidecar files
--model and --backend: token counting options for stats and count

Token measurement

stats = jtoken.token_savings(data)
print(stats)
# jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)

count = jtoken.count_tokens(data, backend="estimate")
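
The figures in the sample output are consistent with a saving measured against the JSON token count. The arithmetic can be checked with a few lines. sketch_savings is a hypothetical helper written for this check, not a jtoken function.

```python
def sketch_savings(jtoken_tokens, json_tokens):
    """Return (tokens saved, percent saved relative to the JSON count)."""
    saved = json_tokens - jtoken_tokens
    percent = 100.0 * saved / json_tokens if json_tokens else 0.0
    return saved, round(percent, 1)


print(sketch_savings(22, 36))  # (14, 38.9)
```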

Backends:

backend	behavior
`auto`	use `tiktoken` when installed, otherwise estimate
`tiktoken`	require `tiktoken`
`estimate`	use a simple character heuristic

Install accurate counting with:

pip install "jtoken[tiktoken]"
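
The backend table describes a fallback pattern that can be sketched as follows. This is not jtoken's implementation. sketch_count is a hypothetical name, and the four-characters-per-token rule is a common rough approximation assumed here, since the exact heuristic jtoken uses is not stated. The tiktoken calls used (get_encoding and encode) are part of that library's public API.

```python
def sketch_count(text, backend="auto", encoding_name="cl100k_base"):
    """Count tokens with tiktoken when allowed and installed, else estimate."""
    if backend not in ("auto", "tiktoken", "estimate"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend != "estimate":
        try:
            import tiktoken
        except ImportError:
            if backend == "tiktoken":
                raise  # the caller required tiktoken, so do not fall back
        else:
            return len(tiktoken.get_encoding(encoding_name).encode(text))
    # Assumed heuristic: about four characters per token, rounded up.
    return (len(text) + 3) // 4


print(sketch_count("name: Alice\nage: 30", backend="estimate"))  # 5
```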

API surface

Core codec:

encode(data: dict) -> str
decode(text: str) -> dict

Normalization:

parse_input(text, *, source="auto")
normalize(data, *, source="auto", context=None)
denormalize(data, *, target="python", context)
render_output(value, *, target="python")
encode_document(raw, *, source="auto", context=None)
decode_document(text, *, target="python", context)

Token helpers:

count_tokens(data, *, model="cl100k_base", backend="auto")
token_savings(data, *, model="cl100k_base", backend="auto")

Exceptions

JPackError
├── JPackEncodeError
├── JPackDecodeError
├── NormalizationError
├── DenormalizationError
└── TokenCountError
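
If the tree above denotes subclassing, catching the base class handles every jtoken failure in one place. The sketch below uses local stand-in classes with the same names so that it runs without jtoken installed. In real code the classes would come from the package, and where exactly they are exported is not shown here.

```python
class JPackError(Exception):
    """Local stand-in for the base class shown in the tree."""


class JPackDecodeError(JPackError):
    """Local stand-in for one of the subclasses."""


def sketch_handle(action):
    """Run action(); report any error from the hierarchy by class name."""
    try:
        return action()
    except JPackError as exc:
        return f"jtoken failed: {type(exc).__name__}"


def bad_decode():
    raise JPackDecodeError("malformed line")


print(sketch_handle(bad_decode))  # jtoken failed: JPackDecodeError
```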

License

MIT

Project details


Release history

Version	Release date
0.3.4	May 14, 2026
0.3.3	May 12, 2026
0.3.1	May 12, 2026
0.3.0	May 11, 2026
0.2.4	May 11, 2026
0.2.3	May 11, 2026
0.2.2	May 11, 2026
0.2.1	May 11, 2026
0.2.0 (this version)	May 11, 2026
0.1.0	May 11, 2026

Download files


Source Distribution

jtoken-0.2.0.tar.gz (19.7 kB)

Uploaded May 11, 2026 Source

Built Distribution


jtoken-0.2.0-py3-none-any.whl (15.7 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file jtoken-0.2.0.tar.gz.

File metadata

Download URL: jtoken-0.2.0.tar.gz
Upload date: May 11, 2026
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for jtoken-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b88cf76363d69eb4a1fda55f34a6b0f4f46be1def188d02c2c35245b5a4b1e34`
MD5	`c51729a369cd56b5104cc6077cb024cf`
BLAKE2b-256	`882199ffcbfed11139ced532ace548a9cd9616e67bf6663d15c0e118406f70e8`


File details

Details for the file jtoken-0.2.0-py3-none-any.whl.

File metadata

Download URL: jtoken-0.2.0-py3-none-any.whl
Upload date: May 11, 2026
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for jtoken-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4edc49fc8de79b1ecf64457d36339d8c8f344271bcd813fbe45fb205db7be7bc`
MD5	`3c5d185b1e3d3a8428131f4f0cafed53`
BLAKE2b-256	`91d59d58d13bf6bfe0b6e083686336c2b1ed8ed685c0b63fd822f265ac7cab92`

