Compress JSON-shaped documents for LLM prompts with normalization, CLI, and token measurement

jtoken

Author: Hermann Samimi

jtoken compresses JSON-shaped documents for LLM prompts. It produces fewer tokens than JSON, keeps the output readable and line-oriented, and round-trips losslessly for nested dicts of supported scalar types. It also includes normalization for Elasticsearch hits and MongoDB JSON, a CLI, and token measurement helpers.

Python 3.8+.

Installation

Core (no extra runtime dependencies)

pip install jtoken

With accurate OpenAI-style token counting

pip install "jtoken[tiktoken]"

The core package uses only the Python standard library. Install the tiktoken extra when you want tokenizer-accurate counts for OpenAI-compatible models.

Quick start

import jtoken

data = {
    "user": "alice",
    "age": 30,
    "premium": True,
    "verified": True,
    "is_remote": False,
    "trial": False,
    "score": 9.5,
    "referral": None,
    "last_login": None,
}

text = jtoken.encode(data)
restored = jtoken.decode(text)
assert restored == data

Aliases: jtoken.dumps = encode, jtoken.loads = decode.

End-to-end document workflow

import jtoken

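# Read the raw Elasticsearch hit; the context manager closes the file.
with open("hit.json", encoding="utf-8") as f:
    raw = f.read()

# encode_document returns the jtoken text plus the normalization context.
text, context = jtoken.encode_document(raw, source="elastic_hit")
restored = jtoken.decode_document(text, target="elastic_hit", context=context)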

Keep the returned normalization context alongside the encoded text, as a sidecar, whenever you need a lossless decode back into Mongo shell syntax, MongoDB Extended JSON, or an Elasticsearch hit envelope.

Format overview

JSON

{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}

jtoken

name: Alice
age: 30
trues: active
falses: verified
nulls: ref

Encoding rules

Nested dicts flatten with dot notation.
True, False, and None collapse into trues:, falses:, and nulls: summary lines.
Ambiguous strings keep quotes on encode.
Multiline strings are JSON-quoted on one line.
Keys containing . are escaped during normalization and restored from context.
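
The rules above can be illustrated with a small standalone sketch. It is not jtoken's implementation and does not import the library: `flatten`, `looks_ambiguous`, and `sketch_encode` are hypothetical helpers, the test for "ambiguous" strings is a guess, and the `, ` separator used when a summary line holds several names is an assumption.

```python
import json


def flatten(data, prefix=""):
    """Flatten nested dicts into {"a.b.c": scalar} using dot notation."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def looks_ambiguous(text):
    """Guess whether a bare string could be misread as a non-string value."""
    if text == "" or text != text.strip() or text.startswith('"'):
        return True
    try:
        float(text)  # "30" or "0150" would read back as a number
        return True
    except ValueError:
        return text in ("true", "false", "null", "True", "False", "None")


def sketch_encode(data):
    """Render a nested scalar dict in a jtoken-like line format (illustrative)."""
    lines, trues, falses, nulls = [], [], [], []
    for path, value in flatten(data).items():
        # Identity checks keep the ints 1 and 0 out of the boolean buckets.
        if value is True:
            trues.append(path)
        elif value is False:
            falses.append(path)
        elif value is None:
            nulls.append(path)
        elif isinstance(value, str) and ("\n" in value or looks_ambiguous(value)):
            # Multiline or ambiguous strings are JSON-quoted on one line.
            lines.append(f"{path}: {json.dumps(value)}")
        else:
            lines.append(f"{path}: {value}")
    for label, names in (("trues", trues), ("falses", falses), ("nulls", nulls)):
        if names:
            lines.append(f"{label}: {', '.join(names)}")  # separator is assumed
    return "\n".join(lines)


example = {
    "name": "Alice",
    "age": 30,
    "active": True,
    "verified": False,
    "ref": None,
    "address": {"city": "Oslo", "zip": "0150"},
}
print(sketch_encode(example))
```

In this sketch the string "0150" keeps its quotes because it would otherwise read back as a number, while "Oslo" is written bare.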

Supported scalar types

str, int, float, bool, None, and nested dict.

Limitations

Keys cannot contain ": " in the core codec.
Reserved top-level keys: nulls, trues, falses.
Lists are normalized into nested dicts with numeric keys before encoding.
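
As a rough illustration of the list rule, the sketch below rewrites lists as dicts keyed by position and records which dotted paths were lists so they can be restored later. It is a standalone approximation, not jtoken's code: `lists_to_dicts` and `dicts_to_lists` are hypothetical names, and the use of string keys ("0", "1", ...) is an assumption.

```python
def lists_to_dicts(value, path="", list_paths=None):
    """Replace every list with a dict keyed by index; collect the list paths."""
    if list_paths is None:
        list_paths = set()
    if isinstance(value, list):
        list_paths.add(path)
        value = {str(i): item for i, item in enumerate(value)}
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            out[key], _ = lists_to_dicts(item, child, list_paths)
        return out, list_paths
    return value, list_paths


def dicts_to_lists(value, list_paths, path=""):
    """Invert lists_to_dicts using the recorded list paths."""
    if isinstance(value, dict):
        restored = {}
        for key, item in value.items():
            child = f"{path}.{key}" if path else key
            restored[key] = dicts_to_lists(item, list_paths, child)
        if path in list_paths:
            # Rebuild the list in index order.
            return [restored[str(i)] for i in range(len(restored))]
        return restored
    return value


doc = {"tags": ["a", "b"], "items": [{"id": 1}, {"id": 2}]}
normalized, paths = lists_to_dicts(doc)
print(normalized)
print(sorted(paths))
assert dicts_to_lists(normalized, paths) == doc
```

Recording the paths separately is what makes the round trip unambiguous: without them, a dict whose keys happen to be "0" and "1" could not be told apart from a list.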

Public API reference

Package metadata

name	type	description
`jtoken.__version__`	`str`	package version
`jtoken.__author__`	`str`	author name (`Hermann Samimi`)

Core codec

function	signature	description
`encode`	`encode(data: dict) -> str`	compress a nested scalar dict into jtoken text
`decode`	`decode(text: str) -> dict`	reconstruct the nested dict
`dumps`	alias of `encode`	json-style alias
`loads`	alias of `decode`	json-style alias

Normalization and denormalization

function	signature	description
`parse_input`	`parse_input(text, *, source="auto")`	parse foreign text into Python data
`normalize`	`normalize(data, *, source="auto", context=None)`	return `(normalized_dict, NormalizationContext)`
`denormalize`	`denormalize(data, *, target="python", context)`	restore lists, typed values, and dialect shape
`render_output`	`render_output(value, *, target="python") -> str`	render denormalized data as text
`encode_document`	`encode_document(raw, *, source="auto", context=None)`	return `(jtoken_text, NormalizationContext)`
`decode_document`	`decode_document(text, *, target="python", context)`	decode jtoken text and denormalize

Token measurement

function	signature	description
`count_tokens`	`count_tokens(data, *, model="cl100k_base", backend="auto") -> int`	count tokens for a dict or encoded jtoken string
`count_text_tokens`	`count_text_tokens(text, *, model="cl100k_base", backend="auto") -> int`	count tokens for raw text
`token_savings`	`token_savings(data, *, model="cl100k_base", backend="auto", json_indent=2)`	compare jtoken vs pretty JSON token usage

`TokenSavings` properties

property	type	description
`jtoken_tokens`	`int`	token count for the jtoken representation
`json_tokens`	`int`	token count for the JSON baseline
`saved`	`int`	`json_tokens - jtoken_tokens`
`percent`	`float`	percent saved relative to JSON

str(stats) returns a one-line summary.
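
The arithmetic behind these properties can be sketched with a stand-in class. `SavingsSketch` is hypothetical and is not the library's `TokenSavings`; in particular, reporting `percent` on a 0 to 100 scale, returning 0.0 for an empty baseline, and the exact summary wording are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavingsSketch:
    """Stand-in mirroring the documented TokenSavings properties."""

    jtoken_tokens: int
    json_tokens: int

    @property
    def saved(self) -> int:
        # Documented as json_tokens - jtoken_tokens.
        return self.json_tokens - self.jtoken_tokens

    @property
    def percent(self) -> float:
        # Percent saved relative to the JSON baseline (0-100 scale assumed).
        if self.json_tokens == 0:
            return 0.0
        return 100.0 * self.saved / self.json_tokens

    def __str__(self) -> str:
        return (
            f"jtoken={self.jtoken_tokens} json={self.json_tokens} "
            f"saved={self.saved} ({self.percent:.1f}%)"
        )


stats = SavingsSketch(jtoken_tokens=60, json_tokens=80)
print(stats)
```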

`NormalizationContext` fields

field	type	description
`source_format`	`str`	input dialect used during normalization
`target_format`	`str | None`	optional output hint
`typed_values`	`dict[str, str]`	dotted paths with BSON-like type markers
`lists`	`set[str]`	dotted paths that were lists before flattening
`dotted_keys`	`dict[str, str]`	escaped keys that originally contained `.`
`elastic`	`dict | None`	Elasticsearch envelope metadata

Methods: to_dict(), from_dict(data).
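
One reason a `to_dict()` / `from_dict()` pair is useful is that `lists` is a set, which `json` cannot serialize directly. The sketch below shows how such a context could be written to a JSON sidecar and read back. `ContextSketch` is a hypothetical stand-in built from the field table above, not the library's class, and the `"ObjectId"` marker is an illustrative value.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ContextSketch:
    """Stand-in with the documented NormalizationContext fields."""

    source_format: str
    target_format: Optional[str] = None
    typed_values: Dict[str, str] = field(default_factory=dict)
    lists: Set[str] = field(default_factory=set)
    dotted_keys: Dict[str, str] = field(default_factory=dict)
    elastic: Optional[dict] = None

    def to_dict(self) -> dict:
        # Sets are not JSON-serializable, so store the list paths as a sorted list.
        return {
            "source_format": self.source_format,
            "target_format": self.target_format,
            "typed_values": dict(self.typed_values),
            "lists": sorted(self.lists),
            "dotted_keys": dict(self.dotted_keys),
            "elastic": self.elastic,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "ContextSketch":
        return cls(
            source_format=data["source_format"],
            target_format=data.get("target_format"),
            typed_values=dict(data.get("typed_values", {})),
            lists=set(data.get("lists", [])),
            dotted_keys=dict(data.get("dotted_keys", {})),
            elastic=data.get("elastic"),
        )


ctx = ContextSketch(
    source_format="mongo_extended",
    typed_values={"_id": "ObjectId"},  # illustrative type marker
    lists={"tags"},
)
sidecar = json.dumps(ctx.to_dict())  # text that could be saved next to the output
restored = ContextSketch.from_dict(json.loads(sidecar))
assert restored == ctx
```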

Format enums

InputFormat: auto, json, python, mongo_extended, mongo_shell, elastic_hit, elastic_source

OutputFormat: python, json, mongo_extended, mongo_shell, elastic_hit, elastic_source

Exceptions

exception	base	when raised
`JPackError`	`Exception`	base library error
`JPackEncodeError`	`JPackError`	encoding fails
`JPackDecodeError`	`JPackError`	decoding fails
`NormalizationError`	`JPackError`	normalization fails
`DenormalizationError`	`JPackError`	denormalization fails
`TokenCountError`	`JPackError`	token counting fails

Token counting

stats = jtoken.token_savings(data, model="gpt-4o", backend="tiktoken", json_indent=2)
print(stats.jtoken_tokens, stats.json_tokens, stats.saved, stats.percent)

`backend`	behavior
`auto`	use `tiktoken` when installed, otherwise estimate
`tiktoken`	require `tiktoken`
`estimate`	simple character heuristic

json_indent=2 compares against prompt-style pretty JSON. Use json_indent=None for compact JSON.
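
The backend table can be read as a simple fallback chain. The sketch below is one way to express it without importing jtoken: `estimate_tokens` and `count_text_tokens_sketch` are hypothetical names, the four-characters-per-token ratio is an assumed heuristic rather than the library's actual estimate, and the sketch takes a tiktoken encoding name instead of a model name. It also shows why the `json_indent` choice matters for the baseline.

```python
import json
import math


def estimate_tokens(text, chars_per_token=4):
    """Character heuristic; the 4-chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token) if text else 0


def count_text_tokens_sketch(text, encoding_name="cl100k_base", backend="auto"):
    """Mimic the documented backend behavior: auto, tiktoken, or estimate."""
    if backend not in ("auto", "tiktoken", "estimate"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend != "estimate":
        try:
            import tiktoken  # optional dependency
        except ImportError:
            if backend == "tiktoken":
                raise  # "tiktoken" requires the package to be installed
            # backend == "auto": fall through to the estimate below
        else:
            return len(tiktoken.get_encoding(encoding_name).encode(text))
    return estimate_tokens(text)


data = {"name": "Alice", "age": 30, "active": True}
pretty = json.dumps(data, indent=2)   # baseline comparable to json_indent=2
compact = json.dumps(data)            # baseline comparable to json_indent=None
pretty_count = count_text_tokens_sketch(pretty, backend="estimate")
compact_count = count_text_tokens_sketch(compact, backend="estimate")
print(pretty_count, compact_count)
```

Because pretty-printed JSON carries extra newlines and indentation, savings measured against it will look larger than savings measured against compact JSON, so it is worth stating which baseline a reported number uses.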

CLI

jtoken encode --input-format mongo_shell -f doc.json --context-out doc.ctx.json
jtoken decode --output-format mongo_shell -f doc.jtoken --context-in doc.ctx.json
jtoken stats --input-format json -f doc.json --model gpt-4o --backend tiktoken
jtoken count --input-format json -f doc.json --backend estimate
python -m jtoken encode

Common flags:

-f/--file
--input-format
--output-format
--context-out
--context-in
--model
--backend

License

Project details

Release history

version	release date
0.3.4	May 14, 2026
0.3.3	May 12, 2026
0.3.1	May 12, 2026
0.3.0	May 11, 2026
0.2.4	May 11, 2026
0.2.3	May 11, 2026
0.2.2	May 11, 2026
0.2.1 (this version)	May 11, 2026
0.2.0	May 11, 2026
0.1.0	May 11, 2026

Download files

Source Distribution

jtoken-0.2.1.tar.gz (19.7 kB view details)

Uploaded May 11, 2026 Source

Built Distribution

jtoken-0.2.1-py3-none-any.whl (16.4 kB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file jtoken-0.2.1.tar.gz.

File metadata

Download URL: jtoken-0.2.1.tar.gz
Upload date: May 11, 2026
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for jtoken-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`ad9025ebcfe9d563a17473256ab70f929b9f117fd5233614224ddbde058c327d`
MD5	`ef5b9da872cc416e1ae9ebad3c78ce35`
BLAKE2b-256	`f990a9e5d006317794420a1faed5a46fd8be5393598c775062d482548f84a0cd`

File details

Details for the file jtoken-0.2.1-py3-none-any.whl.

File metadata

Download URL: jtoken-0.2.1-py3-none-any.whl
Upload date: May 11, 2026
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for jtoken-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`381897b67f12ef6e160a6242c92203ad534bad40f02cde5d7acb585ee0b4b650`
MD5	`9ca0e4a41119d94a3cb66efa521d90aa`
BLAKE2b-256	`34f0ba96a4c5692e9c467eaa7d1f19014e3fec99b167e1281cb7b40abe01674f`
