Compress JSON-shaped documents for LLM prompts with normalization, CLI, and token measurement

jtoken

jtoken compresses JSON-shaped documents for LLM prompts: same information, fewer tokens, lossless round-trip for supported scalar dicts.

It is a small Python library and CLI for turning verbose JSON into a compact line-oriented representation, measuring token savings, and working with real-world document dialects such as Elasticsearch hits and MongoDB JSON.

Why use jtoken

  • Lower prompt cost: strip JSON punctuation and collapse repeated true, false, and null fields into summary lines.
  • Readable for models: the output stays human-readable key-value text instead of dense JSON.
  • Lossless for supported data: nested dicts round-trip through encode() and decode().
  • Production-shaped inputs: normalize Elasticsearch hits, MongoDB Extended JSON, and Mongo shell literals before encoding.
  • No required runtime dependencies: the core package is stdlib-only; tiktoken is optional for accurate token counts.

Installation

pip install jtoken                # core package, stdlib only
pip install "jtoken[tiktoken]"    # optional: adds tiktoken for accurate token counts

Python 3.8+ is supported.

Quick start

import jtoken

data = {
    "user": "alice",
    "age": 30,
    "premium": True,
    "verified": True,
    "is_remote": False,
    "trial": False,
    "score": 9.5,
    "referral": None,
    "last_login": None,
}

text = jtoken.encode(data)
restored = jtoken.decode(text)
assert restored == data

dumps and loads are aliases for encode and decode.

Format overview

JSON example

{"name": "Alice", "age": 30, "active": true, "verified": false, "ref": null}

jtoken example

name: Alice
age: 30
trues: active
falses: verified
nulls: ref

Encoding rules

  • Nested dicts are flattened with dot notation.
  • Boolean true values are collected into a trues: line.
  • Boolean false values are collected into a falses: line.
  • null values are collected into a nulls: line.
  • Ambiguous strings such as "90210" or "true" keep quotes so types survive decode.
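The rules above can be sketched in plain Python. This is an illustration only, not jtoken's implementation: sketch_encode and _looks_ambiguous are hypothetical names, and joining several keys on a summary line with ", " is an assumption, since the example above shows only one key per line.

```python
import json


def _looks_ambiguous(s):
    """Return True when a bare string would read back as a non-string."""
    if s in ("true", "false", "null", ""):
        return True
    try:
        float(s)  # "90210", "9.5", "1e3" would all decode as numbers
        return True
    except ValueError:
        return False


def sketch_encode(data):
    """Flatten a nested dict into 'key: value' lines plus summary lines."""
    lines, trues, falses, nulls = [], [], [], []

    def walk(obj, prefix):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, path)          # nested dicts use dot notation
            elif value is True:
                trues.append(path)
            elif value is False:
                falses.append(path)
            elif value is None:
                nulls.append(path)
            elif isinstance(value, str):
                # Quote strings that would otherwise lose their type.
                text = json.dumps(value) if _looks_ambiguous(value) else value
                lines.append(f"{path}: {text}")
            else:
                lines.append(f"{path}: {value}")

    walk(data, "")
    # Assumption: multiple keys on a summary line are separated by ", ".
    for label, keys in (("trues", trues), ("falses", falses), ("nulls", nulls)):
        if keys:
            lines.append(f"{label}: {', '.join(keys)}")
    return "\n".join(lines)


doc = {"name": "Alice", "age": 30, "active": True, "verified": False, "ref": None}
print(sketch_encode(doc))
```

For real data, use jtoken.encode(); the sketch ignores edge cases such as multi-line strings.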

Supported scalar types

str, int, float, bool, and None, plus nested dicts of those values.
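One way to picture how these types could be read back from the line format is the sketch below. It is not jtoken's decoder: sketch_decode, parse_scalar, and set_path are hypothetical names, and the ", " separator on summary lines is an assumption.

```python
import json


def parse_scalar(text):
    """Turn the right-hand side of a line back into str, int, or float."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return json.loads(text)       # quoted values are always strings
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text                   # anything else stays a plain string


def set_path(root, dotted, value):
    """Rebuild nested dicts from a dot-notation key."""
    parts = dotted.split(".")
    node = root
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def sketch_decode(text):
    """Parse 'key: value' lines and trues/falses/nulls summary lines."""
    summary = {"trues": True, "falses": False, "nulls": None}
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, raw = line.partition(": ")
        if key in summary:
            # Assumption: summary keys are separated by ", ".
            for path in raw.split(", "):
                set_path(result, path, summary[key])
        else:
            set_path(result, key, parse_scalar(raw))
    return result


print(sketch_decode("name: Alice\nage: 30\ntrues: active\nnulls: ref"))
```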

Current limitations

  • Keys cannot contain . or the separator ": ".
  • Reserved top-level keys: nulls, trues, falses.
  • Lists are not encoded directly by the core codec; they are normalized into nested dicts with numeric keys before encoding.
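The limitations above can be illustrated with a small sketch. The helper names lists_to_dicts and check_keys are hypothetical, and using string indexes ("0", "1", ...) as the numeric keys is an assumption; jtoken's own normalizer may differ and also records list positions in its context.

```python
RESERVED_TOP_LEVEL = {"nulls", "trues", "falses"}


def lists_to_dicts(value):
    """Replace every list with a dict keyed by element position."""
    if isinstance(value, list):
        return {str(i): lists_to_dicts(item) for i, item in enumerate(value)}
    if isinstance(value, dict):
        return {key: lists_to_dicts(item) for key, item in value.items()}
    return value


def check_keys(data, top_level=True):
    """Raise ValueError for keys the line format cannot represent."""
    for key, value in data.items():
        if "." in key or ": " in key:
            raise ValueError(f"key {key!r} contains '.' or ': '")
        if top_level and key in RESERVED_TOP_LEVEL:
            raise ValueError(f"key {key!r} is reserved at the top level")
        if isinstance(value, dict):
            check_keys(value, top_level=False)


doc = lists_to_dicts({"user": "alice", "tags": ["a", "b"]})
check_keys(doc)   # passes: no forbidden or reserved keys
print(doc)
```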

Normalization and denormalization

Use normalization when the source document is not already a plain JSON object of scalar values.

Supported input dialects:

source            Use when
auto              Let jtoken detect the input family
json              Standard JSON
python            JSON-compatible Python values
mongo_extended    Extended JSON wrappers such as $oid and $date
mongo_shell       Shell literals such as ObjectId(...) and ISODate(...)
elastic_hit       Full Elasticsearch hit with _source and fields
elastic_source    A document shaped like _source only

Supported output dialects:

target            Result
python            Python data structures
json              Standard JSON text
mongo_extended    Extended JSON wrappers restored from context
mongo_shell       Shell-style literals restored from context
elastic_hit       Elasticsearch hit envelope restored from context
elastic_source    _source document only

Sidecar context

Mongo shell types, Elasticsearch envelopes, and list positions are stored in a separate normalization context. Keep that sidecar with the encoded text when you need a lossless decode back into the original dialect.

import jtoken

raw_hit = {...}
normalized, context = jtoken.normalize(raw_hit, source="elastic_hit")
text = jtoken.encode(normalized)
restored = jtoken.denormalize(
    jtoken.decode(text),
    target="elastic_hit",
    context=context,
)

Convenience helpers:

text, context = jtoken.encode_document(raw_hit, source="elastic_hit")
restored = jtoken.decode_document(text, target="elastic_hit", context=context)
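Because the sidecar has to travel with the encoded text, it can help to store the two side by side. The sketch below assumes the context object is JSON-serializable, which the CLI's --context-out hit.ctx.json option suggests but this page does not confirm. save_pair and load_pair are hypothetical helpers, and the text and context values are placeholders rather than real jtoken output.

```python
import json
from pathlib import Path


def save_pair(text, context, stem):
    """Write <stem>.jtoken and <stem>.ctx.json next to each other."""
    Path(f"{stem}.jtoken").write_text(text, encoding="utf-8")
    Path(f"{stem}.ctx.json").write_text(json.dumps(context), encoding="utf-8")


def load_pair(stem):
    """Read back the encoded text and its sidecar context."""
    text = Path(f"{stem}.jtoken").read_text(encoding="utf-8")
    context = json.loads(Path(f"{stem}.ctx.json").read_text(encoding="utf-8"))
    return text, context


if __name__ == "__main__":
    # Placeholder values; in practice these come from jtoken.encode_document().
    save_pair("user: alice", {"placeholder": True}, "hit")
    print(load_pair("hit"))
```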

CLI

jtoken encode --input-format elastic_hit -f hit.json --context-out hit.ctx.json
jtoken decode --output-format mongo_shell -f hit.jtoken --context-in hit.ctx.json
jtoken stats --input-format json -f document.json
jtoken count --input-format json -f document.json --backend estimate

Common flags:

  • -f/--file: read from a file instead of stdin
  • --input-format: document dialect for encode, stats, and count
  • --output-format: document dialect for decode
  • --context-out / --context-in: normalization sidecar files
  • --model and --backend: token counting options for stats and count

Token measurement

stats = jtoken.token_savings(data)
print(stats)
# jtoken: 22 tokens | json: 36 tokens | saved: 14 (38.9%)

count = jtoken.count_tokens(data, backend="estimate")
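The figures in the sample output line follow from simple arithmetic, reproduced here as a check. The savings function is a hypothetical helper, not part of jtoken.

```python
def savings(json_tokens, jtoken_tokens):
    """Return (tokens saved, percent saved relative to the JSON count)."""
    saved = json_tokens - jtoken_tokens
    percent = 100.0 * saved / json_tokens if json_tokens else 0.0
    return saved, round(percent, 1)


print(savings(36, 22))  # (14, 38.9)
```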

Backends:

backend     behavior
auto        use tiktoken when installed, otherwise estimate
tiktoken    require tiktoken
estimate    use a simple character heuristic
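The fallback described for the auto backend can be sketched as below. This is an illustration, not jtoken's code: estimate_tokens and count_tokens_auto are hypothetical names, and the four-characters-per-token ratio is a common rule of thumb assumed here, not jtoken's documented heuristic.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough character-based estimate (assumed ratio, not jtoken's)."""
    return math.ceil(len(text) / chars_per_token)


def count_tokens_auto(text, encoding_name="cl100k_base"):
    """Use tiktoken when it is installed, otherwise fall back to the estimate."""
    try:
        import tiktoken
    except ImportError:
        return estimate_tokens(text)
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


print(estimate_tokens("name: Alice\nage: 30"))
```

A heuristic like this is useful for quick comparisons; install the tiktoken extra when the exact count matters.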

Install accurate counting with:

pip install "jtoken[tiktoken]"

API surface

Core codec:

  • encode(data: dict) -> str
  • decode(text: str) -> dict

Normalization:

  • parse_input(text, *, source="auto")
  • normalize(data, *, source="auto", context=None)
  • denormalize(data, *, target="python", context)
  • render_output(value, *, target="python")
  • encode_document(raw, *, source="auto", context=None)
  • decode_document(text, *, target="python", context)

Token helpers:

  • count_tokens(data, *, model="cl100k_base", backend="auto")
  • token_savings(data, *, model="cl100k_base", backend="auto")

Exceptions

JPackError
├── JPackEncodeError
├── JPackDecodeError
├── NormalizationError
├── DenormalizationError
└── TokenCountError
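A single-root hierarchy like this means one except clause on the base class catches every library error. The sketch below uses local stand-in classes that mirror the tree so it runs without jtoken installed; whether jtoken exports these names at the top level is not stated here, and risky_decode is a hypothetical function.

```python
# Stand-ins mirroring the tree above; not imported from jtoken.
class JPackError(Exception): ...
class JPackEncodeError(JPackError): ...
class JPackDecodeError(JPackError): ...
class NormalizationError(JPackError): ...
class DenormalizationError(JPackError): ...
class TokenCountError(JPackError): ...


def risky_decode(line):
    """Hypothetical decoder step that rejects lines without a separator."""
    if ": " not in line:
        raise JPackDecodeError(f"no key/value separator in {line!r}")
    key, value = line.split(": ", 1)
    return {key: value}


try:
    risky_decode("not a jtoken line")
except JPackError as exc:          # base class catches the subclass
    handled = type(exc).__name__

print(handled)  # JPackDecodeError
```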

License

MIT

Download files

Source distribution: jtoken-0.2.0.tar.gz (19.7 kB)

Built distribution: jtoken-0.2.0-py3-none-any.whl (15.7 kB)
