Bidirectional UUID<->numeric ID mapping to shrink LLM prompt token counts for large item lists.

llm-id-compressor

Shrinks LLM prompt/response token counts when you're sending a list of UUID-keyed items (transcript lines, database rows, log entries) to a model. Swaps each UUID for a short numeric ID before the request, and restores the original UUIDs from the model's response afterward.

Install

pip install llm-id-compressor

Usage

from llm_id_compressor import create_id_mapping, replace_uuids_with_nums, restore_uuids_in_response

items = [{"id": "550e8400-e29b-41d4-a716-446655440000", "text": "..."}, ...]

mapping = create_id_mapping(items)
compact_items = replace_uuids_with_nums(items, mapping["uuid_to_num"])

# Send compact_items to the LLM and get back a list of {"id": "<num>", ...} lines.
# call_llm is a placeholder for your own client code; it is not part of this package.
response_lines = call_llm(compact_items)

restored = restore_uuids_in_response(response_lines, mapping["num_to_uuid"])

replace_uuids_with_nums reads only the id and text keys of each item: any other fields are dropped from the output, and an item missing text raises a KeyError. restore_uuids_in_response has no such restriction; it preserves every field on each response line and rewrites only id.
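
The contract described above can be illustrated with a small stand-in. This is a sketch of the documented behavior, not the package's actual source: the names sketch_create_id_mapping and sketch_replace_uuids_with_nums are hypothetical, and numbering from 1 with IDs emitted as strings is an assumption.

```python
def sketch_create_id_mapping(items):
    """Assign each distinct UUID a short numeric ID, in first-seen order.

    Assumption: IDs start at 1 and are stored as strings.
    """
    uuid_to_num = {}
    for item in items:
        # setdefault leaves an already-seen UUID's number untouched.
        uuid_to_num.setdefault(item["id"], str(len(uuid_to_num) + 1))
    num_to_uuid = {num: uuid for uuid, num in uuid_to_num.items()}
    return {"uuid_to_num": uuid_to_num, "num_to_uuid": num_to_uuid}


def sketch_replace_uuids_with_nums(items, uuid_to_num):
    # Only "id" and "text" are read, so any other key is dropped.
    # item["text"] raises KeyError when the key is missing.
    return [{"id": uuid_to_num[item["id"]], "text": item["text"]} for item in items]


items = [
    {"id": "550e8400-e29b-41d4-a716-446655440000", "text": "hello", "speaker": "A"},
    {"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "text": "world"},
]
mapping = sketch_create_id_mapping(items)
compact = sketch_replace_uuids_with_nums(items, mapping["uuid_to_num"])
# The "speaker" field on the first item does not survive into compact.
```

If the extra fields matter downstream, one option is to keep the original items around and rejoin them by UUID after the response is restored.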

restore_uuids_in_response silently drops any line whose numeric id is not in the original mapping (a hallucinated id) rather than raising. Otherwise, callers that insert results by a global primary key would have one bad line abort the whole batch.
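
Because the drop is silent, a caller may want to notice when it happened. The sketch below mimics the documented restore behavior with a hypothetical stand-in, sketch_restore_uuids (not the package's implementation), and then compares lengths to count dropped lines. Coercing the id to a string before lookup is an assumption.

```python
def sketch_restore_uuids(response_lines, num_to_uuid):
    restored = []
    for line in response_lines:
        # Assumption: ids are looked up as strings; a missing id becomes "None"
        # and is therefore treated like any other unknown id.
        original = num_to_uuid.get(str(line.get("id")))
        if original is None:
            continue  # unknown (hallucinated) id: skip the line instead of raising
        # Keep every field on the line, rewriting only "id".
        restored.append({**line, "id": original})
    return restored


num_to_uuid = {"1": "550e8400-e29b-41d4-a716-446655440000"}
response_lines = [
    {"id": "1", "label": "greeting"},
    {"id": "99", "label": "made up"},
]
restored = sketch_restore_uuids(response_lines, num_to_uuid)

# Comparing lengths is one way to detect that lines were dropped.
dropped = len(response_lines) - len(restored)
```

The same length comparison works with the real restore_uuids_in_response, since it takes and returns lists of lines.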

Why this exists

Sending a list of UUID-keyed items to an LLM burns tokens twice, once in the prompt and once in the response, on 36-character identifiers the model never actually reasons about. No other package on PyPI appears to do this specific swap. It's a five-minute fix once you know to look for it, but easy to miss.
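
A rough sense of scale, using character counts as a stand-in for tokens (real token counts depend on the model's tokenizer, which this sketch does not measure). The list size of 1000 and numbering from 1 are assumptions chosen for illustration.

```python
import uuid

n_items = 1000  # hypothetical list size

# Every canonical UUID string is 36 characters (32 hex digits plus 4 hyphens).
uuid_chars = sum(len(str(uuid.uuid4())) for _ in range(n_items))

# Numeric IDs 1..n_items written in decimal.
num_chars = sum(len(str(i)) for i in range(1, n_items + 1))

print(uuid_chars, num_chars)  # 36000 2893
```

The saving applies once per direction, since the identifiers appear in both the prompt and the response.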

Contributing

Issues and PRs welcome - see CONTRIBUTING.md.

License

MIT

Download files

Source Distribution

llm_id_compressor-0.1.0.tar.gz (4.5 kB view details)

Uploaded Source

Built Distribution

llm_id_compressor-0.1.0-py3-none-any.whl (4.6 kB view details)

Uploaded Python 3

File details

Details for the file llm_id_compressor-0.1.0.tar.gz.

File metadata

  • Download URL: llm_id_compressor-0.1.0.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for llm_id_compressor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 81b5ecf83b1b9598cc1ff10c44db7f5d969b8373fa5b43b66f150b1db07ed154
MD5 98061d141657a89a96d703753b697a6b
BLAKE2b-256 93650486a74b4263240cca33edcc89deca4f31b0636e81ec189f31121a9dc15b

File details

Details for the file llm_id_compressor-0.1.0-py3-none-any.whl.

File hashes

Hashes for llm_id_compressor-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f82862f2d6a5d6e979ef4e18a101e65b511128040f852074687d7b36272b38c
MD5 fb2b33b055ffba083752e43ec56a7c0a
BLAKE2b-256 7445e06e67c8ae3de238a8c202fcdaa544345b334de2950340d462cccab1a34e
