ULMEN: The number one serialization format across size, tokens, speed, and memory
ULMEN: Ultra Lightweight Minimal Encoding Notation
The serialization engine built for LLM agentic workflows.
Copyright (c) El Mehdi Makroumi. All rights reserved. Proprietary and confidential.
The AI engineering community is currently obsessed with expanding LLM context windows to 1M+ tokens. Meanwhile, teams are burning massive amounts of compute and cloud egress costs because they are orchestrating multi-agent systems using JSON.
We are feeding state-of-the-art intelligence through a 20-year-old, heavily bloated web format.
ULMEN is a drop-in Python/Rust serialization engine that treats the LLM context window and network IPC as strict hardware constraints. By natively incorporating exact token-counting, string pooling, and semantic validation at the C/Rust boundary, ULMEN delivers Protobuf-level density without requiring pre-compiled schemas.
Table of Contents
- Benchmarks
- At a Glance
- Surfaces
- Installation
- Quick Start
- API Reference
- Wire Format Constants
- Utilities
- Architecture
- Running Tests
- Format Specification
- Versioning
Benchmarks
Benchmarks were run under production-grade constraints (NVIDIA Tesla T4, 16 GB VRAM):
- 44% LLM Token Reduction: Eliminates syntax bloat, saving approximately $59,000 per 10 million agent loops (vs. GPT-4o input costs).
- 3x Faster Reads: Deserializes heavily nested payloads faster than both orjson (itself a Rust-based library) and the standard json module.
- 4.1x Smaller IPC Footprint: The pooled binary format drastically reduces microservice network egress and Redis cache saturation.
- The Semantic Firewall: Unlike generic formats that silently pass broken traces, the ULMEN-AGENT protocol automatically rejects orphaned tools, backwards steps, and invalid enums before they trigger LLM hallucinations.
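To make the three rejection rules concrete, here is a minimal sketch of that kind of check in plain Python. It is not ULMEN's validator: the function name `check_trace`, the allowed status values, and the reading of "orphaned" as "a result with no earlier tool call" are all assumptions made for illustration.

```python
# Illustrative only: a hypothetical checker, not ULMEN's validator.
ALLOWED_STATUS = {"pending", "done", "error"}  # assumed enum values


def check_trace(records):
    """Return (ok, message) for a list of agent records."""
    seen_tools = set()
    last_step = {}  # thread_id -> highest step seen so far
    for row, rec in enumerate(records):
        thread = rec.get("thread_id")
        step = rec.get("step", 0)
        # Backwards step: steps must not decrease within a thread.
        if step < last_step.get(thread, 0):
            return False, f"row {row}: step {step} goes backwards"
        last_step[thread] = step
        # Invalid enum: status, when present, must be a known value.
        status = rec.get("status")
        if status is not None and status not in ALLOWED_STATUS:
            return False, f"row {row}: invalid status {status!r}"
        if rec.get("type") == "tool":
            seen_tools.add(rec["id"])
        # Orphaned result: a res row must refer to an earlier tool call.
        elif rec.get("type") == "res" and rec["id"] not in seen_tools:
            return False, f"row {row}: result {rec['id']!r} has no tool call"
    return True, ""
```

Failing fast on the first bad row keeps the error message specific enough to feed back to the model for a retry.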
Surfaces
ULMEN exposes five surfaces over a single data model:
Binary: LUMB prefix
Columnar binary format. Smallest on wire. Designed for storage and IPC. Supports delta encoding, bitpacking, RLE, string pooling, and zlib.
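For readers unfamiliar with the column strategies named above, the sketch below shows delta encoding, run-length encoding, and string pooling in their simplest textbook form. These helper names are hypothetical and the output is plain Python lists, not the LUMB wire format; SPEC.md remains the reference for how ULMEN actually lays out bytes.

```python
def delta_encode(values):
    """Store the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def pool_strings(values):
    """Replace repeated strings with indexes into a shared pool."""
    pool, index, refs = [], {}, []
    for s in values:
        if s not in index:
            index[s] = len(pool)
            pool.append(s)
        refs.append(index[s])
    return pool, refs
```

Sorted or slowly changing integer columns (ids, timestamps) tend to suit delta encoding, while low-cardinality string columns such as a city name suit pooling or RLE.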
Text: records[N]: prefix
Line-oriented, diff-friendly, human-readable. Compatible with standard text tools. Uses the same pool and strategy system as binary.
ULMEN: L| prefix
LLM-native CSV surface. Every payload is self-describing via a typed header line. Language models can read and generate ULMEN without special training or prompt engineering.
Streaming: UlmenStreamEncoder / stream_encode
Zero-materialisation streaming encode surface. Feed records one at a time
or in batches, then flush to an iterator of bytes chunks. The Rust backend
is selected automatically. Wire format is identical to batch binary encode —
every chunk is independently decodable. For truly unbounded streams use
stream_encode_windowed which encodes fixed-size windows into independent
sub-payloads, each decodable standalone.
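The windowed idea can be sketched with the standard library alone. The generator below is an assumption-laden stand-in: `encode_windowed` and `decode_window` are hypothetical names, and JSON plus zlib substitutes for the ULMEN binary encoder. What it does show is the property described above, that each window becomes a payload that can be decoded without seeing any other window.

```python
import json
import zlib
from itertools import islice


def encode_windowed(records, window_size=1000):
    """Yield one independently decodable bytes payload per window."""
    it = iter(records)
    while True:
        # Pull at most window_size records; never materialise the whole stream.
        window = list(islice(it, window_size))
        if not window:
            return
        # Stand-in encoder: JSON + zlib, NOT the ULMEN binary format.
        yield zlib.compress(json.dumps(window).encode("utf-8"))


def decode_window(payload):
    """Decode a single window without any state from earlier windows."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

Because memory use is bounded by `window_size`, the input can be any iterator, including one that never ends.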
ULMEN-AGENT: ULMEN-AGENT v1 prefix
Structured protocol for agentic AI communication. Typed record schemas for messages, tool calls, results, plans, observations, errors, memory, RAG chunks, hypotheses, and chain-of-thought steps.
Extended capabilities:
- Extended header fields: payload_id, parent_payload_id, agent_id, session_id, schema_version, context_window, context_used, meta_fields
- Meta fields appended to every row: parent_id, from_agent, to_agent, priority
- Context compression: completed_sequences, keep_types, sliding_window
- Priority-based retention: MUST_KEEP, KEEP_IF_ROOM, COMPRESSIBLE
- Unlimited context via chunk_payload, merge_chunks, build_summary_chain
- LLM output auto-repair via parse_llm_output
- Exact BPE token counting via count_tokens_exact (cl100k_base)
- Multi-agent routing via AgentRouter
- Cross-payload thread tracking via ThreadRegistry
- Append-only audit trail via ReplayLog
- Programmatic system prompt generation via generate_system_prompt
- ULMEN bridge: convert_agent_to_ulmen, convert_ulmen_to_agent
- Structured validation errors via ValidationError
- Context budget enforcement via ContextBudgetExceededError
- Streaming decode via decode_agent_stream
- Subgraph extraction by thread, step range, type
- Memory deduplication via dedup_mem, get_latest_mem
- MessagePack compatibility via encode_msgpack, decode_msgpack
Installation
From source (with Rust acceleration)
git clone https://github.com/makroumi/ulmen
cd ulmen
pip install maturin
maturin develop --release
Python only (no Rust required)
pip install -e .
The library detects automatically whether the Rust extension is available and falls back to the pure Python implementation silently.
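A common way to implement this kind of silent fallback is an import guarded by `ImportError`. The sketch below is a generic pattern, not ULMEN's source: `load_backend` and the module name `_ulmen_rust_hypothetical` are made up for illustration.

```python
import importlib


def load_backend(extension_name="_ulmen_rust_hypothetical"):
    """Return (module_or_None, rust_available)."""
    try:
        # Succeeds only when the compiled extension is installed.
        return importlib.import_module(extension_name), True
    except ImportError:
        # No extension: callers use the pure Python code path instead.
        return None, False


backend, rust_available = load_backend()
```

In the real package, the exported `RUST_AVAILABLE` flag reports which path was taken.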
Quick Start
from ulmen import UlmenDict, UlmenDictRust, encode_ulmen_llm, decode_ulmen_llm
records = [
{"id": 1, "name": "Alice", "city": "London", "score": 98.5, "active": True},
{"id": 2, "name": "Bob", "city": "London", "score": 91.0, "active": False},
{"id": 3, "name": "Carol", "city": "Paris", "score": 87.3, "active": True},
]
# Binary (smallest)
ld = UlmenDict(records)
binary = ld.encode_binary_pooled()
zlib_ = ld.encode_binary_zlib()
# Text (human-readable)
text = ld.encode_text()
# ULMEN (LLM-native)
ulmen = encode_ulmen_llm(records)
back = decode_ulmen_llm(ulmen)
# Rust acceleration (drop-in, byte-identical)
ld_rust = UlmenDictRust(records)
binary = ld_rust.encode_binary_pooled()
text = ld_rust.encode_text()
ulmen = ld_rust.encode_ulmen_llm()
ULMEN-AGENT
from ulmen import (
encode_agent_payload,
decode_agent_payload,
decode_agent_payload_full,
validate_agent_payload,
compress_context,
chunk_payload,
merge_chunks,
build_summary_chain,
parse_llm_output,
count_tokens_exact,
AgentRouter,
ThreadRegistry,
ReplayLog,
generate_system_prompt,
convert_agent_to_ulmen,
convert_ulmen_to_agent,
dedup_mem,
get_latest_mem,
estimate_context_usage,
extract_subgraph,
extract_subgraph_payload,
make_validation_error,
AgentHeader,
ValidationError,
ContextBudgetExceededError,
)
records = [
{
"type": "msg", "id": "m1", "thread_id": "t1", "step": 1,
"role": "user", "turn": 1, "content": "Hello", "tokens": 5,
"flagged": False,
},
{
"type": "tool", "id": "tc1", "thread_id": "t1", "step": 2,
"name": "search", "args": '{"q":"ulmen"}', "status": "pending",
},
{
"type": "res", "id": "tc1", "thread_id": "t1", "step": 3,
"name": "search", "data": "ULMEN is fast", "status": "done",
"latency_ms": 42,
},
]
# Encode with extended header fields
payload = encode_agent_payload(
records,
thread_id="t1",
context_window=8000,
payload_id="uuid-abc",
parent_payload_id="uuid-prev",
agent_id="agent-alpha",
session_id="sess-001",
schema_version="1.0.0",
auto_context=True,
auto_payload_id=False,
enforce_budget=False,
)
# Decode (records only)
decoded = decode_agent_payload(payload)
# Decode (records + parsed header)
records_out, header = decode_agent_payload_full(payload)
print(header.payload_id)
print(header.context_used)
# Validate
ok, err = validate_agent_payload(payload)
# Validate with structured error object
ok, err = validate_agent_payload(payload, structured=True)
if not ok:
    print(err.message, err.row, err.field, err.suggestion)
# Stream decode one record at a time
from ulmen import decode_agent_stream
for rec in decode_agent_stream(iter(payload.splitlines(keepends=True))):
    print(rec["type"])
# Context compression
from ulmen.core._agent import COMPRESS_COMPLETED_SEQUENCES
compressed = compress_context(
records,
strategy=COMPRESS_COMPLETED_SEQUENCES,
preserve_cot=True,
)
# Memory deduplication
clean = dedup_mem(records)
latest = get_latest_mem(records, key="user_pref")
# Context usage estimation
usage = estimate_context_usage(records)
print(usage["tokens"], usage["by_type"])
# Chunking for unlimited context
chunks = chunk_payload(records, token_budget=2000, thread_id="t1", overlap=1)
merged = merge_chunks(chunks)
# Summary chain for unlimited context
chain = build_summary_chain(records, token_budget=2000, thread_id="t1")
# LLM output auto-repair (raw_llm_text is the unvalidated string returned by the model)
repaired = parse_llm_output(raw_llm_text)
repaired = parse_llm_output(raw_llm_text, strict=True)
# Exact token counting
n_tokens = count_tokens_exact(payload)
# Subgraph extraction
filtered = extract_subgraph(records, thread_id="t1", step_min=2, types=["tool","res"])
filtered_payload = extract_subgraph_payload(payload, types=["cot"])
# Multi-agent routing
router = AgentRouter()
router.register("planner", "executor", lambda rec: print(rec))
router.dispatch(records)
# Cross-payload thread tracking
registry = ThreadRegistry()
registry.add_payload("pid-1", records)
# Audit trail
log = ReplayLog()
log.append({"event": "encode", "payload_id": "pid-1"})
# System prompt generation
prompt = generate_system_prompt(include_examples=True, include_validation=True)
# ULMEN bridge
ulmen = convert_agent_to_ulmen(payload)
payload2 = convert_ulmen_to_agent(ulmen, thread_id="t1")
# Validation error payload
err_payload = make_validation_error("bad step", thread_id="t1")
# Context budget enforcement
try:
    encode_agent_payload(records, context_window=10, enforce_budget=True)
except ContextBudgetExceededError as e:
    print(e.overage)
API Reference
UlmenDict
Pure Python record container. Zero runtime dependencies.
ld = UlmenDict(records)
ld.encode_text() # str ULMEN text format
ld.encode_binary() # bytes raw binary
ld.encode_binary_pooled() # bytes binary with full strategy selection
ld.encode_binary_zlib(level=6) # bytes binary + zlib, level 0-9
ld.encode_ulmen_llm() # str ULMEN format
ld.decode_text(text) # UlmenDict
ld.decode_binary(data) # UlmenDict
ld.decode_ulmen_llm(text) # UlmenDict
ld.to_json() # str standard JSON (NaN/inf replaced with null)
ld.append(record) # mutate, rebuilds pool, invalidates cache
len(ld) # number of records
ld.pool_size # number of interned strings
ld[0] # direct index access
UlmenDictFull
Extended pool variant. Strategies always enabled.
ldf = UlmenDictFull(records, pool_size_limit=256)
ldf.encode_binary()
ldf.encode_text()
ldf.encode_ulmen_llm()
UlmenDictRust / UlmenDictFullRust
Rust-accelerated drop-in replacements. Byte-identical output.
from ulmen import UlmenDictRust, UlmenDictFullRust, RUST_AVAILABLE
print(RUST_AVAILABLE)
ld = UlmenDictRust(records, optimizations=False, pool_size_limit=64)
ld.encode_text()
ld.encode_binary_pooled()
ld.encode_binary_zlib(level=6)
ld.encode_ulmen_llm()
Streaming encode
See ulmen.core._streaming for full API.
from ulmen import UlmenStreamEncoder, stream_encode, stream_encode_windowed, decode_binary_records
# One-shot (sock is an already-connected socket.socket)
for chunk in stream_encode(records, chunk_size=65536):
    sock.sendall(chunk)
# Stateful (sink is any writable binary file-like object)
enc = UlmenStreamEncoder(pool_size_limit=64, chunk_size=65536)
enc.feed(record)
enc.feed_many(records)
for chunk in enc.flush():
    sink.write(chunk)
print(enc.rust_backed) # True when Rust extension active
# Unbounded windowed (each chunk decodes standalone)
for chunk in stream_encode_windowed(records, window_size=1000):
    decode_binary_records(chunk)
Model-level encode/decode
from ulmen import (
encode_ulmen_llm,
decode_ulmen_llm,
encode_binary_records,
decode_binary_records,
encode_text_records,
decode_text_records,
build_pool,
detect_column_strategy,
)
ULMEN-AGENT core
from ulmen import (
encode_agent_payload,
decode_agent_payload,
decode_agent_payload_full,
decode_agent_record,
encode_agent_record,
decode_agent_stream,
validate_agent_payload,
make_validation_error,
extract_subgraph,
extract_subgraph_payload,
AgentHeader,
ValidationError,
ContextBudgetExceededError,
)
'encode_agent_payload' parameters:
| Parameter | Type | Description |
|---|---|---|
| records | list[dict] | Records to encode |
| thread_id | str or None | Written to header |
| context_window | int or None | Token budget declared in header |
| meta_fields | tuple | Extra fields appended to every row |
| auto_context | bool | Compute context_used automatically |
| enforce_budget | bool | Raise ContextBudgetExceededError if over budget |
| payload_id | str or None | Unique ID for this payload |
| parent_payload_id | str or None | Links to prior payload in chain |
| agent_id | str or None | ID of the producing agent |
| session_id | str or None | Session this payload belongs to |
| schema_version | str or None | Protocol version for negotiation |
| auto_payload_id | bool | Generate a UUID payload_id automatically |
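The interaction between `context_window`, `auto_context`, and `enforce_budget` can be pictured with a small sketch. Everything here is hypothetical: `BudgetExceeded` stands in for `ContextBudgetExceededError`, and summing a per-record `tokens` field is an assumed way of computing usage, not necessarily how ULMEN does it. Only the `overage` attribute is taken from the Quick Start above.

```python
class BudgetExceeded(Exception):
    """Hypothetical stand-in for ContextBudgetExceededError."""

    def __init__(self, used, window):
        self.used = used
        self.window = window
        self.overage = used - window  # tokens over the declared budget
        super().__init__(f"context budget exceeded by {self.overage} tokens")


def check_budget(records, context_window, enforce_budget=False):
    """Return tokens used; raise only when enforcement is on and the budget is exceeded."""
    # Assumption: each record carries a "tokens" field; missing values count as 0.
    used = sum(rec.get("tokens", 0) for rec in records)
    if enforce_budget and context_window is not None and used > context_window:
        raise BudgetExceeded(used, context_window)
    return used
```

With enforcement off, the caller still learns the usage figure and can decide to compress or chunk before sending.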
Context compression
from ulmen import compress_context, dedup_mem, get_latest_mem, estimate_context_usage
from ulmen.core._agent import (
COMPRESS_COMPLETED_SEQUENCES,
COMPRESS_KEEP_TYPES,
COMPRESS_SLIDING_WINDOW,
PRIORITY_MUST_KEEP,
PRIORITY_KEEP_IF_ROOM,
PRIORITY_COMPRESSIBLE,
)
compressed = compress_context(
records,
strategy=COMPRESS_COMPLETED_SEQUENCES,
keep_priority=PRIORITY_KEEP_IF_ROOM,
preserve_cot=True,
)
clean = dedup_mem(records)
latest = get_latest_mem(records, key="pref")
usage = estimate_context_usage(records)
Strategies:
- completed_sequences: replace completed tool+res pairs with mem summaries
- keep_types: keep only specified record types
- sliding_window: keep recent records verbatim, summarize older ones
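As an illustration of the sliding_window idea, the sketch below keeps the newest records untouched and folds everything older into a single summary record. The function name, the `keep_last` parameter, and the shape of the summary record are assumptions for this example; the real strategy is selected through `compress_context`.

```python
def sliding_window(records, keep_last=2):
    """Keep the newest records verbatim; fold older ones into one summary."""
    cut = len(records) - keep_last
    if cut <= 0:
        return list(records)  # nothing old enough to summarise
    older, recent = records[:cut], records[cut:]
    summary = {
        "type": "mem",  # assumed record type for summaries
        "summary": f"{len(older)} earlier records",
        "types": sorted({rec.get("type", "?") for rec in older}),
    }
    return [summary] + list(recent)
```

A production summary would carry real content rather than a count; the point here is only the split between verbatim and summarised regions.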
Unlimited context
from ulmen import chunk_payload, merge_chunks, build_summary_chain
chunks = chunk_payload(
records,
token_budget=4000,
thread_id="t1",
overlap=2,
parent_payload_id="prev-id",
session_id="sess-1",
)
merged = merge_chunks(chunks)
chain = build_summary_chain(
records,
token_budget=4000,
thread_id="t1",
session_id="sess-1",
)
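The role of `token_budget` and `overlap` can be sketched as a greedy split. This is a simplified model under stated assumptions, not the `chunk_payload` implementation: `chunk_by_budget` is a hypothetical name, each record is assumed to carry a `tokens` field (defaulting to 1), and carried-over records count toward the next chunk's total.

```python
def chunk_by_budget(records, token_budget, overlap=0):
    """Greedy split: close a chunk when the next record would exceed token_budget."""
    chunks, current, used = [], [], 0
    for rec in records:
        cost = rec.get("tokens", 1)
        if current and used + cost > token_budget:
            chunks.append(current)
            # Carry the last `overlap` records forward for continuity.
            current = current[-overlap:] if overlap else []
            used = sum(r.get("tokens", 1) for r in current)
        current.append(rec)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Overlap trades some budget for continuity: each chunk restates the tail of the previous one so a model reading a single chunk still sees what led up to it.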
LLM output repair
from ulmen import parse_llm_output
repaired = parse_llm_output(raw_text)
repaired = parse_llm_output(raw_text, thread_id="t1", strict=True)
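Exact token counting
from ulmen import count_tokens_exact
n_tokens = count_tokens_exact(payload)
Uses the cl100k_base BPE encoding (the GPT-4 tokenizer; counts for models with a different tokenizer, such as Claude, are approximate). Falls back to a character-based estimate when tiktoken is unavailable.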
Multi-agent routing
from ulmen import AgentRouter, validate_routing_consistency
router = AgentRouter()
router.register("agent_a", "agent_b", handler_fn)
router.dispatch(records)
router.dispatch_one(record)
ok, err = validate_routing_consistency(records)
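One plausible reading of `register(from_agent, to_agent, handler)` is a lookup table keyed by the `from_agent` and `to_agent` meta fields. The class below sketches that reading; `MiniRouter` is hypothetical and is not `AgentRouter` itself, whose matching rules may differ.

```python
from collections import defaultdict


class MiniRouter:
    """Hypothetical sketch of (from_agent, to_agent) dispatch."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, from_agent, to_agent, handler):
        """Attach a handler to one sender/receiver pair."""
        self._handlers[(from_agent, to_agent)].append(handler)

    def dispatch_one(self, record):
        """Call every handler registered for the record's route; return the call count."""
        key = (record.get("from_agent"), record.get("to_agent"))
        handlers = self._handlers.get(key, [])
        for handler in handlers:
            handler(record)
        return len(handlers)

    def dispatch(self, records):
        """Dispatch each record in order; return the total number of handler calls."""
        return sum(self.dispatch_one(rec) for rec in records)
```

Records with no matching route are skipped rather than raising, which suits a sketch; a stricter router might report them instead.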
Cross-payload thread tracking
from ulmen import ThreadRegistry, merge_threads
registry = ThreadRegistry()
registry.add_payload("pid-1", records)
threads = registry.get_threads()
merged = merge_threads([payload1_records, payload2_records])
Audit trail
from ulmen import ReplayLog
log = ReplayLog()
log.append({"event": "encode", "ts": 1234})
events = log.all()
System prompt generation
from ulmen import generate_system_prompt
prompt = generate_system_prompt(
include_examples=True,
include_validation=True,
)
ULMEN bridge
from ulmen import convert_agent_to_ulmen, convert_ulmen_to_agent
ulmen = convert_agent_to_ulmen(agent_payload)
payload = convert_ulmen_to_agent(ulmen, thread_id="t1")
MessagePack compatibility
from ulmen.core._msgpack_compat import encode_msgpack, decode_msgpack
packed = encode_msgpack(records)
unpacked = decode_msgpack(packed)
Wire Format Constants
from ulmen import (
MAGIC, # b'LUMB'
VERSION, # bytes([3, 3])
T_STR_TINY, T_STR, T_INT, T_FLOAT, T_BOOL, T_NULL,
T_LIST, T_MAP, T_POOL_DEF, T_POOL_REF, T_MATRIX,
T_DELTA_RAW, T_BITS, T_RLE,
S_RAW, S_DELTA, S_RLE, S_BITS, S_POOL,
AGENT_MAGIC, # "ULMEN-AGENT v1"
AGENT_VERSION, # "1.0.0"
RECORD_TYPES, # frozenset of 10 type tags
FIELD_COUNTS, # dict[type -> total field count per row including common fields]
META_FIELDS, # ("parent_id", "from_agent", "to_agent", "priority")
COMPRESS_COMPLETED_SEQUENCES,
COMPRESS_KEEP_TYPES,
COMPRESS_SLIDING_WINDOW,
PRIORITY_MUST_KEEP, # 1
PRIORITY_KEEP_IF_ROOM, # 2
PRIORITY_COMPRESSIBLE, # 3
)
Utilities
from ulmen import (
estimate_tokens, # rough LLM token count (chars / 4)
deep_size, # recursive memory footprint in bytes
deep_eq, # structural equality handling NaN and inf
fnv1a, fnv1a_str, # FNV-1a 32-bit hash
)
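For reference, FNV-1a in its 32-bit form and the chars / 4 heuristic are both short enough to write out. The functions below follow the published FNV-1a parameters and the heuristic named in the comment above; the names `fnv1a_32` and `rough_tokens` are chosen for this sketch, and whether ULMEN rounds the estimate the same way is an assumption.

```python
def fnv1a_32(data: bytes) -> int:
    """FNV-1a, 32-bit: XOR each byte in, then multiply by the FNV prime."""
    h = 0x811C9DC5  # FNV offset basis (32-bit)
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, wrapped to 32 bits
    return h


def rough_tokens(text: str) -> int:
    """chars / 4 heuristic, rounded down (rounding is an assumption)."""
    return len(text) // 4
```

The heuristic is only a rough guide for English-like text; use the exact counter when a budget decision depends on the number.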
Architecture
ulmen/
├── Cargo.lock
├── Cargo.toml
├── pyproject.toml
├── README.md
├── SPEC.md
├── src/
│   └── lib.rs
├── ulmen/
│   ├── __init__.py
│   ├── core.py
│   └── core/
│       ├── __init__.py
│       ├── _constants.py
│       ├── _primitives.py
│       ├── _strategies.py
│       ├── _text.py
│       ├── _binary.py
│       ├── _ulmen_llm.py
│       ├── _agent.py
│       ├── _api.py
│       ├── _repair.py
│       ├── _replay.py
│       ├── _routing.py
│       ├── _threading.py
│       ├── _tokens.py
│       ├── _msgpack_compat.py
│       └── _streaming.py
├── tests/
│   ├── conftest.py
│   ├── smoke_test_comprehensive.py
│   ├── integration/
│   │   ├── test_edge_cases.py
│   │   ├── test_init_coverage.py
│   │   └── test_rust_layer.py
│   ├── perf/
│   │   ├── test_benchmark.py
│   │   ├── test_size.py
│   │   └── test_speed.py
│   └── unit/
│       ├── test_agent.py
│       ├── test_core_coverage.py
│       ├── test_encoders.py
│       ├── test_ulmendict.py
│       ├── test_ulmen_llm.py
│       ├── test_msgpack_compat.py
│       ├── test_primitives.py
│       ├── test_repair.py
│       ├── test_replay.py
│       ├── test_routing.py
│       ├── test_strategies.py
│       ├── test_streaming.py
│       ├── test_threading.py
│       └── test_tokens.py
└── docs/
    ├── index.md
    ├── getting-started/
    │   ├── installation.md
    │   └── quickstart.md
    ├── guides/
    │   ├── binary-format.md
    │   ├── text-format.md
    │   ├── ulmen.md
    │   └── compression.md
    ├── reference/
    │   ├── api.md
    │   ├── constants.md
    │   ├── primitives.md
    │   └── benchmarks.md
    ├── agent/
    │   ├── overview.md
    │   ├── spec.md
    │   └── system-prompt.md
    └── internals/
        ├── architecture.md
        └── wire-format.md
Design principle: the Python layer is the normative specification. The Rust layer is an optimization producing identical output at higher speed. All encode results are cached after the first call and invalidated on mutation.
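The cache-then-invalidate behaviour described here is a standard pattern and can be sketched in a few lines. `CachedRecords` is a hypothetical class, and JSON stands in for the real encoders; the sketch only demonstrates that repeated encodes reuse the cached result until a mutation clears it.

```python
import json


class CachedRecords:
    """Hypothetical container that caches encodings until the data changes."""

    def __init__(self, records):
        self._records = list(records)
        self._cache = {}
        self.encode_calls = 0  # counts real encoder runs, for demonstration

    def encode_text(self):
        if "text" not in self._cache:
            self.encode_calls += 1
            # Stand-in encoder: JSON, not the ULMEN text format.
            self._cache["text"] = json.dumps(self._records)
        return self._cache["text"]

    def append(self, record):
        self._records.append(record)
        self._cache.clear()  # mutation invalidates every cached encoding
```

Clearing the whole cache on any mutation is the simplest correct policy; it trades a re-encode after each append for never serving stale bytes.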
Running Tests
pytest tests/ -v
pytest tests/ --cov=ulmen --cov-report=term-missing
1,364 tests across unit, integration, performance, and smoke suites. 100% statement coverage across all modules. All tests pass with and without the Rust extension.
Format Specification
See SPEC.md for the complete wire format specification including all tag values, encoding rules, strategy selection logic, and full ULMEN and ULMEN-AGENT protocol details.
Versioning
1.0.0