BabelTele: model-native, high-density text compression for LangChain pipelines and agents.
langchain-babeltele
BabelTele compresses verbose text into a dense form that people can't read but LLMs can recover. It relaxes the readability prior, using omnilingual word choice and symbolic collapse, then lets downstream models consume the dense text directly with no decompression step. The paper reports compression to ~28% of the original token length at ~99.5% downstream QA fidelity, with no fine-tuning and only black-box API access.
"Q3 revenue is projected to rise ~30% YoY; if it lands, the Berlin team
ships the mobile app in October."
→ Q3rev📈~30%YoY ∧ ?✅⇒Berlin🚀📱@Oct
Based on "Large Language Models Do Not Always Need Readable Language" (arXiv:2606.19857).
Install
pip install langchain-babeltele
You also need a chat-model provider, e.g. pip install langchain-anthropic.
Examples
If you learn faster from running code, start in examples/. The hello.py script is the smallest end-to-end demo: it compresses one paragraph and prints the before/after text with token counts.
cp examples/default.env examples/.env # and add your API key
python examples/hello.py
From there, examples/ builds up through prompt strategies, the fidelity guardrail, agent-history compression, and long-term memory. See the examples README for the full list.
The core engine
Everything composes from one primitive. Pass any chat model or a model string
(resolved via init_chat_model).
from langchain_babeltele import BabelTeleCompressor
compressor = BabelTeleCompressor("anthropic:claude-sonnet-4-6")
result = compressor.compress(long_text)
print(result.text) # the dense BabelTele representation
print(result.retention_ratio) # e.g. 0.28
# Use it anywhere in an LCEL chain:
chain = compressor.as_runnable() | some_reader_model
Long inputs that exceed the compressor's own context window are chunked automatically:
BabelTeleCompressor(model, chunk_tokens=200_000)
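To picture what chunking to a token budget involves, here is a minimal sketch. It is not the library's implementation: `count_tokens` and `chunk_text` are hypothetical helpers, tokens are approximated by whitespace-separated words, and splitting on paragraph boundaries is an assumption made for illustration.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical token counter: approximates tokens by whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    chunk_tokens: int,
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Greedily pack paragraphs into chunks of at most `chunk_tokens` tokens.

    A single paragraph larger than the budget becomes its own oversized chunk;
    a real implementation would need to split it further.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for paragraph in text.split("\n\n"):
        size = counter(paragraph)
        # Flush the current chunk before it would exceed the budget.
        if current and current_size + size > chunk_tokens:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Six paragraphs of 10 "tokens" each, packed under a 25-token budget.
document = "\n\n".join(f"paragraph {i} " + "word " * 8 for i in range(6))
chunks = chunk_text(document, chunk_tokens=25)
```

Each chunk would then be compressed independently and the dense pieces concatenated; how the library actually joins them is not specified here.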
Where it plugs in
RAG: compress retrieved documents
# Note: on LangChain 1.x this import may live in langchain_classic.retrievers instead.
from langchain.retrievers import ContextualCompressionRetriever
from langchain_babeltele import BabelTeleDocumentCompressor
retriever = ContextualCompressionRetriever(
    base_compressor=BabelTeleDocumentCompressor(compressor=compressor),
    base_retriever=base_retriever,
)
Agents: compress history and tool outputs
A denser drop-in alternative to SummarizationMiddleware. Folds overflowing
history into one dense message and compresses large tool outputs before they
re-enter context.
from langchain.agents import create_agent
from langchain_babeltele import BabelTeleCompressionMiddleware
agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=tools,
    middleware=[
        BabelTeleCompressionMiddleware(
            compressor,
            token_budget=4000,
            keep_last_n=2,
            tool_output_threshold=2000,
        )
    ],
)
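The folding behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not the middleware's code: messages are simplified to `(role, content)` tuples, `count_tokens` is a hypothetical word-based counter, `stub_compress` stands in for a real compressor call, and emitting the folded history as a `system` message is a guess.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); a simplified stand-in for chat messages


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words."""
    return len(text.split())


def fold_history(
    messages: List[Message],
    compress: Callable[[str], str],
    token_budget: int,
    keep_last_n: int,
) -> List[Message]:
    """Fold older messages into one dense message when history exceeds the budget."""
    total = sum(count_tokens(content) for _, content in messages)
    if total <= token_budget or len(messages) <= keep_last_n:
        return messages
    split = len(messages) - keep_last_n
    older, recent = messages[:split], messages[split:]
    transcript = "\n".join(f"{role}: {content}" for role, content in older)
    # Assumption: the dense summary re-enters context as a single system message.
    return [("system", compress(transcript))] + recent


def maybe_compress_tool_output(
    output: str, compress: Callable[[str], str], threshold: int
) -> str:
    """Compress a tool output only when it is larger than the threshold."""
    return compress(output) if count_tokens(output) > threshold else output


def stub_compress(text: str) -> str:
    """Stand-in for a real compressor: just reports how much it received."""
    return f"[dense form of {count_tokens(text)} tokens]"


history = [
    ("user", "please summarise the quarterly report in detail"),
    ("assistant", "revenue rose and costs fell across all regions"),
    ("user", "what about Berlin"),
    ("assistant", "Berlin ships the mobile app in October"),
]
folded = fold_history(history, stub_compress, token_budget=10, keep_last_n=2)
```

The last `keep_last_n` messages stay verbatim so the model still sees the most recent turns exactly as written.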
Long-term memory
The paper's LoCoMo recipe: compress each session, embed, retrieve top-k.
from langchain_babeltele import BabelTeleMemoryStore
memory = BabelTeleMemoryStore(vector_store, compressor)
memory.add_session(conversation_text)
relevant = memory.retrieve("what did we decide about pricing?", k=4)
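The compress, embed, retrieve recipe can be shown with a toy store. Everything here is hypothetical and chosen to keep the sketch self-contained: `ToyMemory` is not `BabelTeleMemoryStore`, the bag-of-words `embed` stands in for a real embedding model, `stub_compress` stands in for a real compressor, and embedding the compressed text (rather than the original) is an assumption.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

FILLER = {"the", "a", "we", "to", "of", "and", "that", "is"}


def stub_compress(text: str) -> str:
    """Stand-in compressor: drops a few filler words."""
    return " ".join(w for w in text.split() if w.lower() not in FILLER)


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts over lowercase word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemory:
    """Hypothetical in-memory store: compress each session, embed it, retrieve top-k."""

    def __init__(self, compress: Callable[[str], str]) -> None:
        self.compress = compress
        self.entries: List[Tuple[Counter, str]] = []

    def add_session(self, text: str) -> None:
        dense = self.compress(text)
        self.entries.append((embed(dense), dense))

    def retrieve(self, query: str, k: int = 4) -> List[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [dense for _, dense in ranked[:k]]


memory = ToyMemory(stub_compress)
memory.add_session("We agreed to raise the pricing of the pro plan in March")
memory.add_session("The Berlin team ships the mobile app in October")
memory.add_session("We discussed hiring a designer and a researcher")
top = memory.retrieve("What did we decide about pricing?", k=1)
```

Only the dense text is stored and returned, which is where the token savings per query come from.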
Choosing a prompt strategy
BabelTele offers several prompt strategies rather than one fixed prompt. Select a built-in strategy or pass your own:
from langchain_babeltele import BabelTeleStrategy
BabelTeleCompressor(model, strategy=BabelTeleStrategy.BT_P8) # fixed symbolic rules
BabelTeleCompressor(model, strategy="my custom compression prompt: ")
Fidelity guardrail
Because BabelTele abandons readability, a faulty compression can silently drop information. The guardrail scores recoverability with an LLM judge and, when the score is too low, retries with milder, structured strategies. If it still can't ensure fidelity, it falls back to the original text.
from langchain_babeltele import FidelityGuardrail
compressor = BabelTeleCompressor(
    model,
    guardrail=FidelityGuardrail("anthropic:claude-sonnet-4-6", threshold=0.8),
)
result = compressor.compress(text)
print(result.verified) # True / False
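The score, retry, and fall-back loop described above might look roughly like the sketch below. It is a simplified illustration, not `FidelityGuardrail` itself: `compress_with_guardrail` and `GuardedResult` are hypothetical names, the two strategies are toy functions, the judge is a word-recall stub rather than an LLM, and reporting `verified=False` on fallback is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class GuardedResult:
    text: str
    verified: bool
    strategy: Optional[str]


def compress_with_guardrail(
    text: str,
    strategies: Sequence[Tuple[str, Callable[[str], str]]],
    judge: Callable[[str, str], float],
    threshold: float = 0.8,
) -> GuardedResult:
    """Try strategies from most to least aggressive; fall back to the original text."""
    for name, compress in strategies:
        candidate = compress(text)
        if judge(text, candidate) >= threshold:
            return GuardedResult(candidate, verified=True, strategy=name)
    # Assumption: nothing passed the judge, so return the input unverified.
    return GuardedResult(text, verified=False, strategy=None)


def aggressive(text: str) -> str:
    """Toy strategy: keep every third word."""
    return " ".join(text.split()[::3])


def mild(text: str) -> str:
    """Toy strategy: drop two filler words."""
    return " ".join(w for w in text.split() if w not in {"is", "to"})


def recall_judge(original: str, candidate: str) -> float:
    """Stub judge: fraction of distinct original words still present."""
    source = set(original.split())
    return len(source & set(candidate.split())) / len(source)


text = "Q3 revenue is projected to rise 30 percent year over year"
result = compress_with_guardrail(
    text, [("aggressive", aggressive), ("mild", mild)], recall_judge, threshold=0.75
)
```

Ordering strategies from most to least aggressive means the densest output that still passes the judge is the one returned.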
Benchmarks
On document QA, BabelTele kept 99.5% semantic fidelity while compressing text to 27.9% of its original length.
Agent memory (LoCoMo). Compressing each session before storing it retains most
of the full-text accuracy at roughly half the tokens, and edges out plain
summarization. This is what BabelTeleMemoryStore does.
| Method | Tokens / query | Accuracy | vs. original |
|---|---|---|---|
| Original | 2819.5 | 64.81 | 100.0% |
| Summary | 1365.6 | 61.05 | 94.2% |
| BabelTele | 1382.2 | 62.53 | 96.5% |
Absolute scores reflect LoCoMo's difficulty (even full context scores only 64.81); the point is relative retention. BabelTele preserves 96.5% of baseline accuracy at roughly half the tokens, and beats summarization while doing it.
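The relative figures follow directly from the table. The short script below recomputes them from the reported token counts and accuracies; it uses only the numbers in the table and the variable names are illustrative.

```python
# (tokens per query, accuracy) as reported in the LoCoMo table.
rows = {
    "Original": (2819.5, 64.81),
    "Summary": (1365.6, 61.05),
    "BabelTele": (1382.2, 62.53),
}

base_tokens, base_acc = rows["Original"]

# Percent of baseline tokens used and percent of baseline accuracy retained.
relative = {
    name: (round(100 * tokens / base_tokens, 1), round(100 * acc / base_acc, 1))
    for name, (tokens, acc) in rows.items()
}

for name, (token_pct, acc_pct) in relative.items():
    print(f"{name:<10} tokens {token_pct:5.1f}%  accuracy {acc_pct:5.1f}%")
```

The accuracy percentages match the "vs. original" column, and the token percentages come out just under 50% for both compressed methods, which is the "roughly half the tokens" claim.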
Multi-agent communication. Compressing inter-agent messages cut tokens sharply with little score loss.
| Setting | Token reduction | Score (vs. uncompressed) |
|---|---|---|
| Homogeneous (Gemini with Gemini) | 38.96% | 96.6% |
| Heterogeneous (Gemini with GPT) | 44.21% | 99.7% |
Beyond the context window. When the input exceeds the window, chunked BabelTele
compression beat naive truncation on LongBench v2 Code Repo QA (Long). This is the
chunk_tokens path.
| Reader | Truncation | BabelTele |
|---|---|---|
| Qwen3.6-Max | 55.17 | 62.07 |
| GLM-5.1 | 62.07 | 72.41 |
| Kimi 2.5 | 44.82 | 48.28 |
Compression strength varies by model. On LongBench v2, Gemini 3.1 Pro was the most aggressive at over 95% compression (about 4% retention), while GPT-5.4 was the most conservative at roughly 75% (about 27% retention); other models landed in between. Portability is a separate axis: in cross-model tests, GPT- and Claude-compressed inputs were the most portable for other readers to decode, while Qwen- and Kimi-compressed inputs caused larger accuracy drops.
All numbers are from the paper (arXiv:2606.19857). Pull requests that add evals for this project and generate more results are welcome.
Pro Tip
Figure out which strategy works best for your model and use case. Different combinations may give surprisingly different answers.
No single configuration works for everything.
Development
pip install -e ".[dev]"
pytest
License
MIT