
Merlin dedup integration for LangChain - strip byte-redundant context before it reaches the LLM.


merlin-langchain

Drop-in MerlinBufferMemory for LangChain. Strips redundant text from your chat history before it reaches the LLM, so multi-turn agents stop choking on context-window overflow.

  • Real-world demo: a coding agent fed two real lock files (facebook/react/yarn.lock and vercel/next.js/pnpm-lock.yaml, ~2 MB or roughly 1 M tokens per turn) crashes on turn 2 under vanilla LangChain, because Gemini rejects the request with 400 INVALID_ARGUMENT ("exceeds 1048576"). With MerlinBufferMemory the same agent survives 6 turns, and the same Gemini call returns 200 OK (receipts in docs/benchmarks/langchain_2026-05-14.pdf).

Quick start (3 minutes)

1 - Install the Python package

pip install merlin-langchain

2 - Get the Merlin binary

The Python package only contains the LangChain glue. The dedup engine itself ships as a small native binary, downloaded once.

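Place the binary anywhere you like. Most users put it in ~/.merlin/. For example, with the Windows build:

mkdir -p ~/.merlin
mv merlin-lite-windows-x64.exe ~/.merlin/merlin.exe

On Linux or macOS, name the file ~/.merlin/merlin (no extension) and make it executable with chmod +x ~/.merlin/merlin.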

3 - Tell the package where the binary lives

# Windows PowerShell
$env:MERLIN_BINARY = "$HOME\.merlin\merlin.exe"

# bash / zsh
export MERLIN_BINARY=~/.merlin/merlin

If you skip this step, the package looks in ~/.merlin/merlin[.exe] by default. If the binary still isn't found, MerlinBufferMemory transparently falls back to vanilla LangChain - no crash, just no optimization.
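The lookup order just described can be sketched in a few lines of Python. This is a minimal illustration, not the package's code: resolve_merlin_binary is a hypothetical helper, and it assumes that an explicit MERLIN_BINARY pointing at a missing file also counts as "not found" (signalling fallback) rather than falling through to the default path.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def resolve_merlin_binary() -> Optional[Path]:
    """Hypothetical helper: return the binary path, or None to mean 'fall back to vanilla'."""
    explicit = os.environ.get("MERLIN_BINARY")
    if explicit:
        # 1. An explicit MERLIN_BINARY wins (assumption: no fallthrough to the default).
        candidate = Path(explicit).expanduser()
    else:
        # 2. Otherwise use the documented default, ~/.merlin/merlin[.exe].
        name = "merlin.exe" if sys.platform == "win32" else "merlin"
        candidate = Path.home() / ".merlin" / name
    # 3. A missing binary is not an error: the caller simply skips dedup.
    return candidate if candidate.is_file() else None
```

Returning None instead of raising is what lets the memory class degrade to plain ConversationBufferMemory behavior without crashing.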

4 - Use it

from merlin_langchain import MerlinBufferMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

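# ConversationChain's default prompt expects {history}, so keep memory_key="history"
# (a different key fails the chain's prompt-variable validation)
memory = MerlinBufferMemory(memory_key="history")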
chain = ConversationChain(llm=ChatOpenAI(model="gpt-4o"), memory=memory)
chain.invoke({"input": "..."})

That's it. Your agent now silently dedupes its rolling chat history before each LLM call. No code changes elsewhere.


What you get

| Component | Drop-in replacement for |
|---|---|
| MerlinBufferMemory | langchain.memory.ConversationBufferMemory |
| merlin_format_log_to_str | langchain.agents.format_scratchpad.format_log_to_str |

Both inherit from or mirror the LangChain interfaces they replace, so MerlinBufferMemory passes Pydantic validation in a chain's memory field and works in any chain that accepts a BaseMemory.
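For the scratchpad helper, "drop-in" means producing text in the layout the agent prompt already expects. The sketch below reproduces the layout that LangChain's format_log_to_str builds (each action's log, followed by the observation and thought prefixes) and adds a hook that post-processes the finished string. Action, format_log_with_dedup and the dedup parameter are hypothetical names for illustration; the real merlin_format_log_to_str may be implemented differently.

```python
from typing import Callable, List, NamedTuple, Tuple


class Action(NamedTuple):
    """Stand-in for LangChain's AgentAction; only .log is used below."""
    tool: str
    tool_input: str
    log: str


def identity(text: str) -> str:
    return text


def format_log_with_dedup(
    intermediate_steps: List[Tuple[Action, str]],
    observation_prefix: str = "Observation: ",
    llm_prefix: str = "Thought: ",
    dedup: Callable[[str], str] = identity,  # hypothetical hook
) -> str:
    # Build the scratchpad in the same layout as format_log_to_str...
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\n{observation_prefix}{observation}\n{llm_prefix}"
    # ...then pass the finished string through the dedup step.
    return dedup(thoughts)
```

Keeping the signature and defaults identical is what lets such a function replace the original in an agent's input mapping without touching the prompt template.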

The async methods (aload_memory_variables, asave_context, aclear) are implemented too, so the memory works behind LangServe or FastAPI and with await agent.ainvoke().
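One common way to expose an async surface over blocking work, such as a subprocess call to a native binary, is to delegate each async method to its sync twin via asyncio.to_thread. The sketch assumes that pattern; SketchMemory is a hypothetical stand-in, and the package's actual async implementation may differ (it could, for instance, use asyncio.create_subprocess_exec instead).

```python
import asyncio
from typing import Any, Dict, List


class SketchMemory:
    """Hypothetical stand-in; the sync methods represent blocking work."""

    def __init__(self) -> None:
        self.turns: List[str] = []

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        self.turns.append(f"Human: {inputs['input']}")
        self.turns.append(f"AI: {outputs['output']}")

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        # In a real implementation, this is where the dedup binary would run.
        return {"history": "\n".join(self.turns)}

    def clear(self) -> None:
        self.turns.clear()

    # Async counterparts run the blocking methods on a worker thread, so the
    # event loop (FastAPI, LangServe) keeps serving other requests meanwhile.
    async def asave_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        await asyncio.to_thread(self.save_context, inputs, outputs)

    async def aload_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        return await asyncio.to_thread(self.load_memory_variables, inputs)

    async def aclear(self) -> None:
        await asyncio.to_thread(self.clear)
```

asyncio.to_thread requires Python 3.9 or newer.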


Limits (community tier)

The community binary processes up to:

| Window | Cap |
|---|---|
| Per call | 50 MB |
| Per day | 200 MB |
| Per month | 2 GB |

A solo developer will rarely hit these. A busy commercial pipeline can reach them within 2-3 days; for higher caps, see https://corbenic.ai.
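The three windows can be modeled as a small quota tracker. This sketch assumes decimal units (1 MB = 1,000,000 bytes) and UTC calendar windows that reset at 00:00 UTC and on the 1st of each month, matching the rollover schedule in the fallback section; QuotaTracker is hypothetical, and the real accounting happens inside the binary.

```python
from datetime import datetime, timezone

MB = 1_000_000  # assumption: decimal megabytes
PER_CALL_CAP = 50 * MB
PER_DAY_CAP = 200 * MB
PER_MONTH_CAP = 2_000 * MB  # 2 GB


class QuotaTracker:
    """Hypothetical tracker for the per-call, per-day and per-month caps."""

    def __init__(self) -> None:
        self.day_key = None
        self.month_key = None
        self.day_used = 0
        self.month_used = 0

    def try_consume(self, nbytes: int, now: datetime) -> bool:
        """Record nbytes if every cap allows it; return False on a cap hit.

        `now` should be timezone-aware; it is normalized to UTC.
        """
        now = now.astimezone(timezone.utc)
        day_key, month_key = now.date(), (now.year, now.month)
        # A new UTC day or month starts a fresh window.
        if day_key != self.day_key:
            self.day_key, self.day_used = day_key, 0
        if month_key != self.month_key:
            self.month_key, self.month_used = month_key, 0
        if (nbytes > PER_CALL_CAP
                or self.day_used + nbytes > PER_DAY_CAP
                or self.month_used + nbytes > PER_MONTH_CAP):
            return False  # nothing is recorded for a rejected call
        self.day_used += nbytes
        self.month_used += nbytes
        return True
```

One consequence of the arithmetic: at 200 MB per day, reaching the 2 GB monthly cap takes at least 10 maxed-out days, so the daily cap is usually the first one a busy pipeline meets.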

What happens when a cap is reached

MerlinBufferMemory transparently falls back to vanilla LangChain behavior. Your prompts pass through unchanged - exactly as if the package weren't installed - and your LLM call proceeds normally.

  • You'll see one WARNING in your logs the first time fallback kicks in.
  • The package will automatically retry the binary every hour (configurable via the MERLIN_RETRY_AFTER_S environment variable, minimum 60 seconds).
  • When the cap rolls over (daily at 00:00 UTC, monthly on the 1st), the next retry succeeds and you'll see INFO: Merlin dedup recovered.

This means you cannot get stuck in a degraded state because of a forgotten reset - long-running web servers self-heal across midnight UTC without restart.
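The fallback-and-retry behavior can be sketched as a small gate around the dedup call. This is an illustration under assumptions: CapHit and FallbackGate are hypothetical, and how the real package detects a cap hit from the binary is not documented here. The clock is injectable so the cooldown can be tested without waiting an hour.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("merlin_sketch")


class CapHit(Exception):
    """Hypothetical signal that the binary refused work because a cap was reached."""


class FallbackGate:
    def __init__(self, retry_after_s: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.retry_after_s = max(60.0, retry_after_s)  # documented 60 s minimum
        self.clock = clock
        self.skip_until: Optional[float] = None
        self.warned = False

    def process(self, text: str, dedup: Callable[[str], str]) -> str:
        now = self.clock()
        if self.skip_until is not None and now < self.skip_until:
            return text  # cooling down: pass the prompt through unchanged
        try:
            result = dedup(text)
        except CapHit:
            self.skip_until = now + self.retry_after_s
            if not self.warned:  # only one WARNING, the first time
                log.warning("Merlin cap reached; falling back to vanilla LangChain")
                self.warned = True
            return text
        if self.skip_until is not None:  # a re-probe succeeded after a fallback
            log.info("Merlin dedup recovered")
            self.skip_until = None
        return result
```

Because the gate re-probes on the first call after the cooldown expires, a long-running server picks dedup back up on its own once the cap window rolls over.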


Configuration

| Variable | Default | Purpose |
|---|---|---|
| MERLIN_BINARY | ~/.merlin/merlin[.exe] | Path to the binary |
| MERLIN_RETRY_AFTER_S | 3600 | Seconds to skip dedup after a cap hit before re-probing the binary (minimum 60) |
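Reading MERLIN_RETRY_AFTER_S with its default and 60-second floor might look like the following. retry_after_seconds is a hypothetical helper, and what the real package does with a non-numeric value is an assumption here (this sketch falls back to the default).

```python
import os
from typing import Mapping

DEFAULT_RETRY_AFTER_S = 3600
MIN_RETRY_AFTER_S = 60


def retry_after_seconds(environ: Mapping[str, str] = os.environ) -> int:
    raw = environ.get("MERLIN_RETRY_AFTER_S")
    if raw is None:
        return DEFAULT_RETRY_AFTER_S
    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_RETRY_AFTER_S
    # Values below the documented minimum are clamped up to 60 seconds.
    return max(MIN_RETRY_AFTER_S, value)
```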

Constructor parameters on MerlinBufferMemory:

| Param | Default | Purpose |
|---|---|---|
| memory_key | "history" | Key under which the rendered history string is returned |
| keep_tail_lines | 2 | Number of trailing lines preserved verbatim (the most recent context) |
| human_prefix / ai_prefix | "Human" / "AI" | Standard LangChain prefixes |
| return_messages | False | If True, returns the message list instead of a string, with no dedup applied (mirrors ConversationBufferMemory) |
| extra_env | None | Optional dict of extra environment variables for the binary subprocess (advanced) |
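The real engine is a native binary that removes byte-level redundancy, and its algorithm isn't documented here. As a stand-in, the sketch below uses much cruder exact line-level deduplication purely to show what keep_tail_lines controls: repeats are dropped from the older part of the history, while the last N lines pass through untouched. dedup_history is hypothetical.

```python
from typing import List, Set


def dedup_history(text: str, keep_tail_lines: int = 2) -> str:
    """Drop exact repeated lines from the head; keep the tail verbatim."""
    lines = text.split("\n")
    split = max(0, len(lines) - keep_tail_lines)
    head, tail = lines[:split], lines[split:]
    seen: Set[str] = set()
    kept: List[str] = []
    for line in head:
        if line in seen:
            continue  # exact repeat of an earlier line: drop it
        seen.add(line)
        kept.append(line)
    # The most recent lines survive even if they repeat earlier content.
    return "\n".join(kept + tail)
```

Preserving the tail matters because the newest turn is usually what the model must answer, so it should never be rewritten.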

When MerlinBufferMemory helps - and when it doesn't

Helps: multi-turn agents that re-feed tool outputs into the prompt each turn (ReAct, Cline, AutoGPT, Devin-style workflows). Anywhere the chat history accumulates large repeated content (lock files, terminal logs, file dumps, retrieved documents).

Doesn't help: single-shot LLM calls with no rolling history. Tiny prompts under a few KB. Workloads where every turn introduces only fresh unique content.

When it doesn't help, the prompt goes through essentially unchanged: the dedup simply removes zero bytes.
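To guess which side of this line a workload falls on, you can measure how much of a transcript repeats earlier content. The rough heuristic below counts only exact repeated lines; it is a hypothetical helper, not something the package provides, and byte-level dedup may find redundancy that line matching misses. It returns 0.0 for fresh, unique content and approaches 1.0 when the same tool output is re-fed every turn.

```python
from typing import Set


def redundant_fraction(text: str) -> float:
    """Share of UTF-8 bytes in lines that already appeared earlier in the text."""
    seen: Set[str] = set()
    total = redundant = 0
    for line in text.splitlines(keepends=True):
        size = len(line.encode("utf-8"))
        total += size
        key = line.rstrip("\r\n")  # compare lines without their line endings
        if key in seen:
            redundant += size
        else:
            seen.add(key)
    return redundant / total if total else 0.0
```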


License

MIT. See LICENSE.



Download files


Source Distribution

merlin_langchain-0.1.0.tar.gz (17.9 kB)

Uploaded Source

Built Distribution


merlin_langchain-0.1.0-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file merlin_langchain-0.1.0.tar.gz.

File metadata

  • Download URL: merlin_langchain-0.1.0.tar.gz
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for merlin_langchain-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e53ee0d80913c5d4f2c62f89491b07455fa0aadddbfa77ba89921d3be2af5830 |
| MD5 | 1820e8f44cb409caead182a4bdf3475e |
| BLAKE2b-256 | 628964a77a87be2ed05653b643753618b7ee7748c8b167d14f3214d9e4eb0c93 |


File details

Details for the file merlin_langchain-0.1.0-py3-none-any.whl.

File hashes

Hashes for merlin_langchain-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d473b67c07f6675d362067969b6211db774958a82a15e8e764a9278cbebc7d8a |
| MD5 | bfb776aaaef92334a89a059377dbd695 |
| BLAKE2b-256 | f2458df44cd9b08dc62e9a037c6196510b7b785fca6db35070791d5eb65c5e10 |

