Universal, infinite-memory proxy for LLM APIs.
Stitcher Proxy 🦞
LLMs have amnesia. Stitcher is the cure.
Stitcher Proxy is a universal, infinite-memory proxy for any OpenAI-compatible LLM API.
The Problem
LLMs forget what you told them 10 minutes ago. Their context windows are limited, and clients drop older messages when they hit token limits. Real conversations span days, weeks, or months, but APIs treat every request as an isolated event.
The Solution
Stitcher is a transparent proxy that gives your LLM infinite memory. You send your new messages to Stitcher; Stitcher instantly pieces together the entire history from local JSONL storage, intelligently deduplicates it, fits it into a precise token budget, and forwards it to the upstream API. It "stitches" the context back together.
Quick Start
pip install stitcher-proxy
stitcher-proxy init
stitcher-proxy start
init runs an interactive setup wizard to configure your provider, API key, model, and token budget.
Works With
Stitcher acts as a transparent, infinite-memory drop-in for:
- Claude Code
- OpenClaw
- Codex
- Cursor
- LangChain
- Vercel AI SDK
- Any OpenAI client
CLI Subcommands
Stitcher Proxy includes a full CLI suite for managing your proxy and sessions.
- stitcher-proxy init: Run the interactive setup wizard.
- stitcher-proxy start: Start the proxy.
- stitcher-proxy status: Show running status, session count, and config summary.
- stitcher-proxy sessions: List all sessions with message counts and storage sizes.
- stitcher-proxy sessions purge <name>: Delete a specific session's data.
- stitcher-proxy config: Print current configuration.
- stitcher-proxy config set <key> <value>: Update a config value.
- stitcher-proxy integrate <target>: Show integration guides (e.g., claude-code, openclaw, codex).
Integration Guides
Stitcher provides built-in integration guides for popular tools. Run stitcher-proxy integrate to see all options.
- Claude Code Integration (stitcher-proxy integrate claude-code)
- OpenClaw Integration (stitcher-proxy integrate openclaw)
- Codex Integration (stitcher-proxy integrate codex)
- Cursor Integration
- Generic/OpenAI Compatible Integration
Global Environment Variable Support
Point the standard base URL environment variables at Stitcher, and any client that honors them routes its requests through the proxy without code changes:
export OPENAI_BASE_URL=http://localhost:8081/v1
export ANTHROPIC_BASE_URL=http://localhost:8081/v1
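As a rough illustration of what "honoring the base URL" means on the client side, the sketch below builds a chat completion request from OPENAI_BASE_URL using only the standard library. The function name build_chat_request and the fallback URL are assumptions made for this example, not part of Stitcher.

```python
import json
import os
import urllib.request


def build_chat_request(content, session, api_key, model="gpt-4o"):
    """Build (but do not send) a chat request routed via OPENAI_BASE_URL."""
    # Assumption: fall back to the public OpenAI URL when the variable is unset.
    base_url = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "X-Stitcher-Session": session,
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it; with the export above in place, it goes to the proxy instead of the upstream API.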
How It Works
[ Client ] ---POST /v1/chat/completions---> [ Stitcher Proxy ]
(Only sends                                         │
 new msg)                                           ▼
                                          Reads local JSONL
                                          Stitches history backwards
                                          Deduplicates repetitive text
                                          Enforces token budget (e.g. 128k)
                                                    │
                                                    ▼
[ Upstream API ] <------Full Context------- [ Stitcher Proxy ]
(OpenAI/Anthropic)
Usage Examples
Python (OpenAI SDK)
from openai import OpenAI

# Just change the base_url. That's it.
client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="your-real-api-key",  # Passed through to upstream
    default_headers={"X-Stitcher-Session": "my-app-user-123"},
)

# Use normally. Stitcher handles infinite memory transparently.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did we discuss yesterday?"}],
)
# ^ Even though you only sent 1 message, Stitcher injected
# the full conversation history behind the scenes.
cURL
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-Stitcher-Session: terminal-session-99" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello again!"}]
  }'
Configuration
Config loading priority: CLI flags > Environment Variables > ~/.stitcher/config.json > Defaults.
| CLI Flag | Description | Default |
|---|---|---|
| --port | Port to run the proxy on | 8081 |
| --upstream | Upstream LLM API URL | https://api.openai.com |
| --max-tokens | Token budget for stitched context | 128000 |
| --data-dir | Directory for JSONL storage | ~/.stitcher/sessions |
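The loading priority can be pictured as a stack of dictionaries where the first layer that defines a key wins. The sketch below is one way to express that with collections.ChainMap; the STITCHER_ environment prefix and the resolve_config helper are hypothetical names chosen for illustration, and the real proxy may name or parse things differently.

```python
import json
from collections import ChainMap
from pathlib import Path

# Defaults taken from the table above.
DEFAULTS = {
    "port": 8081,
    "upstream": "https://api.openai.com",
    "max_tokens": 128000,
    "data_dir": "~/.stitcher/sessions",
}
ENV_PREFIX = "STITCHER_"  # hypothetical prefix, e.g. STITCHER_PORT


def resolve_config(cli_flags, environ, config_path="~/.stitcher/config.json"):
    """Merge layers so that CLI > environment > config file > defaults."""
    # Unset CLI flags (None) must not shadow lower layers.
    cli = {k: v for k, v in cli_flags.items() if v is not None}
    env = {}
    for key, value in environ.items():
        name = key[len(ENV_PREFIX):].lower() if key.startswith(ENV_PREFIX) else None
        if name in DEFAULTS:
            # Environment values are strings; coerce to the default's type.
            env[name] = type(DEFAULTS[name])(value)
    path = Path(config_path).expanduser()
    file_cfg = json.loads(path.read_text()) if path.exists() else {}
    # ChainMap looks keys up left to right, so earlier maps take priority.
    return dict(ChainMap(cli, env, file_cfg, DEFAULTS))
```

Calling resolve_config({"port": 9001}, os.environ) would then return port 9001 regardless of what the environment or config file says, while untouched keys fall through to the lower layers.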
API Endpoints
- POST /v1/chat/completions: The main OpenAI-compatible proxy endpoint. Supports both normal requests and SSE streaming (stream: true). Pass the X-Stitcher-Session header to isolate memory; otherwise the session is derived from the first message.
- GET /v1/stitcher/stats: Returns session count and total messages.
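How the session is derived from the first message is not specified here. One plausible scheme, shown purely as an assumption, is to hash the first message so that the same conversation opener always maps to the same session. The resolve_session name, the "auto-" prefix, and the SHA-256 choice are all hypothetical.

```python
import hashlib


def resolve_session(headers, messages):
    """Return the explicit session name, or a stable ID from the first message."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-stitcher-session" and value.strip():
            return value.strip()
    # Fallback (assumed scheme): hash the role and content of the first message.
    first = messages[0] if messages else {}
    basis = f"{first.get('role', '')}:{first.get('content', '')}"
    digest = hashlib.sha256(basis.encode("utf-8")).hexdigest()
    return "auto-" + digest[:16]
```

Whatever the actual derivation is, an explicit header is the safer choice when several users or apps share one proxy, because two conversations that start with the same message would otherwise collide.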
How It Works Under The Hood
Stitcher uses a backward-reading file algorithm. Every time you send a request or the proxy receives a response, it appends the message to an active.jsonl file in the session's directory.
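The write path is the simple half: one JSON object per line, appended to the session's active file. A minimal sketch follows, assuming one directory per session under the data directory and a record with ts, role, and content fields; the field names and the append_message helper are illustrative guesses, not the proxy's actual schema.

```python
import json
import time
from pathlib import Path


def append_message(data_dir, session, role, content):
    """Append one message as a single JSON line to the session's active.jsonl."""
    session_dir = Path(data_dir).expanduser() / session
    session_dir.mkdir(parents=True, exist_ok=True)
    # Assumed record shape; the real on-disk schema may differ.
    record = {"ts": time.time(), "role": role, "content": content}
    path = session_dir / "active.jsonl"
    # Append mode keeps earlier lines untouched, so history is never rewritten.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```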
When the proxy builds the context window:
- It reads active.jsonl and any older rolled files (e.g. active.001.jsonl) backward, from newest to oldest.
- It accumulates tokens until it hits your configured limit (e.g. 128k).
- It deduplicates text: it identifies near-identical assistant outputs and condenses older duplicates to save tokens.
- It reverses the collection to restore chronological order and swaps it into your request's messages array.
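The steps above can be sketched in a few dozen lines. This is a simplified illustration, not Stitcher's implementation: it loads each file and reverses the lines rather than seeking from the end, estimates tokens at roughly four characters each, uses difflib for similarity, and assumes that a higher roll number means older data. The function names, the 0.9 threshold, and the placeholder text are all hypothetical.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path


def estimate_tokens(text):
    # Crude assumption: about 4 characters per token; real tokenizers differ.
    return max(1, len(text) // 4)


def iter_newest_first(session_dir):
    """Yield stored messages from newest to oldest."""
    session_dir = Path(session_dir)
    # Assumption: active.001.jsonl is newer than active.002.jsonl, and so on.
    rolled = sorted(session_dir.glob("active.*.jsonl"))
    for path in [session_dir / "active.jsonl", *rolled]:
        if not path.exists():
            continue
        for line in reversed(path.read_text(encoding="utf-8").splitlines()):
            if line.strip():
                yield json.loads(line)


def stitch(session_dir, max_tokens, similarity=0.9):
    """Build a chronological message list that fits the token budget."""
    kept, newer_replies, used = [], [], 0
    for msg in iter_newest_first(session_dir):
        content = msg["content"]
        if msg["role"] == "assistant":
            is_dup = any(
                SequenceMatcher(None, content, newer).ratio() >= similarity
                for newer in newer_replies
            )
            if is_dup:
                # Condense the older copy; the newer one is already kept.
                content = "[duplicate of a later reply omitted]"
            else:
                newer_replies.append(content)
        cost = estimate_tokens(content)
        if used + cost > max_tokens:
            break  # Budget reached: everything older is dropped.
        kept.append({"role": msg["role"], "content": content})
        used += cost
    kept.reverse()  # Restore oldest-to-newest order for the messages array.
    return kept
```

Reading backward means the most recent messages are always kept and the budget cuts off only the oldest history, which is usually the least relevant part of a long conversation.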
License
MIT