Universal, infinite-memory proxy for LLM APIs.

Stitcher Proxy 🦞

LLMs have amnesia. Stitcher is the cure.

Stitcher Proxy is a universal, infinite-memory proxy for any OpenAI-compatible LLM API.

The Problem

LLMs forget what you told them 10 minutes ago. Their context windows are limited, and clients drop older messages when they hit token limits. Real conversations span days, weeks, or months, but APIs treat every request as an isolated event.

The Solution

Stitcher is a transparent proxy that gives your LLM infinite memory. You send only your new messages to Stitcher; Stitcher reads the session's history from local JSONL storage, deduplicates it, fits as much of it as your token budget allows, and forwards the result to the upstream API. It "stitches" the context back together.

Quick Start

pip install stitcher-proxy
stitcher-proxy init
stitcher-proxy start

init runs an interactive setup wizard to configure your provider, API key, model, and token budget.

Works With

Stitcher acts as a transparent, infinite-memory drop-in for:

  • Claude Code
  • OpenClaw
  • Codex
  • Cursor
  • LangChain
  • Vercel AI SDK
  • Any OpenAI client

CLI Subcommands

Stitcher Proxy includes a full CLI suite for managing your proxy and sessions.

  • stitcher-proxy init — Run the interactive setup wizard.
  • stitcher-proxy start — Start the proxy.
  • stitcher-proxy status — Show running status, session count, and config summary.
  • stitcher-proxy sessions — List all sessions with message counts and storage sizes.
  • stitcher-proxy sessions purge <name> — Delete a specific session's data.
  • stitcher-proxy config — Print current configuration.
  • stitcher-proxy config set <key> <value> — Update a config value.
  • stitcher-proxy integrate <target> — Show integration guides (e.g., claude-code, openclaw, codex).

Integration Guides

Stitcher provides built-in integration guides for popular tools. Run stitcher-proxy integrate to see all options.

Global Environment Variable Support

To route every client on the machine through Stitcher, point the standard base URL environment variables at the proxy. Clients that read these variables then send their requests to Stitcher:

export OPENAI_BASE_URL=http://localhost:8081/v1
export ANTHROPIC_BASE_URL=http://localhost:8081/v1

How It Works

[ Client ] ---POST /v1/chat/completions---> [ Stitcher Proxy ]
(Only sends                                       │
 new msg)                                         ▼
                                          Reads local JSONL
                                          Stitches history backwards
                                          Deduplicates repetitive text
                                          Enforces token budget (e.g. 128k)
                                                  │
                                                  ▼
[ Upstream API ] <------Full Context------- [ Stitcher Proxy ]
(OpenAI/Anthropic)

Usage Examples

Python (OpenAI SDK)

from openai import OpenAI

# Just change the base_url. That's it.
client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="your-real-api-key",  # Passed through to upstream
    default_headers={"X-Stitcher-Session": "my-app-user-123"}
)

# Use normally. Stitcher handles infinite memory transparently.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did we discuss yesterday?"}]
)
# ^ Even though you only sent 1 message, Stitcher injected
# the full conversation history behind the scenes.

cURL

curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-Stitcher-Session: terminal-session-99" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello again!"}]
  }'
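
Python (streaming)

The endpoint also accepts stream: true. The following is a minimal sketch of consuming a streamed reply through the proxy with the OpenAI Python SDK, setting the session header per request via the SDK's extra_headers argument. The helper names collect_stream and stream_reply are hypothetical and not part of Stitcher; the client argument is assumed to be an OpenAI client whose base_url points at the proxy.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the text deltas of an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example usage-only chunks) carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # the first and last chunks often have no text
            parts.append(delta)
    return "".join(parts)


def stream_reply(client: Any, session: str, prompt: str, model: str = "gpt-4o") -> str:
    """Send one new message through the proxy and return the streamed text.

    `client` is assumed to be an openai.OpenAI instance created with
    base_url="http://localhost:8081/v1".
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        # Per-request header, instead of default_headers on the client.
        extra_headers={"X-Stitcher-Session": session},
    )
    return collect_stream(stream)
```

Setting the header per request lets a single client object serve several sessions, which may suit multi-user applications better than default_headers.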

Configuration

Config loading priority: CLI flags > Environment Variables > ~/.stitcher/config.json > Defaults.

CLI Flag      Description                          Default
--port        Port to run the proxy on             8081
--upstream    Upstream LLM API URL                 https://api.openai.com
--max-tokens  Token budget for stitched context    128000
--data-dir    Directory for JSONL storage          ~/.stitcher/sessions
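
The loading priority above can be pictured as a layered lookup in which the first layer that defines a key wins. The sketch below is illustrative only and is not Stitcher's actual loader: the STITCHER_ environment variable prefix and the config.json key names are assumptions, since neither is specified here. Only the defaults are taken from the table.

```python
import json
from collections import ChainMap
from pathlib import Path
from typing import Any, Mapping

# Defaults taken from the flag table.
DEFAULTS = {
    "port": 8081,
    "upstream": "https://api.openai.com",
    "max_tokens": 128000,
    "data_dir": "~/.stitcher/sessions",
}

ENV_PREFIX = "STITCHER_"  # hypothetical prefix, not confirmed by the docs


def load_file_config(path: Path) -> dict:
    """Read a JSON config file, returning {} if it does not exist."""
    path = path.expanduser()
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def env_config(environ: Mapping[str, str]) -> dict:
    """Pick up known keys from the environment, cast to the default's type."""
    found = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + key.upper())
        if raw is not None:
            found[key] = type(default)(raw)
    return found


def resolve_config(
    cli_flags: Mapping[str, Any],
    environ: Mapping[str, str],
    file_config: Mapping[str, Any],
) -> dict:
    """Merge layers: CLI flags > environment > config file > defaults."""
    # Flags the user did not pass are None and must not mask lower layers.
    cli = {k: v for k, v in cli_flags.items() if v is not None}
    return dict(ChainMap(cli, env_config(environ), dict(file_config), DEFAULTS))
```

In use, the file layer would come from load_file_config(Path("~/.stitcher/config.json")) and the environment layer from os.environ.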

API Endpoints

  • POST /v1/chat/completions: The main OpenAI-compatible proxy endpoint. Supports both normal requests and SSE streaming (stream: true). Pass the X-Stitcher-Session header to isolate memory per session; otherwise, the session is derived from the first message.
  • GET /v1/stitcher/stats: Returns session count and total messages.
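
To make the session rule for POST /v1/chat/completions concrete, here is a minimal sketch of how a session key could be chosen. The text only says that, without the header, the session is "derived from the first message"; hashing that message with SHA-256 and truncating the digest is an assumption made for illustration, as is the function name resolve_session.

```python
import hashlib
import json
from typing import Mapping, Sequence

SESSION_HEADER = "X-Stitcher-Session"


def resolve_session(headers: Mapping[str, str], messages: Sequence[dict]) -> str:
    """Return the explicit session name, or a key derived from the first message."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == SESSION_HEADER.lower() and value.strip():
            return value.strip()

    if not messages:
        raise ValueError("no session header and no first message to derive one from")

    # Assumption: a stable hash of the first message acts as the session key.
    canonical = json.dumps(messages[0], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Under this assumption, two unrelated conversations that open with the same first message would share one session, which is a good reason to set the header explicitly in multi-user applications.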

How It Works Under The Hood

Stitcher uses a backward-reading file algorithm. Every time the proxy receives a request from you or a response from the upstream API, it appends the new message to an active.jsonl file in the session's directory. When the proxy builds the context window:

  1. It reads active.jsonl and any older rolled files (e.g. active.001.jsonl) backward, from newest to oldest.
  2. It accumulates messages until their combined token count reaches your configured limit (e.g. 128k).
  3. It deduplicates text: it identifies near-identical assistant outputs and condenses older duplicates to save tokens.
  4. It reverses the collection to restore chronological order and swaps it into your request's messages array.
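
The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stitcher's implementation: tokens are estimated at roughly four characters each, message content is assumed to be a plain string, rolled files are assumed to be numbered so that a higher number is newer, near-duplicates are detected with difflib at a 0.9 similarity threshold, and each file is read fully into memory rather than seeking from its end. All function names and the placeholder text are hypothetical.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path
from typing import Iterator

PLACEHOLDER = "[near-duplicate of a later reply omitted]"  # hypothetical marker


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def append_message(session_dir: Path, message: dict) -> None:
    """Append one message as a JSON line to the session's active.jsonl."""
    session_dir.mkdir(parents=True, exist_ok=True)
    with (session_dir / "active.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(message) + "\n")


def iter_newest_first(session_dir: Path) -> Iterator[dict]:
    """Step 1: yield messages from newest to oldest across all files."""
    # Assumption: higher-numbered rolled files are newer.
    rolled = sorted(session_dir.glob("active.*.jsonl"), reverse=True)
    for path in [session_dir / "active.jsonl", *rolled]:
        if not path.is_file():
            continue
        for line in reversed(path.read_text(encoding="utf-8").splitlines()):
            if line.strip():
                yield json.loads(line)


def build_context(session_dir: Path, max_tokens: int, similarity: float = 0.9) -> list:
    # Step 2: collect messages, newest first, until the budget is spent.
    collected, used = [], 0
    for msg in iter_newest_first(session_dir):
        cost = estimate_tokens(msg.get("content", ""))
        if used + cost > max_tokens:
            break
        collected.append(msg)
        used += cost

    # Step 3: condense older near-identical assistant outputs.
    seen, deduped = [], []
    for msg in collected:  # still newest first, so the newest copy survives
        text = msg.get("content", "")
        if msg.get("role") == "assistant":
            if any(SequenceMatcher(None, text, s).ratio() >= similarity for s in seen):
                msg = {**msg, "content": PLACEHOLDER}
            else:
                seen.append(text)
        deduped.append(msg)

    # Step 4: restore chronological order for the messages array.
    deduped.reverse()
    return deduped
```

Reading newest to oldest means the budget is always spent on the most recent turns first, so whatever is cut off is the oldest part of the conversation.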

License

MIT
