Local OpenAI-compatible proxy for AI coding tools and DeepSeek thinking models

DeepSeek Bridge

A local proxy that connects AI coding tools (Cursor, GitHub Copilot, Codex, and any OpenAI-compatible client) to DeepSeek's reasoning models by repairing the reasoning_content chain that these tools commonly drop from tool-call requests.

pip install deepseek-bridge

DeepSeek's thinking-mode API requires every assistant message in a multi-turn tool-call conversation to carry its complete reasoning_content back to the server. When a client omits this field, the API returns a 400 error. DeepSeek Bridge intercepts requests, restores the missing reasoning from a local cache, and forwards them upstream — no client-side changes needed.

Features

Reasoning Repair

  • Injects reasoning_content into outgoing tool-call requests, restoring reasoning previously cached from both regular and streamed DeepSeek responses.
  • Displays thinking tokens in the client UI using collapsible Markdown <details> blocks.
  • Cursor Agent Mode support: automatically converts Responses API payloads to Chat Completions format.
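The Responses-to-Chat-Completions conversion in the last bullet can be sketched roughly as follows. This is an illustrative sketch, not the proxy's actual code: the function name and the exact payload fields handled (`instructions`, `input`, `input_text` parts) are assumptions based on the public Responses API shape.

```python
def responses_to_chat(payload: dict) -> dict:
    """Flatten a Responses-API-style payload into Chat Completions form (sketch)."""
    messages = []
    # Responses API carries the system prompt in a top-level "instructions" field.
    if payload.get("instructions"):
        messages.append({"role": "system", "content": payload["instructions"]})
    # Each input item becomes one chat message; text parts are concatenated.
    for item in payload.get("input", []):
        if item.get("type", "message") == "message":
            parts = item.get("content", [])
            if isinstance(parts, str):
                text = parts
            else:
                text = "".join(p.get("text", "") for p in parts)
            messages.append({"role": item["role"], "content": text})
    return {"model": payload.get("model"), "messages": messages}
```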

Connection Resilience

  • Connection pooling via urllib3 with keep-alive and minimal retries.
  • Bounded thread pool prevents thread exhaustion on long-running streaming connections.
  • Configurable SSE read timeout (default 180 seconds) prevents hung threads on silent upstreams.
  • Tunnel support (cloudflared by default, ngrok optional) with health check and automatic reconnection.
  • Graceful shutdown on SIGTERM — active requests drain, reasoning cache is flushed.
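A pooled urllib3 client in the spirit of the bullets above might look like the following. This is a hedged sketch, not the proxy's actual internals; the specific retry policy and timeout values merely mirror the documented defaults (--max-pool-connections, --stream-read-timeout).

```python
import urllib3
from urllib3.util.retry import Retry

# Minimal retries: a couple of connect retries, none mid-stream.
retry = Retry(total=2, connect=2, read=0, backoff_factor=0.5,
              status_forcelist=(502, 503, 504))

# Keep-alive pool; maxsize mirrors --max-pool-connections,
# read timeout mirrors --stream-read-timeout (180 s).
http = urllib3.PoolManager(
    num_pools=4,
    maxsize=10,
    retries=retry,
    timeout=urllib3.Timeout(connect=5.0, read=180.0),
)
```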

API Compatibility

  • system_fingerprint in every streaming and non-streaming response.
  • x-request-id UUID header on every response.
  • OpenAI-standard error format.
  • CORS headers enabled by default.
  • /v1/embeddings, /v1/health, and /v1/models endpoints.
  • /v1/completions legacy endpoint (auto-converts prompt to messages).
  • Multimodal content arrays preserved.
  • DeepSeek V4 thinking parameter support (thinking, reasoning_effort, response_format, logprobs).
  • Silent mapping of legacy model names (deepseek-chat, deepseek-reasoner) to deepseek-v4-flash.
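The legacy-name mapping in the last bullet amounts to a small lookup applied before forwarding. A minimal sketch (the table and function names here are illustrative, not the proxy's actual identifiers):

```python
# Hypothetical sketch of the silent legacy-name mapping described above.
LEGACY_MODELS = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-flash",
}

def resolve_model(requested: str) -> str:
    """Map legacy model names silently; pass everything else through."""
    return LEGACY_MODELS.get(requested, requested)
```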

Logging and Observability

  • Persistent log files with --log-dir.
  • Heartbeat and pool utilization counters.
  • Full structured request traces with --trace-dir.
  • Terminal UI dashboard with real-time metrics, config editing, and log viewing.

TUI Dashboard

Starting with v0.2.0, DeepSeek Bridge opens a Terminal UI dashboard by default. The dashboard provides live monitoring and configuration:

  • Dashboard tab — Real-time request metrics, uptime, tunnel status, and pool utilization.
  • Config tab — Edit proxy settings (model, network, storage) without restarting.
  • Logs tab — Streaming log viewer with auto-scroll.

Use --headless to disable the TUI and run in classic CLI mode.

Why This Exists

DeepSeek's thinking-mode API enforces a strict contract: every assistant message that participates in a tool-call chain must include the full reasoning_content field. Some AI coding tools (including Cursor) drop this field from their chat transcript, causing DeepSeek to reject subsequent tool-call requests.

DeepSeek Bridge stores copies of reasoning_content from every response and patches missing entries back into requests before forwarding them upstream.
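The store-and-repair cycle can be sketched as below. This is illustrative only: the function names and the per-turn key scheme are assumptions, and the real proxy persists reasoning to SQLite (with hashed, scoped keys) rather than an in-memory dict.

```python
def store_reasoning(cache: dict, key: str, assistant_msg: dict) -> None:
    """Cache reasoning_content seen in an upstream response."""
    rc = assistant_msg.get("reasoning_content")
    if rc:
        cache[key] = rc

def repair_messages(cache: dict, messages: list) -> list:
    """Restore reasoning_content on assistant tool-call messages the client dropped."""
    for i, msg in enumerate(messages):
        if (msg.get("role") == "assistant"
                and msg.get("tool_calls")
                and "reasoning_content" not in msg):
            cached = cache.get(f"turn:{i}")  # hypothetical key scheme
            if cached is not None:
                msg["reasoning_content"] = cached
    return messages
```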

Installation

# From PyPI
pip install deepseek-bridge

# From source
git clone https://github.com/breixopd/deepseek-bridge.git
cd deepseek-bridge
uv run deepseek-bridge

Usage

# Full TUI dashboard (default)
deepseek-bridge

# Headless mode — no TUI, classic CLI output
deepseek-bridge --headless

# Run without tunnel (localhost only)
deepseek-bridge --tunnel none --port 9000

# Debug output with trace dumps
deepseek-bridge --debug --trace-dir ./dumps

# Use a custom config file
deepseek-bridge --config ./my-config.yaml

# Clear reasoning cache and exit
deepseek-bridge --clear-reasoning-cache

# Disable thinking display in client UI
deepseek-bridge --no-display-reasoning

On first run, DeepSeek Bridge creates:

  • ~/.deepseek-bridge/config.yaml — configuration file
  • ~/.deepseek-bridge/reasoning_content.sqlite3 — reasoning cache

Configuration

All settings are configurable via ~/.deepseek-bridge/config.yaml or command-line overrides. Example configuration:

model: deepseek-v4-pro
base_url: https://api.deepseek.com
thinking: enabled
reasoning_effort: max
display_reasoning: true
collapsible_reasoning: true

host: 127.0.0.1
port: 9000
tunnel: cloudflared
# ngrok_url: https://my-tunnel.ngrok.app  # optional: fixed ngrok endpoint
debug: false
cors: true
ollama: true
stream_read_timeout: 180
request_timeout: 300

Client Setup

Cursor

In Cursor, add a custom model with these settings:

  • Model: deepseek-v4-pro (or deepseek-v4-flash)
  • API Key: Your DeepSeek API key
  • Base URL: Your tunnel HTTPS URL with /v1 path (e.g., https://app.example.com/v1)

Note on tunnels: Cursor blocks non-public URLs such as localhost. DeepSeek Bridge uses Cloudflare Tunnel by default — a free, persistent HTTPS tunnel with no bandwidth or time limits. Use --tunnel none to disable tunneling. Use --tunnel ngrok if you prefer ngrok.

Cloudflare Tunnel Setup

Cloudflare Named Tunnels are free, persistent, support SSE streaming, and have no bandwidth/time limits. One-time setup:

# Install cloudflared
brew install cloudflare/cloudflare/cloudflared   # macOS
# Or download from: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/

# Login and create a tunnel
cloudflared tunnel login
cloudflared tunnel create deepseek-bridge

# Point it at your domain
cloudflared tunnel route dns deepseek-bridge app.example.com

Then add your tunnel URL to ~/.deepseek-bridge/config.yaml:

tunnel: cloudflared
cf_url: https://app.example.com

Use --tunnel cloudflared on the CLI, or select cloudflared in the TUI dashboard.

GitHub Copilot

Configure the Ollama endpoint in VS Code:

{
  "github.copilot.chat.byok.ollamaEndpoint": "http://localhost:9000"
}

Then open Copilot Chat, navigate to "Manage Models", and your DeepSeek models appear automatically.

Agent Mode is supported — the proxy advertises tool_calls capability via the Ollama /api/show endpoint and handles reasoning repair across tool-call chains.

For the new customOAIModels path (VS Code Insiders 1.104+):

{
  "github.copilot.chat.customOAIModels": {
    "deepseek-v4-pro": {
      "name": "DeepSeek V4 Pro",
      "url": "http://localhost:9000/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "thinking": true,
      "maxInputTokens": 1000000,
      "maxOutputTokens": 384000,
      "streaming": true,
      "requiresAPIKey": true
    }
  }
}

Other OpenAI-Compatible Clients

Any client that speaks the OpenAI /v1/chat/completions API can use DeepSeek Bridge. Set the client's base URL to http://localhost:9000/v1 (or your tunnel URL).
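As an example, a raw Chat Completions request against the proxy can be assembled with nothing but the standard library. The payload below is illustrative, and the API key is a placeholder; the request is built but deliberately not sent.

```python
import json
import urllib.request

# Point a standard Chat Completions request at the proxy instead of
# api.deepseek.com. Any OpenAI SDK works the same way once its base URL
# is set to http://localhost:9000/v1.
payload = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:9000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_DEEPSEEK_API_KEY",  # placeholder
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted so the snippet
# stays runnable without a live proxy.
```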

How It Works

  1. Request interception: The proxy receives a /v1/chat/completions request from the client.
  2. Format detection: If the request uses OpenAI Responses API format (common in Cursor Agent Mode), it is converted to Chat Completions format.
  3. Reasoning repair: Each assistant message in the conversation is checked. Missing reasoning_content fields are looked up in the local SQLite cache and restored.
  4. Cache isolation: Cache keys are scoped by a SHA-256 hash of the conversation prefix, upstream model, configuration, and API key. Different conversations and users never collide.
  5. Response processing: Reasoning content from the upstream response is cached for future requests. Display adapters mirror reasoning thoughts into Markdown <details> blocks visible in the client.
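The cache-key derivation in step 4 can be sketched as a SHA-256 digest over the scoping inputs. The field names and serialization below are assumptions for illustration; the real proxy's key material and encoding may differ.

```python
import hashlib
import json

def cache_key(prefix_messages, model, config_fingerprint, api_key):
    """Derive a scoped cache key from conversation prefix, model, config, and key."""
    material = json.dumps(
        {"prefix": prefix_messages, "model": model,
         "config": config_fingerprint, "api_key": api_key},
        sort_keys=True,  # deterministic serialization
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the API key and config participate in the hash, two users (or two configurations) sharing one proxy can never read each other's cached reasoning.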

Known Limitations

Cursor Sub-Agents

Cursor sub-agents do not inherit custom API base URL or API key settings. This is a Cursor-side bug (see forum thread). Use the main agent (Cmd+Shift+0 to toggle) for direct DeepSeek chat. Sub-agents that route through the proxy will work correctly; those that bypass it fall back to Cursor's built-in models.

Cursor Agent Mode Responses API Format

Cursor Agent mode sends OpenAI Responses API-format payloads to the Chat Completions endpoint. DeepSeek Bridge detects and converts these automatically.

Reasoning Display

Cursor's native reasoning UI is available only for Cursor's own models. For custom endpoints, reasoning content is mirrored into visible Markdown details blocks. Use --no-display-reasoning to disable this behavior.
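The mirroring described above amounts to wrapping the reasoning text in a details block before it reaches the client. A sketch, assuming a format like this (the exact Markdown template the proxy emits may differ):

```python
def wrap_reasoning(reasoning: str, collapsible: bool = True) -> str:
    """Wrap reasoning text for display in clients without a native reasoning UI."""
    if collapsible:
        return (
            "<details>\n<summary>Thinking</summary>\n\n"
            f"{reasoning}\n\n</details>\n"
        )
    # Non-collapsible fallback: render as a blockquote.
    return f"> {reasoning}\n"
```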

Development

# Run tests
uv run python -m unittest discover -s tests

# Format and lint
uv run pre-commit run --all-files

# Type check
uv run mypy src/ --check-untyped-defs

# Run with coverage
uv run coverage run -m unittest discover -s tests
uv run coverage report

CLI Reference

| Flag | Default | Description |
| --- | --- | --- |
| --headless | off | Run without TUI |
| --model | deepseek-v4-pro | Fallback model when request omits it |
| --thinking | enabled | DeepSeek thinking mode |
| --reasoning-effort | max | Reasoning effort level |
| --display-reasoning | on | Show reasoning content in client UI |
| --collapsible-reasoning | on | Use collapsible Markdown for reasoning |
| --host | 127.0.0.1 | Bind address |
| --port | 9000 | Bind port |
| --tunnel | cloudflared | Tunnel service (none, cloudflared, ngrok) |
| --cf-url | none | Cloudflare tunnel public URL |
| --ngrok-url | none | Fixed ngrok endpoint URL |
| --base-url | https://api.deepseek.com | Upstream DeepSeek API URL |
| --cors | on | Send CORS headers |
| --stream-read-timeout | 180 | SSE read timeout in seconds |
| --max-thread-pool | 20 | Max concurrent request threads |
| --max-pool-connections | 10 | Max upstream connections |
| --ollama / --no-ollama | on | Enable/disable Ollama endpoints |
| --log-dir | none | Directory for persistent log files |
| --trace-dir | none | Directory for request trace dumps |
| --debug | off | Enable DEBUG-level log output |
| --compact | off | One-line-per-request output |
| --config | ~/.deepseek-bridge/config.yaml | Config file path |
| --no-log | off | Disable all log file output |
| --reasoning-content-path | ~/.deepseek-bridge/reasoning_content.sqlite3 | Reasoning cache path |
| --reasoning-cache-max-age-seconds | 604800 | Max age of cached reasoning (seconds) |
| --missing-reasoning-strategy | recover | Strategy for missing reasoning (recover/reject) |
| --max-request-body-bytes | 20971520 | Max request body size in bytes |
| --clear-reasoning-cache | off | Clear reasoning cache and exit |
| --version | - | Print version and exit |

License

MIT License

Acknowledgements

Based on yxlao/deepseek-cursor-proxy, the original project that started this work.
