Skip to main content

Run Claude Code (and any Anthropic SDK client) on NVIDIA NIM models via a local proxy.

Project description

nvd-claude-proxy

PyPI Python License: MIT Code Style: Ruff

Run Claude Code — and any Anthropic SDK client — on enterprise-grade NVIDIA NIM models.

nvd-claude-proxy is a production-hardened local HTTP proxy that translates between the Anthropic Messages API and the NVIDIA NIM (OpenAI-compatible) API. It enables you to run Claude Code, the Anthropic SDK, and other clients using high-performance NVIDIA-hosted models with official-grade resilience and scaling.


🚀 Key Features

  • Architectural Excellence: Fully decoupled core translation logic from the transport layer.
  • Enterprise Resilience: Built-in Circuit Breakers and automated failover chains to protect against upstream outages.
  • Idempotency Support: Request deduplication and safe retries via anthropic-idempotency-key across Redis, SQLite, and Memory backends.
  • Scalable State: Distributed session management via Redis (with SQLite and In-Memory fallbacks).
  • Official-Grade Security: Unified AuthMiddleware protecting all endpoints with global API key enforcement.
  • Claude Code Optimized: Specifically tuned for Claude Code's complex tool-calling and reasoning patterns.
  • Vision & Progressive Streaming: Fine-grained progressive tool streaming and real-time multimodal (image_url) parity.
  • Modular Pipeline: Event-driven streaming architecture for deterministic state management.

🛠 Deployment & Configuration

Environment Variables

Variable Default Description
NVIDIA_API_KEY (Required) Your NVIDIA NIM API key.
PROXY_API_KEY None Optional key to protect the proxy itself.
STORAGE_ENGINE sqlite Persistence backend: redis, sqlite, or memory.
REDIS_URL None Required if STORAGE_ENGINE=redis (e.g., redis://localhost:6379).
PROXY_PORT 8788 Local port for the proxy.
RATE_LIMIT_RPM 0 Global rate limit (requests per minute). 0 to disable.

Quick Start

# Install the proxy
pip install nvd-claude-proxy[full]

# Export your API key
export NVIDIA_API_KEY=nvapi-...

# Run the proxy
ncp run

Then point your Claude Code at the proxy:

export ANTHROPIC_BASE_URL=http://localhost:8788
claude

🏗 Architecture

The proxy uses a Chain of Responsibility pattern for streaming events: MetadataProcessor -> TextProcessor -> ToolProcessor -> SafetyProcessor -> FinalizerProcessor

This ensures that even complex interleaved reasoning and parallel tool calls are correctly reconstructed for the Anthropic SDK.


Official-Grade Infrastructure for the AI Era.


Production Claude Code + NVIDIA NIM configuration

Use this proxy as the Anthropic-compatible endpoint for Claude Code:

export NVIDIA_API_KEY=nvapi-...
export PROXY_PORT=8788
export MAX_REQUEST_BODY_MB=32
export REQUEST_TIMEOUT_SECONDS=600
export STORAGE_ENGINE=redis
export REDIS_URL=redis://127.0.0.1:6379

# Optional but strongly recommended for shared/devbox usage
export PROXY_API_KEY=replace-with-a-long-random-secret

Run the proxy:

uv run ncp run
# or: ncp run

Point Claude Code at the proxy:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8788
export ANTHROPIC_AUTH_TOKEN=dummy
claude

Recommended production notes

  • Prefer STORAGE_ENGINE=redis for stable rate limiting, idempotency, and multi-session behavior.
  • Keep MAX_REQUEST_BODY_MB=32 to avoid pathological payloads while still supporting large Claude Code tool catalogs.
  • Use the default streaming path; it emits early message_start and periodic ping events to reduce apparent latency and prevent idle timeouts.
  • If tool calls appear slow or malformed upstream, start with claude-sonnet-4-6 or claude-haiku-4-5 mappings before moving to larger reasoning models.
  • This proxy is translation-only: Claude Code executes tools locally; the proxy must preserve tool ordering, streamed JSON fragments, and Anthropic-compatible SSE grammar.

R2 low-latency mode

Version 1.3.5 adds a lightweight hosted-catalog runtime inspired by the one-file reference proxy. Use it when you care more about fast first-token latency and minimal overhead than about the full production registry/session stack.

Start R2 mode

ncp r2 --model nvidia/llama-3.3-nemotron-super-49b-v1.5
# or
nvd-claude-proxy-r2

Then point Claude Code at it:

M=nvidia/llama-3.3-nemotron-super-49b-v1.5
export ANTHROPIC_BASE_URL=http://127.0.0.1:8787
export ANTHROPIC_API_KEY=not-used
export ANTHROPIC_CUSTOM_MODEL_OPTION=$M
export ANTHROPIC_DEFAULT_HAIKU_MODEL=$M
export ANTHROPIC_DEFAULT_OPUS_MODEL=$M
export ANTHROPIC_DEFAULT_SONNET_MODEL=$M
export CLAUDE_CODE_SUBAGENT_MODEL=$M
claude

Why use R2 mode

  • eager message_start for lower perceived TTFT
  • 15s ping heartbeat during silent reasoning phases
  • simpler tool translation path
  • direct NVIDIA model IDs, no alias registry required
  • less overhead than the full production runtime

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nvd_claude_proxy-1.3.5.tar.gz (127.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nvd_claude_proxy-1.3.5-py3-none-any.whl (153.8 kB view details)

Uploaded Python 3

File details

Details for the file nvd_claude_proxy-1.3.5.tar.gz.

File metadata

  • Download URL: nvd_claude_proxy-1.3.5.tar.gz
  • Upload date:
  • Size: 127.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for nvd_claude_proxy-1.3.5.tar.gz
Algorithm Hash digest
SHA256 ce5102b17dfdcdadaa3de63aa7246d9396c8a78499b2ba1cd51506b0c52346c5
MD5 29dabddbbb6c2270f2e4213496c6e8d7
BLAKE2b-256 66310d2e41e7099489b2b16d38f498f4900375530240d4d28966f552efc94d12

See more details on using hashes here.

File details

Details for the file nvd_claude_proxy-1.3.5-py3-none-any.whl.

File metadata

File hashes

Hashes for nvd_claude_proxy-1.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b7d1f87ac482007a2c555ec4128485d5a6481e8a7d176d2911e0886c3d6bf77f
MD5 5c433cc994861e6958257b7315b33c1d
BLAKE2b-256 55eeff8c2a62dfbeeda3ab919a373ef617c6e6e84b656b7a10504df06b3d7698

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page