
nvd-claude-proxy


Run Claude Code — and any Anthropic SDK client — on free NVIDIA NIM models.

A local HTTP proxy that speaks the Anthropic Messages API and forwards requests to https://integrate.api.nvidia.com/v1 (NVIDIA NIM / build.nvidia.com). Point your ANTHROPIC_BASE_URL at the proxy and your tools work unchanged while the inference runs on Nemotron Ultra, Qwen3, DeepSeek-R1, or any other NIM model.
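The core job of the proxy is reshaping Anthropic Messages request bodies into the OpenAI-style chat-completions format that NIM serves. As an illustrative sketch only (the real proxy also handles tools, images, streaming, and more; the function name here is hypothetical):

```python
# Simplified sketch of the Anthropic -> OpenAI request translation the
# proxy performs. Illustrative, not the proxy's actual code.
def anthropic_to_openai(body: dict, nvidia_model: str) -> dict:
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    messages.extend(body["messages"])
    return {
        "model": nvidia_model,  # the resolved NIM model, not the Claude alias
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }

req = {
    "model": "claude-opus-4-7",
    "system": "Be brief.",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Hi"}],
}
out = anthropic_to_openai(req, "nvidia/llama-3.1-nemotron-ultra-253b-v1")
print(out["messages"][0])  # → {'role': 'system', 'content': 'Be brief.'}
```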


Install

pip install nvd-claude-proxy

# Optional extras
pip install "nvd-claude-proxy[metrics]"   # Prometheus /metrics endpoint
pip install "nvd-claude-proxy[pdf]"       # PDF document block extraction
pip install "nvd-claude-proxy[full]"      # everything above

Quick start

# 1. Get a free API key at https://build.nvidia.com  (no credit card required)
export NVIDIA_API_KEY=nvapi-...

# 2. Start the proxy (default port 8788)
nvd-claude-proxy

In another shell (or in your shell profile):

export ANTHROPIC_BASE_URL=http://localhost:8788
export ANTHROPIC_API_KEY=not-used          # any non-empty string works
export ANTHROPIC_MODEL=claude-opus-4-7     # → Nemotron Ultra 253B
export ANTHROPIC_SMALL_FAST_MODEL=claude-haiku-4-5  # → Nemotron Nano 9B v2
claude                                     # launch Claude Code

No config file required — the default models.yaml is bundled in the package.


Default model mapping

| Claude alias | NVIDIA NIM model | Notes |
|---|---|---|
| claude-opus-4-7 | nvidia/llama-3.1-nemotron-ultra-253b-v1 | Reasoning, best quality |
| claude-sonnet-4-6 | nvidia/llama-3.3-nemotron-super-49b-v1 | Balanced |
| claude-haiku-4-5 | nvidia/nvidia-nemotron-nano-9b-v2 | Fast, small |
| claude-opus-4-7-vision | meta/llama-4-maverick-17b-128e-instruct | Vision-capable |
| claude-qwen3 | qwen/qwen3-235b-a22b | Qwen3 thinking |
| claude-r1 | deepseek-ai/deepseek-r1 | DeepSeek-R1 |

Legacy Claude 3.x names (claude-3-5-sonnet-*, claude-3-opus-*, etc.) are automatically routed to the matching tier via prefix fallbacks.
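A prefix fallback can be pictured as an exact-alias lookup followed by a longest-prefix match. The sketch below is illustrative only; the table and function names (resolve_model) are not the proxy's internals:

```python
# Hypothetical sketch of alias resolution with prefix fallbacks.
ALIASES = {
    "claude-opus-4-7": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
    "claude-sonnet-4-6": "nvidia/llama-3.3-nemotron-super-49b-v1",
    "claude-haiku-4-5": "nvidia/nvidia-nemotron-nano-9b-v2",
}
PREFIX_FALLBACKS = {
    "claude-3-opus": "claude-opus-4-7",
    "claude-3-5-sonnet": "claude-sonnet-4-6",
    "claude-3-haiku": "claude-haiku-4-5",
}

def resolve_model(name: str) -> str:
    """Exact alias match first, then the longest matching prefix."""
    if name in ALIASES:
        return ALIASES[name]
    for prefix in sorted(PREFIX_FALLBACKS, key=len, reverse=True):
        if name.startswith(prefix):
            return ALIASES[PREFIX_FALLBACKS[prefix]]
    raise KeyError(f"no mapping for {name!r}")

print(resolve_model("claude-3-5-sonnet-20241022"))
# → nvidia/llama-3.3-nemotron-super-49b-v1
```

Checking the longest prefix first keeps a specific rule like `claude-3-5-sonnet` from being shadowed by a broader one like `claude-`.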

Override by setting MODEL_CONFIG_PATH=/path/to/your/models.yaml.


Environment variables

| Variable | Default | Description |
|---|---|---|
| NVIDIA_API_KEY | required | nvapi-… key from build.nvidia.com |
| NVIDIA_BASE_URL | https://integrate.api.nvidia.com/v1 | Override for self-hosted NIM |
| PROXY_HOST | 127.0.0.1 | Bind address (0.0.0.0 for Docker/remote) |
| PROXY_PORT | 8788 | Bind port |
| PROXY_API_KEY | (unset) | Require clients to send this key as a Bearer token |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
| MODEL_CONFIG_PATH | (bundled) | Path to a custom models.yaml |
| REQUEST_TIMEOUT_SECONDS | 600 | Total request timeout (long for reasoning streams) |
| MAX_RETRIES | 2 | Upstream retry budget for transient 5xx |
| RATE_LIMIT_RPM | 0 (off) | Per-client requests/minute limit; 0 = disabled |
| MAX_REQUEST_BODY_MB | 0 (off) | Reject bodies larger than this; 0 = unlimited |

Variables can also be placed in a .env file in the working directory.
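For example, a minimal .env in the working directory might look like this (values illustrative):

```
NVIDIA_API_KEY=nvapi-...
PROXY_PORT=8788
LOG_LEVEL=DEBUG
```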


API endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/messages | Anthropic Messages — streaming & non-streaming |
| POST | /v1/messages/count_tokens | Approximate token count (cl100k_base) |
| GET | /v1/models | List model aliases |
| GET | /healthz | Liveness probe |
| GET | /metrics | Prometheus metrics (requires [metrics] extra) |
| POST | /v1/messages/batches | 501 stub (not supported by NIM) |
| POST | /v1/files | 501 stub |

Features

  • Streaming — full SSE translation with keepalive ping events
  • Tool use — Anthropic tool definitions and results translated to OpenAI function-calling
  • Reasoning / thinking — thinking.budget_tokens enforced; <think> tags stripped
  • Vision — JPEG/PNG pass-through; GIF/WEBP transcoded to PNG
  • PDF documents — base64 PDF blocks extracted to plain text (requires [pdf] extra)
  • Model failover — automatic retry on 5xx with next model in the chain
  • SIGHUP reload — kill -HUP <pid> reloads models.yaml without restart
  • Rate limiting — per-client fixed-window limiter (keyed on metadata.user_id or IP)
  • Prometheus metrics — request count, token usage, latency histograms
  • Cost estimation — cost_usd_est field in every structured log line

Custom model config

Create a models.yaml (start from the bundled default) and set MODEL_CONFIG_PATH:

defaults:
  big: "my-model"
  small: "my-model"

aliases:
  my-model:
    nvidia_id: "org/my-nim-model"
    supports_tools: true
    supports_vision: false
    supports_reasoning: false
    max_context: 131072
    max_output: 16384
    failover_to: []

prefix_fallbacks:
  "claude-": "my-model"

Then run the proxy against it:

MODEL_CONFIG_PATH=./my_models.yaml nvd-claude-proxy
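The failover_to list chains aliases together. A sketch of how such a chain could be walked (illustrative only; the config shape mirrors models.yaml but the function is hypothetical):

```python
# Hypothetical walk of a failover_to chain from a models.yaml-like config.
ALIASES = {
    "primary": {"nvidia_id": "org/big-model", "failover_to": ["backup"]},
    "backup": {"nvidia_id": "org/small-model", "failover_to": []},
}

def failover_chain(alias: str) -> list[str]:
    """NVIDIA model IDs to try in order: the alias first, then its failovers."""
    chain: list[str] = []
    seen: set[str] = set()
    queue = [alias]
    while queue:
        name = queue.pop(0)
        if name in seen or name not in ALIASES:
            continue  # skip cycles and unknown aliases
        seen.add(name)
        chain.append(ALIASES[name]["nvidia_id"])
        queue.extend(ALIASES[name]["failover_to"])
    return chain

print(failover_chain("primary"))  # → ['org/big-model', 'org/small-model']
```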

Docker

docker run --rm -p 8788:8788 \
  -e NVIDIA_API_KEY=nvapi-... \
  ghcr.io/khiwn/nvd-claude-proxy:latest

Or clone the repo and use docker compose up.


Known limitations

  • Prompt caching is silently ignored (NVIDIA NIM has no equivalent).
  • thinking.signature is proxy-local — do not forward proxy-generated thinking blocks to the real Anthropic API.
  • DeepSeek-R1 + tool use is unreliable; use Nemotron models for agentic workloads.
  • Free-tier NIM rate limit is ~40 RPM; heavy tool loops may hit 429s.
  • Anthropic server tools (web_search, bash, computer, code_execution, memory) are silently dropped.
  • Batch and Files APIs return 501 — NIM has no equivalent.

Development

git clone https://github.com/khiwn/nvd-claude-proxy
cd nvd-claude-proxy
make dev    # pip install -e ".[dev,full]"
make test   # pytest
make lint   # ruff + mypy
make run    # uvicorn on :8788

License

MIT — see LICENSE.
