Run Claude Code (and any Anthropic SDK client) on NVIDIA NIM models via a local proxy.
Project description
nvd-claude-proxy
Run Claude Code — and any Anthropic SDK client — on free NVIDIA NIM models.
A local HTTP proxy that speaks the Anthropic Messages API
and forwards requests to https://integrate.api.nvidia.com/v1 (NVIDIA NIM /
build.nvidia.com). Point your ANTHROPIC_BASE_URL at the proxy and your tools
work unchanged while the inference runs on Nemotron Ultra, Qwen3, DeepSeek-R1,
or any other NIM model.
Install
pip install nvd-claude-proxy
# Optional extras
pip install "nvd-claude-proxy[metrics]" # Prometheus /metrics endpoint
pip install "nvd-claude-proxy[pdf]" # PDF document block extraction
pip install "nvd-claude-proxy[full]" # everything above
Quick start
# 1. Get a free API key at https://build.nvidia.com (no credit card required)
export NVIDIA_API_KEY=nvapi-...
# 2. Start the proxy (default port 8788)
nvd-claude-proxy
In another shell (or in your shell profile):
export ANTHROPIC_BASE_URL=http://localhost:8788
export ANTHROPIC_API_KEY=not-used # any non-empty string works
export ANTHROPIC_MODEL=claude-opus-4-7 # → Nemotron Ultra 253B
export ANTHROPIC_SMALL_FAST_MODEL=claude-haiku-4-5 # → Nemotron Nano 9B v2
claude # launch Claude Code
No config file required — the default models.yaml is bundled in the package.
Default model mapping
| Claude alias | NVIDIA NIM model | Notes |
|---|---|---|
claude-opus-4-7 |
nvidia/llama-3.1-nemotron-ultra-253b-v1 |
Reasoning, best quality |
claude-sonnet-4-6 |
nvidia/llama-3.3-nemotron-super-49b-v1 |
Balanced |
claude-haiku-4-5 |
nvidia/nvidia-nemotron-nano-9b-v2 |
Fast, small |
claude-opus-4-7-vision |
meta/llama-4-maverick-17b-128e-instruct |
Vision-capable |
claude-qwen3 |
qwen/qwen3-235b-a22b |
Qwen3 thinking |
claude-r1 |
deepseek-ai/deepseek-r1 |
DeepSeek-R1 |
Legacy Claude 3.x names (claude-3-5-sonnet-*, claude-3-opus-*, etc.) are
automatically routed to the matching tier via prefix fallbacks.
Override by setting MODEL_CONFIG_PATH=/path/to/your/models.yaml.
Environment variables
| Variable | Default | Description |
|---|---|---|
NVIDIA_API_KEY |
required | nvapi-… key from build.nvidia.com |
NVIDIA_BASE_URL |
https://integrate.api.nvidia.com/v1 |
Override for self-hosted NIM |
PROXY_HOST |
127.0.0.1 |
Bind address (0.0.0.0 for Docker/remote) |
PROXY_PORT |
8788 |
Bind port |
PROXY_API_KEY |
(unset) | Require clients to send this key as Bearer token |
LOG_LEVEL |
INFO |
DEBUG / INFO / WARNING / ERROR |
MODEL_CONFIG_PATH |
(bundled) | Path to a custom models.yaml |
REQUEST_TIMEOUT_SECONDS |
600 |
Total request timeout (long for reasoning streams) |
MAX_RETRIES |
2 |
Upstream retry budget for transient 5xx |
RATE_LIMIT_RPM |
0 (off) |
Per-client requests/minute limit; 0 = disabled |
MAX_REQUEST_BODY_MB |
0 (off) |
Reject bodies larger than this; 0 = unlimited |
Variables can also be placed in a .env file in the working directory.
API endpoints
| Method | Path | Purpose |
|---|---|---|
POST |
/v1/messages |
Anthropic Messages — streaming & non-streaming |
POST |
/v1/messages/count_tokens |
Approximate token count (cl100k_base) |
GET |
/v1/models |
List model aliases |
GET |
/healthz |
Liveness probe |
GET |
/metrics |
Prometheus metrics (requires [metrics] extra) |
POST |
/v1/messages/batches |
501 stub (not supported by NIM) |
POST |
/v1/files |
501 stub |
Features
- Streaming — full SSE translation with keepalive
pingevents - Tool use — Anthropic tool definitions and results translated to OpenAI function-calling
- Reasoning / thinking —
thinking.budget_tokensenforced;<think>tags stripped - Vision — JPEG/PNG pass-through; GIF/WEBP transcoded to PNG
- PDF documents — base64 PDF blocks extracted to plain text (requires
[pdf]extra) - Model failover — automatic retry on 5xx with next model in the chain
- SIGHUP reload —
kill -HUP <pid>reloadsmodels.yamlwithout restart - Rate limiting — per-client fixed-window limiter (keyed on
metadata.user_idor IP) - Prometheus metrics — request count, token usage, latency histograms
- Cost estimation —
cost_usd_estfield in every structured log line
Custom model config
Create a models.yaml (start from the bundled default) and set MODEL_CONFIG_PATH:
defaults:
big: "my-model"
small: "my-model"
aliases:
my-model:
nvidia_id: "org/my-nim-model"
supports_tools: true
supports_vision: false
supports_reasoning: false
max_context: 131072
max_output: 16384
failover_to: []
prefix_fallbacks:
"claude-": "my-model"
MODEL_CONFIG_PATH=./my_models.yaml nvd-claude-proxy
Docker
docker run --rm -p 8788:8788 \
-e NVIDIA_API_KEY=nvapi-... \
ghcr.io/khiwn/nvd-claude-proxy:latest
Or clone the repo and use docker compose up.
Known limitations
- Prompt caching is silently ignored (NVIDIA NIM has no equivalent).
thinking.signatureis proxy-local — do not forward proxy-generated thinking blocks to the real Anthropic API.- DeepSeek-R1 + tool use is unreliable; use Nemotron models for agentic workloads.
- Free-tier NIM rate limit is ~40 RPM; heavy tool loops may hit 429s.
- Anthropic server tools (
web_search,bash,computer,code_execution,memory) are silently dropped. - Batch and Files APIs return 501 — NIM has no equivalent.
Development
git clone https://github.com/khiwn/nvd-claude-proxy
cd nvd-claude-proxy
make dev # pip install -e ".[dev,full]"
make test # pytest
make lint # ruff + mypy
make run # uvicorn on :8788
License
MIT — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nvd_claude_proxy-0.2.9.tar.gz.
File metadata
- Download URL: nvd_claude_proxy-0.2.9.tar.gz
- Upload date:
- Size: 54.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
75c7c971016d2c4ed6e48a7a75108ee2e89286dfc735b245db836cad133da3a1
|
|
| MD5 |
8c6a3528085a89890be6519145554473
|
|
| BLAKE2b-256 |
4aef9c3563594e8a2e814e14bbe93fed83497ce914e66205fdd05310f8250c3d
|
File details
Details for the file nvd_claude_proxy-0.2.9-py3-none-any.whl.
File metadata
- Download URL: nvd_claude_proxy-0.2.9-py3-none-any.whl
- Upload date:
- Size: 59.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
48a8ecd84178fefc816535cb695d966a86170c9c001eb7861add8431a36d0aa6
|
|
| MD5 |
260480890bee9aa263a12feaed19b3b9
|
|
| BLAKE2b-256 |
94da6a48c57c9acb1ece6c01e56a622f50138ec42b859a7f6547e8c824d6a94b
|