# nvd-claude-proxy

Run Claude Code — and any Anthropic SDK client — on free NVIDIA NIM models via a local proxy.
A local HTTP proxy that speaks the Anthropic Messages API and forwards requests to `https://integrate.api.nvidia.com/v1` (NVIDIA NIM / build.nvidia.com). Point your `ANTHROPIC_BASE_URL` at the proxy and your tools work unchanged while inference runs on Nemotron Ultra, Qwen3, DeepSeek-R1, or any other NIM model.
## Install

```bash
# Recommended: isolated global install
pipx install nvd-claude-proxy

# Or plain pip
pip install nvd-claude-proxy

# Optional extras
pip install "nvd-claude-proxy[metrics]"  # Prometheus /metrics endpoint
pip install "nvd-claude-proxy[pdf]"      # PDF document block extraction
pip install "nvd-claude-proxy[full]"     # everything above
```
## Quick start — ncp CLI (recommended)

```bash
# First run: save your API key permanently
ncp init
# → prompts for NVIDIA_API_KEY and saves it to ~/.config/nvd-claude-proxy/.env

# Launch proxy + Claude Code in one command
ncp code
```

That's it. `ncp code` starts the proxy in the background, waits until it is ready, then launches `claude`. When Claude exits, the proxy stops cleanly.
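The launch sequence can be sketched in a few lines of Python. The command names and the `/healthz` readiness probe come from this README; the polling logic itself is an illustrative assumption, not the package's actual implementation:

```python
import subprocess
import sys
import time
import urllib.request

def wait_ready(url: str, timeout: float = 15.0, probe=None) -> bool:
    """Poll `url` until it answers 200 or `timeout` elapses."""
    probe = probe or (lambda: urllib.request.urlopen(url, timeout=1).status == 200)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except OSError:
            pass
        time.sleep(0.25)
    return False

def run_code_session() -> None:
    """Roughly what `ncp code` does; requires both binaries installed."""
    proxy = subprocess.Popen(["nvd-claude-proxy"])
    try:
        if not wait_ready("http://localhost:8788/healthz"):
            sys.exit("proxy did not start in time")
        subprocess.run(["claude"])  # blocks until Claude Code exits
    finally:
        proxy.terminate()           # stop the proxy when Claude exits
```

If the proxy never becomes ready, this is the point where the `proxy did not start in time` error from the Troubleshooting table would surface.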
### All ncp commands

| Command | Description |
|---|---|
| `ncp code` | Start proxy → launch Claude Code |
| `ncp proxy` | Start proxy only (foreground) |
| `ncp init` | Save `NVIDIA_API_KEY` to global config |
| `ncp models list` | Show all configured model aliases |
| `ncp kill` | Terminate any stuck proxy process on port 8788 |
| `ncp version` | Print version |

Pass `--api-key nvapi-…` to any command for a one-shot override without saving.
## Quick start — manual

```bash
# 1. Get a free API key at https://build.nvidia.com (no credit card required)
export NVIDIA_API_KEY=nvapi-...

# 2. Start the proxy (default port 8788)
nvd-claude-proxy
```

In another shell:

```bash
export ANTHROPIC_BASE_URL=http://localhost:8788
export ANTHROPIC_API_KEY=not-used                  # any non-empty string works
export ANTHROPIC_MODEL=claude-opus-4-7             # → Nemotron Ultra 253B
export ANTHROPIC_SMALL_FAST_MODEL=claude-haiku-4-5 # → Nemotron Nano 9B v2
claude
```
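Any HTTP client can talk to the proxy directly. Here is a minimal stdlib sketch of a non-streaming request: the endpoint path, headers, and payload shape follow the Anthropic Messages API, and the proxy from step 2 must already be running on port 8788 before the commented call is made.

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "claude-haiku-4-5") -> urllib.request.Request:
    """Build a POST to the proxy's /v1/messages endpoint."""
    body = json.dumps({
        "model": model,          # a proxy alias from the mapping table
        "max_tokens": 128,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "http://localhost:8788/v1/messages",
        data=body,
        headers={
            "content-type": "application/json",
            "x-api-key": "not-used",            # the proxy ignores the key
            "anthropic-version": "2023-06-01",  # required by the Messages API
        },
        method="POST",
    )

# Needs the proxy running:
# resp = urllib.request.urlopen(build_request("Say hello in one sentence."))
# print(json.load(resp)["content"][0]["text"])
```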
## Default model mapping

| Claude alias | NVIDIA NIM model | Notes |
|---|---|---|
| `claude-opus-4-7` | `nvidia/llama-3.1-nemotron-ultra-253b-v1` | Reasoning, best quality |
| `claude-sonnet-4-6` | `nvidia/llama-3.3-nemotron-super-49b-v1` | Balanced |
| `claude-haiku-4-5` | `nvidia/nvidia-nemotron-nano-9b-v2` | Fast, small |
| `claude-opus-4-7-vision` | `meta/llama-4-maverick-17b-128e-instruct` | Vision-capable |
| `claude-qwen3` | `qwen/qwen3-235b-a22b` | Qwen3 thinking |
| `claude-r1` | `deepseek-ai/deepseek-r1` | DeepSeek-R1 |

Legacy Claude 3.x names (`claude-3-5-sonnet-*`, `claude-3-opus-*`, etc.) are automatically routed to the matching tier via prefix fallbacks. Override the mapping by setting `MODEL_CONFIG_PATH=/path/to/your/models.yaml`.
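The alias-plus-prefix-fallback lookup can be illustrated with a small sketch. The alias entries below come from the mapping table; the specific fallback rules and the longest-prefix-wins resolution order are illustrative assumptions about the routing, not the package's actual code:

```python
# Alias → NIM id entries taken from the table above.
ALIASES = {
    "claude-opus-4-7": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
    "claude-sonnet-4-6": "nvidia/llama-3.3-nemotron-super-49b-v1",
    "claude-haiku-4-5": "nvidia/nvidia-nemotron-nano-9b-v2",
}
# Hypothetical fallback rules; the real ones live in the bundled models.yaml.
PREFIX_FALLBACKS = {
    "claude-3-opus": "claude-opus-4-7",
    "claude-3-5-sonnet": "claude-sonnet-4-6",
    "claude-": "claude-haiku-4-5",
}

def resolve(model: str) -> str:
    """Exact alias match first, then the longest matching prefix."""
    if model in ALIASES:
        return ALIASES[model]
    # Longest prefix wins, so specific rules beat the catch-all "claude-".
    for prefix in sorted(PREFIX_FALLBACKS, key=len, reverse=True):
        if model.startswith(prefix):
            return ALIASES[PREFIX_FALLBACKS[prefix]]
    raise KeyError(f"no mapping for {model!r}")
```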
## Environment variables

| Variable | Default | Description |
|---|---|---|
| `NVIDIA_API_KEY` | required | `nvapi-…` key from build.nvidia.com |
| `NVIDIA_BASE_URL` | `https://integrate.api.nvidia.com/v1` | Override for self-hosted NIM |
| `PROXY_HOST` | `127.0.0.1` | Bind address (`0.0.0.0` for Docker/remote) |
| `PROXY_PORT` | `8788` | Bind port |
| `PROXY_API_KEY` | (unset) | Require clients to present this key as a Bearer token |
| `LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
| `MODEL_CONFIG_PATH` | (bundled) | Path to a custom `models.yaml` |
| `REQUEST_TIMEOUT_SECONDS` | `600` | Total request timeout (long for reasoning streams) |
| `MAX_RETRIES` | `2` | Upstream retry budget for transient 5xx errors |
| `RATE_LIMIT_RPM` | `0` (off) | Per-client sliding-window requests/minute; `0` = off |
| `MAX_REQUEST_BODY_MB` | `0` (off) | Reject bodies larger than this; `0` = unlimited |

Variables can be placed in:

- `.env` in the current directory
- `~/.config/nvd-claude-proxy/.env` (written by `ncp init`)
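As a sketch of how such layering typically works, here is a minimal `.env` parser and merge. The precedence shown (process environment over the local `.env` over the global file) is an assumption about conventional dotenv behavior, not documented behavior of this package:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    out: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, val = line.partition("=")
            out[key.strip()] = val.strip()
    return out

def layered(process_env: dict, local_env: dict, global_env: dict) -> dict:
    """Merge so that process env wins over local .env, which wins over global."""
    merged = dict(global_env)
    merged.update(local_env)
    merged.update(process_env)
    return merged
```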
## API endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/messages` | Anthropic Messages — streaming & non-streaming |
| POST | `/v1/messages/count_tokens` | Approximate token count (cl100k_base) |
| GET | `/v1/models` | List model aliases |
| GET | `/v1/models/{id}` | Single model lookup |
| GET | `/healthz` | Liveness probe |
| GET | `/metrics` | Prometheus metrics (`[metrics]` extra) |
| POST | `/v1/messages/batches` | 501 stub (not supported by NIM) |
| POST | `/v1/files` | 501 stub |
## Features

- Full Anthropic SDK compatibility — `anthropic-version` header, correct SSE `Content-Type`, proper `message_start` token counts
- Streaming — strict Anthropic SSE event ordering with keepalive `ping` events every 15 s
- Tool use — Anthropic tool definitions → OpenAI function calling; parallel tool calls; schema sanitization for NIM
- Reasoning / thinking — `thinking.budget_tokens` enforced; `<think>` tags stripped from non-streaming responses
- Vision — JPEG/PNG pass-through; GIF/WEBP transcoded to PNG
- PDF documents — base64 PDF blocks extracted to plain text (requires the `[pdf]` extra)
- Model failover — automatic retry on 5xx with the next model in the configured chain
- Context overflow guard — pre-flight check returns a clean 400 before the request reaches NVIDIA when input exceeds the model's window
- Shared connection pool — one `httpx.AsyncClient` for all requests (no per-request TLS setup)
- SIGHUP reload — `kill -HUP <pid>` reloads `models.yaml` without a restart
- Sliding-window rate limiter — per-client, keyed on `metadata.user_id` or IP
- Prometheus metrics — request count, token usage, latency histograms
- Cost estimation — `cost_usd_est` field in every structured log line
## Custom model config

Create a `models.yaml` (start from the bundled default):

```yaml
defaults:
  big: "my-model"
  small: "my-model"

aliases:
  my-model:
    nvidia_id: "org/my-nim-model"
    supports_tools: true
    supports_vision: false
    supports_reasoning: false
    max_context: 131072
    max_output: 16384
    failover_to: []

prefix_fallbacks:
  "claude-": "my-model"
```

```bash
MODEL_CONFIG_PATH=./my_models.yaml nvd-claude-proxy
# or
ncp code --model-config ./my_models.yaml
```
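When writing a custom entry, a quick sanity check of the required fields can catch typos before the proxy starts. The field names mirror the example entry above; the checker itself is a hypothetical helper, not something shipped with the package:

```python
# Fields every alias entry in the example models.yaml carries.
REQUIRED = {"nvidia_id", "supports_tools", "supports_vision",
            "supports_reasoning", "max_context", "max_output", "failover_to"}

def validate_alias(name: str, cfg: dict) -> list[str]:
    """Return a list of problems with one alias entry (empty = OK)."""
    errors = [f"{name}: missing {field}" for field in sorted(REQUIRED - cfg.keys())]
    if cfg.get("max_output", 0) > cfg.get("max_context", 0):
        errors.append(f"{name}: max_output exceeds max_context")
    return errors
```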
## Docker

```bash
docker run --rm -p 8788:8788 \
  -e NVIDIA_API_KEY=nvapi-... \
  ghcr.io/khiwn/nvd-claude-proxy:latest
```

Or clone the repo and run:

```bash
cp .env.example .env   # fill in NVIDIA_API_KEY
docker compose up
```
## Troubleshooting

| Symptom | Fix |
|---|---|
| `NVIDIA_API_KEY Field required` | Run `ncp init` to save your key globally |
| `proxy did not start in time` | Run `ncp kill`, then `ncp code` again |
| `132183 input tokens > 131072` | Context overflow — the proxy returns a clean 400 with an explanation |
| `429 rate_limit_error` | You hit the NIM free-tier 40 RPM cap; wait 60 s or upgrade your NIM plan |
| Claude Code shows tool errors | Check `ncp models list` — server tools are silently dropped |
## Known limitations

- Prompt caching is silently ignored (NVIDIA NIM has no equivalent).
- `thinking.signature` is proxy-local — do not forward proxy-generated thinking blocks to the real Anthropic API.
- DeepSeek-R1 + tool use is unreliable; use Nemotron models for agentic workloads.
- Anthropic server tools (`web_search`, `bash`, `computer`, `code_execution`, `memory`) are silently dropped.
- Batch and Files APIs return 501 — NIM has no equivalent.
## Development

```bash
git clone https://github.com/khiwn/nvd-claude-proxy
cd nvd-claude-proxy
cp .env.example .env   # fill in NVIDIA_API_KEY

make dev    # pip install -e ".[dev,full]"
make test   # pytest
make lint   # ruff + mypy
make run    # uvicorn on :8788
```

## License

MIT — see LICENSE.