Anthropic Messages → NVIDIA NIM Proxy for Claude Code

nvd-nim-proxy

Run Claude Code on NVIDIA's free hosted AI catalog — no Anthropic subscription needed.

Claude Code ──/v1/messages──► nvd-nim-proxy ──/v1/chat/completions──► integrate.api.nvidia.com
  (Anthropic SSE protocol)     (translation)      (OpenAI SSE protocol)        (NVIDIA NIM)

One command. Free API key. Full Claude Code experience backed by Nemotron.

Why this exists

integrate.api.nvidia.com speaks OpenAI Chat Completions. Claude Code speaks Anthropic Messages. This proxy sits between them and translates in both directions (streaming SSE events, tool calls, vision, reasoning blocks, and error envelopes), so Claude Code works against NVIDIA's models without modification. A few Anthropic-only features have no NVIDIA equivalent; see What's translated below.
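As a rough illustration of the streaming side of this translation, the sketch below turns a sequence of OpenAI-style text deltas into the Anthropic SSE event order that Claude Code expects. It is a simplified, hypothetical example rather than the code in proxy.py: the function names and the placeholder message ID are invented, only a single text block is handled, usage counts are zero placeholders, and tool calls, thinking blocks and ping heartbeats are left out.

```python
import json
from typing import Iterable, Iterator


def sse(event: str, data: dict) -> str:
    """Format one Server-Sent Event in the Anthropic wire style."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def translate_text_stream(text_deltas: Iterable[str], model: str,
                          message_id: str = "msg_example") -> Iterator[str]:
    """Turn OpenAI-style text deltas into an Anthropic Messages event stream.

    Hypothetical sketch: one text block only, placeholder usage numbers.
    """
    # message_start does not depend on upstream tokens, so it can go out first.
    yield sse("message_start", {
        "type": "message_start",
        "message": {
            "id": message_id, "type": "message", "role": "assistant",
            "model": model, "content": [], "stop_reason": None,
            "stop_sequence": None,
            "usage": {"input_tokens": 0, "output_tokens": 0},  # placeholders
        },
    })
    yield sse("content_block_start", {
        "type": "content_block_start", "index": 0,
        "content_block": {"type": "text", "text": ""},
    })
    for delta in text_deltas:
        if delta:  # skip empty chunks (e.g. role-only deltas)
            yield sse("content_block_delta", {
                "type": "content_block_delta", "index": 0,
                "delta": {"type": "text_delta", "text": delta},
            })
    yield sse("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": "end_turn", "stop_sequence": None},
        "usage": {"output_tokens": 0},  # placeholder; real code reports upstream usage
    })
    yield sse("message_stop", {"type": "message_stop"})
```

Because message_start does not wait on NVIDIA, a translator can emit it before the upstream model produces anything, which is the idea behind the eager message_start listed under What's translated.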

Note: If you can run a NIM container yourself (single H100 or L40S), you don't need this proxy — see NVIDIA's official Claude Code integration guide. This is for the free hosted catalog at build.nvidia.com.

Quickstart — 2 minutes

# 1. Install
pip install nim-claude-proxy

# 2. Configure (guided wizard)
nim init
#  🔑 Enter NVIDIA_API_KEY (get one free at https://build.nvidia.com)
#  🔌 Proxy port [8787]

# 3. Start the proxy daemon
nim start
#  ● Proxy started  PID 12345  http://127.0.0.1:8787
#
#  ┌─ Claude Code env vars ──────────────────────────────┐
#  │  export ANTHROPIC_BASE_URL=http://127.0.0.1:8787    │
#  │  export ANTHROPIC_API_KEY=not-used                  │
#  └─────────────────────────────────────────────────────┘

# 4. Launch Claude Code (proxy keeps running between sessions)
nim code

Or, after installing, skip nim init and nim start and pass the key inline (nim code starts the daemon if needed):

NVIDIA_API_KEY=nvapi-... nim code

Deploy on Cloudflare

This repository includes a Cloudflare Workers + Containers configuration at the repository root (wrangler.toml) and a Worker entrypoint in worker/src/index.ts. The Worker runs the Python FastAPI proxy inside a Cloudflare Container and forwards /v1/messages, /v1/models, and /v1/messages/count_tokens to it.

One-click: use the Deploy to Cloudflare button on the GitHub repository page, then set the required secret in the created Worker project:

npx wrangler secret put NVIDIA_API_KEY
npx wrangler secret put PROXY_API_KEY   # strongly recommended for public URLs

Manual deploy:

npm install
npx wrangler secret put NVIDIA_API_KEY
npx wrangler secret put PROXY_API_KEY   # optional locally, recommended publicly
npm run deploy

Then point Claude Code at your Worker URL:

export ANTHROPIC_BASE_URL=https://your-worker.your-subdomain.workers.dev
export ANTHROPIC_API_KEY=$PROXY_API_KEY
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ENABLE_TOOL_SEARCH=false
export CLAUDE_CODE_DISABLE_THINKING=1
export DISABLE_INTERLEAVED_THINKING=1
claude

Production notes:

PROXY_API_KEY protects your public Worker URL from becoming an open NVIDIA API relay.
Cloudflare builds and pushes the container image from the repository's Dockerfile during wrangler deploy, so Docker must be available for manual local deploys.
The edge Worker returns /healthz and /health without waking the container and rejects unauthenticated /v1/* traffic before container startup when PROXY_API_KEY is configured.
Optional secrets/vars: DEFAULT_NVIDIA_MODEL, MAX_OUTPUT_TOKENS, CONTEXT_SAFETY_MARGIN, LOG_LEVEL.
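The PROXY_API_KEY check is conceptually simple. The edge Worker does it in TypeScript (worker/src/index.ts); the Python sketch below shows the same idea under a few assumptions: the client key arrives in the x-api-key header (the header the Anthropic client uses for ANTHROPIC_API_KEY), headers are given as a plain dict, and the function name is hypothetical.

```python
from __future__ import annotations

import hmac


def is_authorized(headers: dict[str, str], proxy_api_key: str | None) -> bool:
    """Return True if a request may reach the /v1/* routes (hypothetical sketch)."""
    if not proxy_api_key:
        # No PROXY_API_KEY configured: the proxy stays open, as for local use.
        return True
    # HTTP header names are case-insensitive; normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("x-api-key", "")
    # Constant-time comparison avoids leaking the key through response timing.
    return hmac.compare_digest(supplied.encode(), proxy_api_key.encode())
```

Rejecting a request here, before any upstream call, is what keeps a public URL from acting as an open NVIDIA relay.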

⚡ Instant Model Switching

Switch to any model on NVIDIA's catalog in one command, with no config file editing and no manual restart:

# Switch your default model permanently
nim use qwen/qwen3-235b-a22b
nim use z-ai/glm-5.1
nim use meta/llama-4-maverick-17b-128e-instruct
nim use nvidia/llama-3.1-nemotron-ultra-253b-v1

# One-session override (default unchanged)
nim code --model qwen/qwen3-235b-a22b
nim test --model z-ai/glm-5.1

# Test any model immediately
nim test --model meta/llama-3.3-70b-instruct "Write a haiku about GPUs"

nim use saves the model to ~/.config/nim-proxy/config.yaml and restarts the proxy automatically if it's running. Any model ID from build.nvidia.com works — no aliases, no mapping needed.

How it works: The proxy passes any provider/model ID straight to NVIDIA unchanged. Only claude-* names get remapped to your configured NVIDIA model. Everything else is zero-friction passthrough.
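The passthrough rule can be sketched in a few lines of Python. The function name and the optional aliases dictionary are hypothetical (the aliases stand in for the model aliases in config.example.yaml); only the rule itself, claude-* names remapped and everything else forwarded unchanged, comes from the description above.

```python
from __future__ import annotations


def resolve_model(requested: str, default_model: str,
                  aliases: dict[str, str] | None = None) -> str:
    """Pick the NVIDIA model ID to send upstream (hypothetical sketch)."""
    if requested.startswith("claude-"):
        # Claude Code model names are remapped: an explicit alias wins,
        # otherwise fall back to the configured default NVIDIA model.
        return (aliases or {}).get(requested, default_model)
    # Any other provider/model ID is passed through unchanged.
    return requested
```

For example, resolve_model("qwen/qwen3-235b-a22b", default) returns the Qwen ID as-is, while resolve_model("claude-sonnet-4-5", default) returns the default model.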

CLI Reference

Command	Description
`nim init`	Interactive setup wizard — saves config to `~/.config/nim-proxy/`
`nim start`	Start proxy as background daemon
`nim stop`	Stop the daemon
`nim restart`	Restart daemon
`nim status`	Show PID, URL, model, API key, health
`nim logs [-f] [-n N]`	View proxy logs; `-f` tails live
`nim code [--model ID]`	Start daemon if needed, then launch Claude Code
`nim doctor`	Diagnose: Python, key, NVIDIA API, port, health, Claude install
`nim configure <key> <val>`	Set a config value (`server.port`, `nvidia.default_model`, …)
`nim configure --list`	Print effective config (secrets redacted)
`nim use <model>`	Switch model instantly — saves config + restarts daemon
`nim models`	List available NVIDIA NIM models
`nim test [prompt]`	Send a one-shot test request and show the result
`nim proxy`	Start proxy in foreground (debugging)
`nim version`	Print version

Recommended Models

Model	Best for
`nvidia/llama-3.3-nemotron-super-49b-v1.5`	Default. Best reasoning + tools balance
`nvidia/llama-3.1-nemotron-ultra-253b-v1`	Strongest reasoning — slower TTFT
`nvidia/nvidia-nemotron-nano-9b-v2`	Fast responses; good for sub-agent (`HAIKU_MODEL`)
`meta/llama-3.3-70b-instruct`	General purpose, no reasoning overhead
`qwen/qwen3-235b-a22b`	Strong coder, MoE architecture
`meta/llama-4-maverick-17b-128e-instruct`	Vision + tools

⚠️ Avoid deepseek-ai/deepseek-r1 — its tool-calling and reasoning paths are mutually exclusive on the hosted endpoint.

Configuration

Config is stored at ~/.config/nim-proxy/config.yaml and can be edited directly or via nim configure:

nim configure server.port 9000
nim configure nvidia.default_model nvidia/llama-3.1-nemotron-ultra-253b-v1
nim configure --list   # print all settings (key redacted)

Environment variables override YAML and are never written to disk:

export NVIDIA_API_KEY=nvapi-...           # required
export DEFAULT_NVIDIA_MODEL=nvidia/...    # override default model
export PROXY_HOST=127.0.0.1
export PROXY_PORT=8787
export PROXY_API_KEY=secret              # optional: require x-api-key from clients
export LOG_LEVEL=info

# Claude Code gateway compatibility knobs used by `nim code`
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ENABLE_TOOL_SEARCH=false
export CLAUDE_CODE_DISABLE_THINKING=1
export DISABLE_INTERLEAVED_THINKING=1
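The precedence rule (environment first, YAML second, nothing written back) might look like the sketch below. The server.port and nvidia.default_model keys appear in the nim configure examples above; the other key names in ENV_TO_KEY, the integer conversion for the port, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import copy
import os
from typing import Any, Mapping

# Hypothetical mapping from environment variables to dotted config keys.
# server.port and nvidia.default_model come from the README; the rest are guesses.
ENV_TO_KEY = {
    "NVIDIA_API_KEY": "nvidia.api_key",
    "DEFAULT_NVIDIA_MODEL": "nvidia.default_model",
    "PROXY_HOST": "server.host",
    "PROXY_PORT": "server.port",
    "LOG_LEVEL": "log_level",
}


def set_dotted(config: dict[str, Any], dotted: str, value: Any) -> None:
    """Set config["a"]["b"] = value for dotted == "a.b", creating levels."""
    *parents, leaf = dotted.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def effective_config(yaml_config: dict[str, Any],
                     env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Overlay environment variables on the YAML config without mutating it."""
    merged = copy.deepcopy(yaml_config)  # the loaded YAML dict is left untouched
    for var, key in ENV_TO_KEY.items():
        if var in env:
            value: Any = env[var]
            if key == "server.port":
                value = int(value)  # assumed: the port is stored as an integer
            set_dotted(merged, key, value)
    return merged
```

Working on a deep copy keeps the merged view in memory only, which matches the rule that environment values are never written to disk.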

Model aliases in config.example.yaml map Claude Code model names to NVIDIA models automatically — no need to set ANTHROPIC_DEFAULT_*_MODEL manually when using nim code.

What's translated

Feature	Status
Streaming `/v1/messages`	✅ Full SSE event sequence
Non-streaming `/v1/messages`	✅
Tool calling (single + parallel)	✅ `tool_use` ↔ `tool_calls`
`tool_result` round-trip	✅
System prompts (string + block array)	✅
Vision (base64 + URL)	✅
Reasoning (`reasoning_content` + `<think>` tags)	✅ response-side conversion; Anthropic-only thinking requests are disabled by `nim code`
Token counting (`/v1/messages/count_tokens`)	✅ heuristic ±15%
Model listing (`/v1/models`)	✅ proxied; `claude-*` aliases support Claude Code gateway discovery
Eager `message_start` (sub-100 ms TTFT)	✅
15 s ping heartbeat during reasoning	✅ keeps TUI alive
Context-window overflow retry	✅ clamps output and retries once on NVIDIA tokenizer errors
HTTP/2 to NVIDIA	✅ when `h2` installed
Client-disconnect cancellation	✅
Prompt caching cost savings	❌ not available on hosted endpoint
Anthropic server tools (`web_search_`, `computer_`, MCP)	❌ no NVIDIA equivalent
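To make the tool_use ↔ tool_calls row concrete, here is a hypothetical request-side conversion for system prompts, text, tool_use and tool_result blocks. It follows the public Anthropic Messages and OpenAI Chat Completions message shapes but is not the proxy's actual code: images, thinking blocks and is_error flags are ignored, and all function names are invented.

```python
from __future__ import annotations

import json
from typing import Any


def _text_of(content: str | list[dict[str, Any]]) -> str:
    """Flatten a string or a list of content blocks into plain text."""
    if isinstance(content, str):
        return content
    return "".join(b.get("text", "") for b in content if b.get("type") == "text")


def to_openai_messages(system: str | list[dict[str, Any]] | None,
                       messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert Anthropic system + messages into OpenAI chat messages (sketch)."""
    out: list[dict[str, Any]] = []
    if system:
        # Anthropic keeps the system prompt outside `messages`; OpenAI inlines it.
        out.append({"role": "system", "content": _text_of(system)})
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            out.append({"role": msg["role"], "content": content})
            continue
        text = _text_of(content)
        tool_calls = [
            {
                "id": b["id"],
                "type": "function",
                # OpenAI expects arguments as a JSON string, not an object.
                "function": {"name": b["name"], "arguments": json.dumps(b["input"])},
            }
            for b in content if b.get("type") == "tool_use"
        ]
        # Each tool_result becomes its own role="tool" message.
        results = [
            {"role": "tool", "tool_call_id": b["tool_use_id"],
             "content": _text_of(b.get("content", ""))}
            for b in content if b.get("type") == "tool_result"
        ]
        if tool_calls:
            out.append({"role": "assistant", "content": text or None,
                        "tool_calls": tool_calls})
        elif results:
            # Tool messages must directly follow the assistant's tool_calls,
            # so any accompanying user text goes after them.
            out.extend(results)
            if text:
                out.append({"role": "user", "content": text})
        else:
            out.append({"role": msg["role"], "content": text})
    return out
```

The response side runs the same mapping in reverse: each upstream tool_calls entry becomes a tool_use block whose input is the parsed arguments string.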

Troubleshooting

Run nim doctor first; it checks everything in one go.

"Long pause before first token"

The proxy sends message_start eagerly, so Claude Code shows activity right away. If the first tokens are still slow, that is expected for Nemotron Ultra 253B, whose TTFT on NVIDIA's endpoint is 3–8 s by design. Switch to Nemotron Super 49B v1.5 for snappier responses.

404 on claude-haiku-4-5 or similar

Use nim code instead of setting the environment variables manually; it sets all four ANTHROPIC_DEFAULT_*_MODEL variables correctly.

400 maximum context length

The proxy clamps max_tokens with a safety margin and retries once when NVIDIA reports an exact tokenizer limit. If you still hit this error in very large Claude Code sessions, lower the completion budget:

export MAX_OUTPUT_TOKENS=8192
export CONTEXT_SAFETY_MARGIN=4096
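The clamping step can be sketched as below, together with the kind of rough character-based estimate a count_tokens heuristic might use. The 4-characters-per-token ratio, the context_window argument and the function names are assumptions; the default values simply reuse the example settings above and may not match the proxy's real defaults. How the proxy parses NVIDIA's tokenizer error for its single retry is not shown.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (rule of thumb)."""
    return math.ceil(len(text) / 4)


def clamp_max_tokens(requested: int, prompt_tokens: int, context_window: int,
                     max_output_tokens: int = 8192,
                     safety_margin: int = 4096) -> int:
    """Shrink the completion budget so prompt + output fits the context window."""
    # Room left after the prompt and a safety margin for tokenizer mismatch.
    room = context_window - prompt_tokens - safety_margin
    if room < 1:
        raise ValueError("prompt alone exceeds the usable context window")
    return min(requested, max_output_tokens, room)
```

For a 131,072-token window and a 120,000-token prompt, a request for 32,000 output tokens is clamped to 131,072 − 120,000 − 4,096 = 6,976.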

429 rate_limit_error

The free tier allows 40 requests per minute (RPM) per key. Back off and retry, or upgrade to NVIDIA AI Enterprise.
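If you script many requests against the free tier, a client-side limiter can keep you under the quota instead of relying on retries. This is a hypothetical helper, not part of nim-claude-proxy; it assumes the 40 RPM limit quoted above behaves like a rolling 60-second window.

```python
from __future__ import annotations

import collections
import time
from typing import Callable


class RpmLimiter:
    """Sliding-window limiter: at most `limit` requests per 60 seconds."""

    def __init__(self, limit: int = 40,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.clock = clock
        self.sent: collections.deque[float] = collections.deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Wait until the oldest request in the window expires.
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

Before each request, call time.sleep(limiter.wait_time()) and then limiter.record(). Injecting the clock makes the class testable without real sleeps.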

401 authentication_error from upstream

Your NVIDIA_API_KEY is wrong or expired. Generate a new one at build.nvidia.com.
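Errors such as rate_limit_error and authentication_error reach Claude Code wrapped in Anthropic's error envelope. The sketch below shows one plausible mapping from an upstream HTTP status to that envelope; the status-to-type table uses Anthropic's documented error types, but the function name is hypothetical and proxy.py may map statuses differently.

```python
import json

# Anthropic error `type` values keyed by HTTP status code.
ANTHROPIC_ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    403: "permission_error",
    404: "not_found_error",
    413: "request_too_large",
    429: "rate_limit_error",
    529: "overloaded_error",
}


def anthropic_error_body(status: int, message: str) -> str:
    """Wrap an upstream failure in the Anthropic error envelope (sketch)."""
    # Anything unmapped, including most 5xx codes, becomes a generic api_error.
    err_type = ANTHROPIC_ERROR_TYPES.get(status, "api_error")
    return json.dumps({"type": "error",
                       "error": {"type": err_type, "message": message}})
```

Keeping the original status code on the HTTP response alongside this body lets Claude Code apply its own retry behaviour for 429 and 5xx errors.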

Port already in use

nim configure server.port 8788
nim restart

Manual / Development Setup

git clone https://github.com/khiwniti/nvd-nim-proxy
cd nvd-nim-proxy
pip install -r requirements.txt

cp .env.example .env
# edit .env — paste NVIDIA_API_KEY

python3 nim_code.py code   # or: python3 proxy.py

Run tests:

python3 -m pytest -v          # offline tests, no live API needed
python3 -m pytest --cov=proxy --cov-report=term-missing

Test with curl (no Claude Code needed):

# Non-streaming
curl -s http://127.0.0.1:8787/v1/messages \
  -H "content-type: application/json" \
  -d '{"model":"nvidia/llama-3.3-nemotron-super-49b-v1.5","max_tokens":64,
       "messages":[{"role":"user","content":"Say hi in five words."}]}' \
  | python3 -m json.tool

# Streaming — message_start should arrive in < 100 ms
curl -sN http://127.0.0.1:8787/v1/messages \
  -H "content-type: application/json" \
  -d '{"model":"nvidia/llama-3.3-nemotron-super-49b-v1.5","max_tokens":128,
       "stream":true,"messages":[{"role":"user","content":"Count to ten."}]}'

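To check the streaming output programmatically rather than by eye, you can save it (for example by appending > stream.txt to the curl command) and parse it with a small SSE reader like the hypothetical one below. It implements only the subset of the SSE format this stream uses: event: and data: lines, with events separated by blank lines.

```python
import json


def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Split raw SSE text into (event name, parsed JSON data) pairs."""
    events = []
    for chunk in raw.replace("\r\n", "\n").split("\n\n"):
        name, data_lines = "message", []  # "message" is the SSE default type
        for line in chunk.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:  # blank trailing chunks carry no data and are skipped
            events.append((name, json.loads("\n".join(data_lines))))
    return events
```

For a healthy stream, the event names from parse_sse(open("stream.txt").read()) should start with message_start and end with message_stop, possibly with ping events in between.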
Repository Layout

proxy.py              Anthropic → NVIDIA translation proxy (FastAPI)
nim_code.py           Production CLI — daemon, doctor, configure, etc.
config.example.yaml   Non-secret config with model aliases
.env.example          Environment variable template
requirements.txt      Runtime + test dependencies
pyproject.toml        Package metadata and build config
tests/
  conftest.py         Test env setup
  test_translation.py Request/response/error translation unit tests
  test_streaming.py   SSE event ordering and StreamTranslator tests
  test_stream_eager.py Eager message_start async test
  test_routes.py      Route smoke tests
  test_e2e.py         End-to-end tests with mocked NVIDIA API
specs/                Spec Kit — requirements, design, tasks

License

MIT — see LICENSE.

Built for developers who want Claude Code's full power on NVIDIA's free hosted models.

Get your free NVIDIA API key at https://build.nvidia.com.

Release history

0.2.8 (May 24, 2026)
0.2.6 (May 16, 2026)
0.2.5 (May 16, 2026)
0.2.4 (May 16, 2026)
0.2.3 (May 16, 2026)
0.2.2 (May 16, 2026), this version
0.2.1 (May 16, 2026)

Download files

Source Distribution

nim_claude_proxy-0.2.2.tar.gz (34.8 kB)

Uploaded May 16, 2026 Source

Built Distribution

nim_claude_proxy-0.2.2-py3-none-any.whl (51.7 kB)

Uploaded May 16, 2026 Python 3

File details

Details for the file nim_claude_proxy-0.2.2.tar.gz.

File metadata

Download URL: nim_claude_proxy-0.2.2.tar.gz
Upload date: May 16, 2026
Size: 34.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for nim_claude_proxy-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`9e26d89db4b8d899f0b38c73e00c062ad705bd7376e53dc06175a639a865e000`
MD5	`da37532cc91cf28043f60a1947c58817`
BLAKE2b-256	`547b08ca8e4cf36c24b85147894db89bff9b583c8af1ddb0dd9fad4f48138456`

File details

Details for the file nim_claude_proxy-0.2.2-py3-none-any.whl.

File metadata

Download URL: nim_claude_proxy-0.2.2-py3-none-any.whl
Upload date: May 16, 2026
Size: 51.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for nim_claude_proxy-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5f74874d3ee65eb37865278c2c75b3d4660aaf5d958ce40f848e91612faa8643`
MD5	`a84459cb2d0328f95f0c6d3d52c1a896`
BLAKE2b-256	`4021a89e50b8129e50a1c9d6cecb227cf8145993ef3c7b4e76d128d314a7631c`
