Anthropic Messages → NVIDIA NIM Proxy for Claude Code

nvd-nim-proxy

Run Claude Code on NVIDIA's free hosted AI catalog — no Anthropic subscription needed.

Deploy to Cloudflare

Claude Code ──/v1/messages──► nvd-nim-proxy ──/v1/chat/completions──► integrate.api.nvidia.com
  (Anthropic SSE protocol)     (translation)      (OpenAI SSE protocol)        (NVIDIA NIM)

One command. Free API key. Full Claude Code experience backed by Nemotron.


Why this exists

integrate.api.nvidia.com speaks OpenAI Chat Completions. Claude Code speaks Anthropic Messages. This proxy sits between them and translates everything — streaming SSE events, tool calls, vision, reasoning blocks, error envelopes — so Claude Code never knows the difference.

Note: If you can run a NIM container yourself (single H100 or L40S), you don't need this proxy — see NVIDIA's official Claude Code integration guide. This is for the free hosted catalog at build.nvidia.com.
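The request half of that translation can be pictured with a small sketch. This is not the code in proxy.py; it is a minimal illustration that assumes a text-only request, and the function name anthropic_to_openai is hypothetical. It shows the two structural differences that matter most: Anthropic carries the system prompt as a top-level field, while Chat Completions expects it as the first message, and Anthropic content may be a list of typed blocks rather than a plain string.

```python
from typing import Any


def anthropic_to_openai(body: dict[str, Any]) -> dict[str, Any]:
    """Convert a text-only Anthropic Messages request to Chat Completions form.

    Hypothetical helper for illustration; images, tools and reasoning are
    deliberately left out.
    """
    messages: list[dict[str, Any]] = []

    # Anthropic: top-level "system" (string or list of text blocks).
    # Chat Completions: a leading message with role "system".
    system = body.get("system")
    if isinstance(system, list):
        system = "\n".join(b.get("text", "") for b in system if b.get("type") == "text")
    if system:
        messages.append({"role": "system", "content": system})

    for msg in body.get("messages", []):
        content = msg["content"]
        if isinstance(content, list):
            # Flatten text blocks into one string; other block types are skipped here.
            content = "".join(b.get("text", "") for b in content if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})

    out: dict[str, Any] = {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body["max_tokens"],
    }
    if body.get("stream"):
        out["stream"] = True
    return out
```

Passing a request with "system": "Be brief." and one user message yields a two-element messages list whose first entry has role "system".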


Quickstart — 2 minutes

# 1. Install
pip install nim-claude-proxy

# 2. Configure (guided wizard)
nim init
#  🔑 Enter NVIDIA_API_KEY (get one free at https://build.nvidia.com)
#  🔌 Proxy port [8787]

# 3. Start the proxy daemon
nim start
#  ● Proxy started  PID 12345  http://127.0.0.1:8787
#
#  ┌─ Claude Code env vars ──────────────────────────────┐
#  │  export ANTHROPIC_BASE_URL=http://127.0.0.1:8787    │
#  │  export ANTHROPIC_API_KEY=not-used                  │
#  └─────────────────────────────────────────────────────┘

# 4. Launch Claude Code (proxy keeps running between sessions)
nim code

Or skip the daemon and just use the one-liner:

NVIDIA_API_KEY=nvapi-... nim code

Deploy on Cloudflare

This repository includes a Cloudflare Workers + Containers configuration at the repository root (wrangler.toml) and a Worker entrypoint in worker/src/index.ts. The Worker runs the Python FastAPI proxy inside a Cloudflare Container and forwards /v1/messages, /v1/models, and /v1/messages/count_tokens to it.

One-click: use the Deploy to Cloudflare button above, then set the secrets in the created Worker project (NVIDIA_API_KEY is required; PROXY_API_KEY is strongly recommended for public URLs):

npx wrangler secret put NVIDIA_API_KEY
npx wrangler secret put PROXY_API_KEY   # strongly recommended for public URLs

Manual deploy:

npm install
npx wrangler secret put NVIDIA_API_KEY
npx wrangler secret put PROXY_API_KEY   # optional locally, recommended publicly
npm run deploy

Then point Claude Code at your Worker URL:

export ANTHROPIC_BASE_URL=https://your-worker.your-subdomain.workers.dev
export ANTHROPIC_API_KEY=$PROXY_API_KEY
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ENABLE_TOOL_SEARCH=false
export CLAUDE_CODE_DISABLE_THINKING=1
export DISABLE_INTERLEAVED_THINKING=1
claude

Production notes:

  • PROXY_API_KEY protects your public Worker URL from becoming an open NVIDIA API relay.
  • Cloudflare builds and pushes the container image from Dockerfile during wrangler deploy; Docker must be available for manual local deploys.
  • The edge Worker returns /healthz and /health without waking the container and rejects unauthenticated /v1/* traffic before container startup when PROXY_API_KEY is configured.
  • Optional secrets/vars: DEFAULT_NVIDIA_MODEL, MAX_OUTPUT_TOKENS, CONTEXT_SAFETY_MARGIN, LOG_LEVEL.
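The edge behaviour described in these notes can be sketched as a small decision function. The real Worker entrypoint is TypeScript (worker/src/index.ts); the Python below is only an illustration of the logic. The function name gate, the 401 status for a rejected request, and the assumption that header names arrive lowercased are all hypothetical. The x-api-key header name comes from the Configuration section.

```python
import hmac
from typing import Mapping, Optional

# Paths the edge answers itself, without waking the container.
HEALTH_PATHS = {"/healthz", "/health"}


def gate(path: str, headers: Mapping[str, str], proxy_api_key: Optional[str]) -> int:
    """Return the status the edge would answer with, or 0 to forward the request.

    Illustrative only: the 401 status and lowercase header keys are assumptions.
    """
    if path in HEALTH_PATHS:
        return 200
    if path.startswith("/v1/") and proxy_api_key:
        supplied = headers.get("x-api-key", "")
        # Constant-time comparison avoids leaking key prefixes through timing.
        if not hmac.compare_digest(supplied.encode(), proxy_api_key.encode()):
            return 401
    return 0  # forward to the container
```

With no PROXY_API_KEY configured the function forwards every /v1/* request, which is why the notes recommend setting the secret on any public URL.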

⚡ Instant Model Switching

Switch to any model on NVIDIA's catalog in one command, with no config file editing and no manual restart:

# Switch your default model permanently
nim use qwen/qwen3-235b-a22b
nim use z-ai/glm-5.1
nim use meta/llama-4-maverick-17b-128e-instruct
nim use nvidia/llama-3.1-nemotron-ultra-253b-v1

# One-session override (default unchanged)
nim code --model qwen/qwen3-235b-a22b
nim test --model z-ai/glm-5.1

# Test any model immediately
nim test --model meta/llama-3.3-70b-instruct "Write a haiku about GPUs"

nim use saves the model to ~/.config/nim-proxy/config.yaml and restarts the proxy automatically if it's running. Any model ID from build.nvidia.com works — no aliases, no mapping needed.

How it works: The proxy passes any provider/model ID straight to NVIDIA unchanged. Only claude-* names get remapped to your configured NVIDIA model. Everything else is zero-friction passthrough.
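As a rough sketch of that rule (not the proxy's actual code), model resolution comes down to one prefix check. The function name resolve_model and the optional aliases mapping are hypothetical; the aliases parameter stands in for the per-name mappings that config.example.yaml is described as providing.

```python
from typing import Mapping, Optional


def resolve_model(
    requested: str,
    default_model: str,
    aliases: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the model ID to send upstream (illustrative helper)."""
    if requested.startswith("claude-"):
        # Claude names never exist on NVIDIA's catalog, so remap them:
        # use a specific alias if one is configured, else the default model.
        return (aliases or {}).get(requested, default_model)
    # Anything else (provider/model) passes through unchanged.
    return requested
```

For example, resolve_model("qwen/qwen3-235b-a22b", default) returns the Qwen ID untouched, while any claude-* name resolves to the configured default unless an alias overrides it.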


CLI Reference

Command                    Description
nim init                   Interactive setup wizard; saves config to ~/.config/nim-proxy/
nim start                  Start proxy as background daemon
nim stop                   Stop the daemon
nim restart                Restart daemon
nim status                 Show PID, URL, model, API key, health
nim logs [-f] [-n N]       View proxy logs; -f tails live
nim code [--model ID]      Start daemon if needed, then launch Claude Code
nim doctor                 Diagnose: Python, key, NVIDIA API, port, health, Claude install
nim configure <key> <val>  Set a config value (server.port, nvidia.default_model, …)
nim configure --list       Print effective config (secrets redacted)
nim use <model>            Switch model instantly: saves config and restarts daemon
nim models                 List available NVIDIA NIM models
nim test [prompt]          Send a one-shot test request and show the result
nim proxy                  Start proxy in foreground (debugging)
nim version                Print version

Recommended Models

Model                                     Best for
nvidia/llama-3.3-nemotron-super-49b-v1.5  Default. Best reasoning + tools balance
nvidia/llama-3.1-nemotron-ultra-253b-v1   Strongest reasoning; slower TTFT
nvidia/nvidia-nemotron-nano-9b-v2         Fast responses; good for sub-agent (HAIKU_MODEL)
meta/llama-3.3-70b-instruct               General purpose, no reasoning overhead
qwen/qwen3-235b-a22b                      Strong coder, MoE architecture
meta/llama-4-maverick-17b-128e-instruct   Vision + tools

⚠️ Avoid deepseek-ai/deepseek-r1 — its tool-calling and reasoning paths are mutually exclusive on the hosted endpoint.


Configuration

Config is stored at ~/.config/nim-proxy/config.yaml and can be edited directly or via nim configure:

nim configure server.port 9000
nim configure nvidia.default_model nvidia/llama-3.1-nemotron-ultra-253b-v1
nim configure --list   # print all settings (key redacted)

Environment variables override YAML and are never written to disk:

export NVIDIA_API_KEY=nvapi-...           # required
export DEFAULT_NVIDIA_MODEL=nvidia/...    # override default model
export PROXY_HOST=127.0.0.1
export PROXY_PORT=8787
export PROXY_API_KEY=secret              # optional: require x-api-key from clients
export LOG_LEVEL=info

# Claude Code gateway compatibility knobs used by `nim code`
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ENABLE_TOOL_SEARCH=false
export CLAUDE_CODE_DISABLE_THINKING=1
export DISABLE_INTERLEAVED_THINKING=1

Model aliases in config.example.yaml map Claude Code model names to NVIDIA models automatically — no need to set ANTHROPIC_DEFAULT_*_MODEL manually when using nim code.
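The precedence rule (environment over YAML, nothing written back) can be sketched as a pure merge. This is an assumption-laden illustration rather than the CLI's loader: the function name effective_config and the pairing of PROXY_PORT with server.port and DEFAULT_NVIDIA_MODEL with nvidia.default_model are guesses based on the names shown above.

```python
import os
from typing import Any, Callable, Mapping, Optional

# Hypothetical mapping: environment variable -> (dotted config key, type cast).
ENV_OVERRIDES: dict[str, tuple[str, Callable[[str], Any]]] = {
    "PROXY_PORT": ("server.port", int),
    "DEFAULT_NVIDIA_MODEL": ("nvidia.default_model", str),
}


def effective_config(
    yaml_values: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> dict[str, Any]:
    """Overlay environment variables on top of values loaded from YAML."""
    env = os.environ if environ is None else environ
    merged = dict(yaml_values)  # copy, so the on-disk values are never mutated
    for name, (key, cast) in ENV_OVERRIDES.items():
        if name in env:
            merged[key] = cast(env[name])
    return merged
```

Because the merge returns a new dict and never touches the YAML source, an override such as PROXY_PORT=9000 lasts only as long as the environment variable is set.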


What's translated

Feature                                                 Status
Streaming /v1/messages                                  ✅ Full SSE event sequence
Non-streaming /v1/messages                              ✅
Tool calling (single + parallel)                        ✅ tool_use ↔ tool_calls
tool_result round-trip                                  ✅
System prompts (string + block array)                   ✅
Vision (base64 + URL)                                   ✅
Reasoning (reasoning_content + <think> tags)            ✅ response-side conversion; Anthropic-only thinking requests are disabled by nim code
Token counting (/v1/messages/count_tokens)              ✅ heuristic ±15%
Model listing (/v1/models)                              ✅ proxied; claude-* aliases support Claude Code gateway discovery
Eager message_start (sub-100 ms TTFT)                   ✅
15 s ping heartbeat during reasoning                    ✅ keeps TUI alive
Context-window overflow retry                           ✅ clamps output and retries once on NVIDIA tokenizer errors
HTTP/2 to NVIDIA                                        ✅ when h2 installed
Client-disconnect cancellation                          ✅
Prompt caching cost savings                             ❌ not available on hosted endpoint
Anthropic server tools (web_search_*, computer_*, MCP)  ❌ no NVIDIA equivalent
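To make the tool-calling row concrete, here is a minimal sketch of the response-side mapping from Chat Completions tool_calls to Anthropic tool_use content blocks. It is not taken from proxy.py; the function name is hypothetical, and falling back to an empty input when the arguments string is not valid JSON is a design choice made for this example only.

```python
import json
from typing import Any


def tool_calls_to_blocks(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map OpenAI-style tool_calls to Anthropic tool_use content blocks."""
    blocks: list[dict[str, Any]] = []
    for call in tool_calls:
        fn = call["function"]
        # Chat Completions sends arguments as a JSON *string*;
        # Anthropic expects a parsed object under "input".
        try:
            args = json.loads(fn.get("arguments") or "{}")
        except json.JSONDecodeError:
            args = {}  # assumption: degrade to empty input rather than fail
        blocks.append(
            {"type": "tool_use", "id": call["id"], "name": fn["name"], "input": args}
        )
    return blocks
```

Parallel tool calls simply become several tool_use blocks in the same assistant message, in the order the upstream model returned them.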

Troubleshooting

Run nim doctor first — it checks everything in one go.

"Long pause before first token"

Fixed by eager message_start. If responses are still slow, note that NVIDIA's TTFT for Nemotron Ultra 253B is 3–8 s by design. Switch to Nemotron Super 49B v1.5 for snappier responses.

404 on claude-haiku-4-5 or similar

Use nim code instead of setting env vars manually; it sets all four ANTHROPIC_DEFAULT_*_MODEL vars correctly.

400 maximum context length

The proxy clamps max_tokens with a safety margin and retries once when NVIDIA reports an exact tokenizer limit. If you still hit this with very large Claude Code sessions, lower the completion budget:

export MAX_OUTPUT_TOKENS=8192
export CONTEXT_SAFETY_MARGIN=4096
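The effect of those two settings can be illustrated with a little arithmetic. The sketch below is an assumption about how such a clamp might work, not the proxy's implementation; the function name and the use of 8192 and 4096 as defaults simply mirror the example values above.

```python
def clamp_max_tokens(
    requested: int,
    context_window: int,
    prompt_tokens: int,
    safety_margin: int = 4096,
    max_output: int = 8192,
) -> int:
    """Shrink the completion budget so prompt + output fits the context window."""
    # Room left after the prompt, minus a margin that absorbs token-count error.
    room = context_window - prompt_tokens - safety_margin
    # Never exceed the request, the configured cap, or the remaining room;
    # keep at least 1 so the upstream request stays valid.
    return max(1, min(requested, max_output, room))
```

With a 131072-token window and a 120000-token prompt, a request for 32000 output tokens is clamped to 6976 (131072 - 120000 - 4096), which is below the 8192 cap.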

429 rate_limit_error

Free tier is 40 RPM per key. Back off or upgrade to NVIDIA AI Enterprise.

401 authentication_error from upstream

Your NVIDIA_API_KEY is wrong or expired. Generate a new one at build.nvidia.com.

Port already in use

nim configure server.port 8788
nim restart

Manual / Development Setup

git clone https://github.com/khiwniti/nvd-nim-proxy
cd nvd-nim-proxy
pip install -r requirements.txt

cp .env.example .env
# edit .env — paste NVIDIA_API_KEY

python3 nim_code.py code   # or: python3 proxy.py

Run tests:

python3 -m pytest -v          # offline tests, no live API needed
python3 -m pytest --cov=proxy --cov-report=term-missing

Test with curl (no Claude Code needed):

# Non-streaming
curl -s http://127.0.0.1:8787/v1/messages \
  -H "content-type: application/json" \
  -d '{"model":"nvidia/llama-3.3-nemotron-super-49b-v1.5","max_tokens":64,
       "messages":[{"role":"user","content":"Say hi in five words."}]}' \
  | python3 -m json.tool

# Streaming — message_start should arrive in < 100 ms
curl -sN http://127.0.0.1:8787/v1/messages \
  -H "content-type: application/json" \
  -d '{"model":"nvidia/llama-3.3-nemotron-super-49b-v1.5","max_tokens":128,
       "stream":true,"messages":[{"role":"user","content":"Count to ten."}]}'
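To inspect the streaming output programmatically rather than by eye, a small SSE reader is enough. The sketch below is a generic server-sent-events parser, not part of this repository; the function name parse_sse is hypothetical. It assumes each event's data payload is JSON, which is how Anthropic-style streams are formatted.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, parsed_json_data) pairs from raw SSE lines."""
    event: Optional[str] = None
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event or "message", json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment line, ignored
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
```

Feeding it sys.stdin while piping the streaming curl command above into the script prints the event names in arrival order, which makes it easy to confirm that message_start comes first.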

Repository Layout

proxy.py              Anthropic → NVIDIA translation proxy (FastAPI)
nim_code.py           Production CLI — daemon, doctor, configure, etc.
config.example.yaml   Non-secret config with model aliases
.env.example          Environment variable template
requirements.txt      Runtime + test dependencies
pyproject.toml        Package metadata and build config
tests/
  conftest.py         Test env setup
  test_translation.py Request/response/error translation unit tests
  test_streaming.py   SSE event ordering and StreamTranslator tests
  test_stream_eager.py Eager message_start async test
  test_routes.py      Route smoke tests
  test_e2e.py         End-to-end tests with mocked NVIDIA API
specs/                Spec Kit — requirements, design, tasks

License

MIT — see LICENSE.


Built for developers who want Claude Code's full power on NVIDIA's free hosted models.

Get your free NVIDIA API key →
