
blockrun-litellm


LiteLLM adapter for BlockRun — call x402-paid AI models through LiteLLM with zero changes to your existing code.

TL;DR — BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level. The only thing that differs is authentication: BlockRun uses per-request x402 wallet signatures (non-custodial USDC micropayments on Base / Solana), not a Bearer API key. This package bridges that gap.

A condensed quick-start guide appears at the bottom of this document.


Two ways to integrate

Mode Best for What it looks like
1. Custom provider (in-process) Apps using the LiteLLM Python library litellm.completion(model="blockrun/openai/gpt-5.5", ...)
2. Local proxy (sidecar) Apps using the LiteLLM Proxy Server (or any OpenAI client) api_base="http://localhost:4001/v1"

Both modes share the same underlying wallet/signing flow (via the blockrun-llm SDK), so they behave identically. Pick whichever fits your deployment.

Verified end-to-end against the live BlockRun gateway

Both modes have been validated against https://blockrun.ai/api using the free nvidia/deepseek-v4-flash model:

$ python -c "
> import litellm
> from blockrun_litellm import register; register()
> r = litellm.completion(
>     model='blockrun/nvidia/deepseek-v4-flash',
>     messages=[{'role':'user','content':'Reply with exactly: pong'}],
>     max_tokens=20, temperature=0.0)
> print(r.choices[0].message.content)"
pong

$ curl -sS http://127.0.0.1:4001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"nvidia/deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: proxy-ok"}]}'
{"id":"a710c144c68c42f7a319fb93e9b9b5a0","object":"chat.completion","model":"nvidia/deepseek-v4-flash",
 "choices":[{"index":0,"message":{"role":"assistant","content":"proxy-ok"},...}],"usage":{...}}

Install

# Custom provider only (no proxy server)
pip install blockrun-litellm

# Custom provider + local proxy (includes FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

Requires Python ≥ 3.9.


Configure your wallet (one-time)

The blockrun-llm SDK signs each request locally with an EVM (Base chain) private key. The key never leaves your machine. Three ways to provide it:

# Option A — environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY

# Option B — auto-create + fund a new wallet (interactive, shows QR for funding)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

# Option C — pass per-call (Python lib mode), see examples below

💡 To validate without spending real USDC, use a free model like nvidia/deepseek-v4-flash — same code path, same wallet flow, $0 settlement.


Mode 1 — Custom provider (Python library)

The shortest path if your app already calls litellm.completion() directly.

1a. Register once at startup

import litellm
from blockrun_litellm import register

register()  # idempotent; adds "blockrun" to litellm.custom_provider_map

1b. Call with a blockrun/ model prefix

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",        # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=128,
    temperature=0.7,
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens / completion_tokens / total_tokens

The blockrun/ prefix is stripped before being sent to the BlockRun gateway, so openai/gpt-5.5, anthropic/claude-opus-4-5, google/gemini-3-pro, etc. all work — anything in BlockRun's catalog.
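
For example, the same calling code can loop over several catalog entries just by swapping the model string. A quick sketch; the model ids below are the ones mentioned in this README, and actual availability and pricing depend on BlockRun's current catalog:

import litellm
from blockrun_litellm import register

register()

# Same calling code for every upstream provider; only the model string changes.
for model in (
    "blockrun/openai/gpt-5.5",
    "blockrun/anthropic/claude-opus-4-5",
    "blockrun/google/gemini-3-pro",
):
    r = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Reply with one word: hello"}],
        max_tokens=16,
    )
    print(model, "->", r.choices[0].message.content)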

1c. Override the wallet per-call (optional)

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",
    messages=[...],
    api_key="0xANOTHER_PRIVATE_KEY",          # passed to blockrun-llm as wallet
)

1d. Async

import asyncio

async def main():
    response = await litellm.acompletion(
        model="blockrun/openai/gpt-5.5",
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Mode 2 — Local proxy (LiteLLM Proxy Server, langchain, raw curl, …)

If you're running the LiteLLM Proxy Server (litellm --config config.yaml), or any client that just speaks OpenAI HTTP, run our proxy as a sidecar.

2a. Start the proxy

export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001
# → uvicorn running at http://127.0.0.1:4001

Flags:

Flag Default Purpose
--host 127.0.0.1 Bind interface. Keep loopback unless you set BLOCKRUN_PROXY_TOKEN.
--port 4001 Bind port
--api-url https://blockrun.ai/api Override BlockRun gateway endpoint
--log-level info critical/error/warning/info/debug/trace

Optional shared-secret guard:

export BLOCKRUN_PROXY_TOKEN=$(openssl rand -hex 32)
# clients must now send:  Authorization: Bearer $BLOCKRUN_PROXY_TOKEN
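
With the token set, any OpenAI-compatible client can pass it as its API key, since the OpenAI SDK transmits api_key as "Authorization: Bearer <key>", which is exactly what the guard expects. A minimal sketch, assuming the proxy is running locally and BLOCKRUN_PROXY_TOKEN is exported in both shells:

import os
from openai import OpenAI

# The OpenAI SDK sends api_key as "Authorization: Bearer <key>", matching the
# proxy's shared-secret guard.
client = OpenAI(
    api_key=os.environ["BLOCKRUN_PROXY_TOKEN"],
    base_url="http://127.0.0.1:4001/v1",
)
resp = client.chat.completions.create(
    model="nvidia/deepseek-v4-flash",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)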

2b. Point LiteLLM Proxy at it

Drop this into your config.yaml:

model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5   # first 'openai/' = LiteLLM provider; rest = BlockRun model id
      api_base: http://localhost:4001/v1
      api_key: "dummy"                # ignored if BLOCKRUN_PROXY_TOKEN is unset

  - model_name: claude-opus-4-5
    litellm_params:
      model: openai/anthropic/claude-opus-4-5
      api_base: http://localhost:4001/v1
      api_key: "dummy"

litellm_settings:
  drop_params: True   # silently drop OpenAI params BlockRun doesn't support

Run LiteLLM Proxy as usual:

litellm --config config.yaml --port 4000

Then call it like any OpenAI endpoint:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2c. Or skip LiteLLM entirely

The proxy speaks OpenAI HTTP, so anything that takes an api_base works:

# OpenAI Python SDK pointed straight at the BlockRun proxy
from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

# Plain curl
curl http://localhost:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.5", "messages": [{"role":"user","content":"Hi"}]}'

2d. Endpoints exposed

Method Path Notes
POST /v1/chat/completions OpenAI Chat Completions. stream=True returns text/event-stream; otherwise JSON.
GET /v1/models BlockRun model catalog
GET /healthz Liveness probe (no upstream call)
GET /docs Auto-generated Swagger UI
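
A quick smoke test of the non-chat endpoints from Python. This is a sketch that assumes the proxy is on its default port and that /v1/models returns an OpenAI-style {"data": [...]} list; inspect the raw JSON if the shape differs:

import requests

BASE = "http://127.0.0.1:4001"

# Liveness probe: answered by the proxy itself, no upstream call.
health = requests.get(f"{BASE}/healthz", timeout=5)
print(health.status_code, health.text)

# Model catalog proxied from BlockRun. The {"data": [...]} key is an
# assumption here, matching the usual OpenAI /v1/models shape.
models = requests.get(f"{BASE}/v1/models", timeout=30).json()
print([m.get("id") for m in models.get("data", [])][:5])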

Supported parameters

All of these are forwarded to BlockRun unchanged:

OpenAI param Supported Notes
model ✅ Any BlockRun model id, e.g. openai/gpt-5.5
messages ✅ Full role/content/tool_calls schema
max_tokens ✅ Defaults to 1024 if omitted
temperature ✅ 0–2
top_p ✅
tools / tool_choice ✅ Function calling
stream ✅ OpenAI-style SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk objects; proxy mode emits data: <json>\n\n events terminated by data: [DONE]. Free models stream directly; paid models stream after the in-band 402-sign-retry dance.
frequency_penalty / presence_penalty / logprobs / n ⚠️ Silently dropped — enable litellm_settings.drop_params: True to suppress LiteLLM warnings

BlockRun-specific extras (also accepted):

Param Purpose
search: True Enable xAI Live Search (for search-enabled models)
search_parameters: {...} Full Live Search config
fallback_models: ["..."] Auto-retry on transient upstream errors
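
Because the proxy forwards the JSON body to BlockRun unchanged, these extras can be attached from any OpenAI client via extra_body. A sketch (the values are illustrative, and search / search_parameters only make sense on search-enabled models):

from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")

resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
    # extra_body is merged into the JSON payload, so BlockRun-specific keys
    # travel through the proxy untouched. search / search_parameters go the
    # same way when you target a search-enabled model.
    extra_body={"fallback_models": ["anthropic/claude-opus-4-5"]},
)
print(resp.choices[0].message.content)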

Examples

The examples/ directory in the repository contains copy-paste-ready snippets.


How it works (under the hood)

┌─────────────────┐    OpenAI dict     ┌──────────────────────┐    POST /v1/chat/completions  ┌────────────────┐
│ Your app /      │ ─────────────────▶ │  blockrun-litellm    │ ────────────────────────────▶ │  blockrun.ai   │
│ LiteLLM /       │                    │  (provider OR proxy) │ ◀──── 402 + payment-required ─│  gateway       │
│ OpenAI SDK      │                    │  ↓                   │                               │                │
└─────────────────┘                    │  blockrun-llm SDK    │ ───── EIP-712 signed retry ──▶│                │
                                       │  (local signing)     │ ◀──── 200 + chat response ────│                │
                                       └──────────────────────┘                               └────────────────┘
                                                ▲
                                                │ private key (stays local, signs only)
                                       ┌──────────────────────┐
                                       │ BLOCKRUN_WALLET_KEY  │
                                       │   or ~/.blockrun/    │
                                       └──────────────────────┘
  1. Caller sends an OpenAI Chat Completions dict.
  2. blockrun-litellm whitelists the params and dispatches through blockrun-llm.
  3. blockrun-llm posts to BlockRun, receives a 402 with payment requirements, signs an EIP-712 payment locally with your wallet, and retries.
  4. BlockRun verifies the signature on-chain, settles the USDC micropayment, runs the inference, and returns the response.
  5. blockrun-litellm returns the result as a plain OpenAI-style dict (proxy mode) or a litellm.ModelResponse (provider mode).
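
For intuition, the loop in steps 3–4 looks roughly like the sketch below with the SDK stripped away. It is purely illustrative: the real payload format and signing live inside blockrun-llm, sign_x402_payment is a hypothetical stand-in rather than an SDK function, and the header name is borrowed from the PAYMENT-SIGNATURE mention in the FAQ.

import os

import httpx


def sign_x402_payment(payment_requirements: dict, private_key: str) -> str:
    """Hypothetical stand-in: in reality blockrun-llm builds and signs the
    EIP-712 payment locally from the 402 response body."""
    raise NotImplementedError("handled by the blockrun-llm SDK")


def call_blockrun(body: dict) -> dict:
    url = "https://blockrun.ai/api/v1/chat/completions"
    with httpx.Client(timeout=60) as http:
        first = http.post(url, json=body)      # 1) first attempt, no payment attached
        if first.status_code != 402:
            return first.json()                # free model: done
        # 2) 402 -> sign the quoted payment locally; the key never leaves the machine
        sig = sign_x402_payment(first.json(), os.environ["BLOCKRUN_WALLET_KEY"])
        # 3) retry with the signed payment attached (header name illustrative)
        retry = http.post(url, json=body, headers={"PAYMENT-SIGNATURE": sig})
        return retry.json()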

FAQ

Q: Does this support streaming? Yes, as of v0.2.0. Pass stream=True and the adapter routes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0). The 402 → sign-locally → retry-with-PAYMENT-SIGNATURE dance happens before the first chunk; once the upstream switches to text/event-stream, chunks are forwarded straight through (provider mode → litellm.GenericStreamingChunk, proxy mode → OpenAI-style data: <json>\n\n SSE). Caveats inherited from the gateway: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.
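
In provider mode that looks like any other LiteLLM streaming loop; a minimal sketch using the free model mentioned above:

import litellm
from blockrun_litellm import register

register()

stream = litellm.completion(
    model="blockrun/nvidia/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,   # the 402/sign/retry exchange completes before the first chunk
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()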

Q: Where does my private key live? On your machine only — BLOCKRUN_WALLET_KEY env var, or ~/.blockrun/.session if you used setup_agent_wallet(). The proxy and provider both read from those sources via blockrun-llm. Only EIP-712 signatures are transmitted.

Q: How do I switch between Base and Solana? Today this adapter wires to BlockRun's Base gateway (USDC on Base). Solana support tracks the blockrun-llm SolanaLLMClient and will be added in a follow-up release.

Q: Can I run the proxy in Docker / k8s? Yes — it's a vanilla FastAPI app. Pass the wallet key via secret (env var), bind to 0.0.0.0 only inside a private network, and set BLOCKRUN_PROXY_TOKEN for an additional auth layer.

Q: Is this affiliated with LiteLLM (BerriAI)? No — this is an independent adapter built by the BlockRun team. LiteLLM is a great project; we're just plugging into its custom-provider hooks.


Development

git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

License

MIT. See LICENSE.


Condensed quick-start guide

LiteLLM adapter for BlockRun: call AI models on BlockRun through LiteLLM with zero changes to your code.

In one sentence: BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level; the only difference is authentication. BlockRun uses x402 wallet signatures (per-request, non-custodial USDC micropayments) instead of a Bearer API key, and this package bridges that gap.

Two ways to integrate

Mode Best for What it looks like
1. Custom provider (in-process) Apps using the LiteLLM Python library litellm.completion(model="blockrun/openai/gpt-5.5", ...)
2. Local proxy (sidecar) Apps using the LiteLLM Proxy Server, or any OpenAI client api_base="http://localhost:4001/v1"

Both modes use the blockrun-llm SDK underneath for signing and x402 payment, so they behave identically. Pick whichever fits your deployment.

Quick start

Install

# Custom provider only
pip install blockrun-litellm

# Custom provider plus the local proxy (includes FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

Configure your wallet (one-time)

# Option A: environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY

# Option B: auto-create a wallet and fund it via QR code (interactive)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

The private key is used only for local EIP-712 signing and never leaves your machine.

💡 Want to try it end to end at zero cost? Use the free model nvidia/deepseek-v4-flash: identical code, identical wallet flow, $0 settlement.

Mode 1: Custom provider

import litellm
from blockrun_litellm import register

register()  # call once at startup

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",   # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=128,
)
print(response.choices[0].message.content)

Async works the same way via await litellm.acompletion(...).

Mode 2: Local proxy

# 1) Start the sidecar
export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001

# 2) LiteLLM Proxy config (config.yaml)
model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5
      api_base: http://localhost:4001/v1
      api_key: "dummy"

litellm_settings:
  drop_params: True

Or point any OpenAI client at it directly:

from openai import OpenAI
client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
)

Supported parameters

OpenAI param Supported Notes
model / messages / max_tokens / temperature / top_p ✅
tools / tool_choice ✅ Function calling
stream ✅ Standard OpenAI SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk; proxy mode emits data: <json>\n\n events terminated by data: [DONE]. Free models stream immediately; paid models stream after the in-band 402, sign, retry exchange.
frequency_penalty / presence_penalty / logprobs / n ⚠️ Silently dropped; set drop_params: True in LiteLLM to suppress the warnings

BlockRun-specific extras:

Param Purpose
search: True Enable xAI Live Search (search-enabled models)
search_parameters: {...} Full Live Search config
fallback_models: ["..."] Auto-retry list for transient upstream errors

FAQ

Q: Is streaming supported? Fully, as of v0.2.0. With stream=True the adapter goes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0); the 402, local signing, retry-with-PAYMENT-SIGNATURE chain completes before the first chunk, and once the upstream switches to text/event-stream the chunks are forwarded straight through (provider mode: litellm.GenericStreamingChunk; proxy mode: standard OpenAI data: <json>\n\n). Limitations inherited from the gateway: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.

Q: Where does the private key live? Locally only: the BLOCKRUN_WALLET_KEY environment variable, or the ~/.blockrun/.session created by setup_agent_wallet(). Both the provider and the proxy read it through blockrun-llm. Only signatures are ever visible on-chain, never the private key.

Q: Docker / k8s deployment? The proxy is a plain FastAPI app. Inject the key as a secret, expose it only inside a private network, and optionally set BLOCKRUN_PROXY_TOKEN for an extra layer of Bearer auth.

Q: What is the relationship with BerriAI? None. This is an adapter maintained independently by the BlockRun team, plugged into LiteLLM's custom provider hooks.

Development

git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

License

MIT
