
blockrun-litellm


LiteLLM adapter for BlockRun — call x402-paid AI models through LiteLLM with zero changes to your existing code.

TL;DR — BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level. The only thing that differs is authentication: BlockRun uses per-request x402 wallet signatures (non-custodial USDC micropayments on Base / Solana), not a Bearer API key. This package bridges that gap.

A condensed quick-start guide appears at the bottom of this document.


Two ways to integrate

Mode Best for What it looks like
1. Custom provider (in-process) Apps using the LiteLLM Python library litellm.completion(model="blockrun/openai/gpt-5.5", ...)
2. Local proxy (sidecar) Apps using the LiteLLM Proxy Server (or any OpenAI client) api_base="http://localhost:4001/v1"

Both modes share the same underlying wallet/signing flow (via the blockrun-llm SDK), so they behave identically. Pick whichever fits your deployment.

Verified end-to-end against the live BlockRun gateway

Both modes have been validated against https://blockrun.ai/api using the free nvidia/deepseek-v4-flash model:

$ python -c "
> import litellm
> from blockrun_litellm import register; register()
> r = litellm.completion(
>     model='blockrun/nvidia/deepseek-v4-flash',
>     messages=[{'role':'user','content':'Reply with exactly: pong'}],
>     max_tokens=20, temperature=0.0)
> print(r.choices[0].message.content)"
pong

$ curl -sS http://127.0.0.1:4001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"nvidia/deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: proxy-ok"}]}'
{"id":"a710c144c68c42f7a319fb93e9b9b5a0","object":"chat.completion","model":"nvidia/deepseek-v4-flash",
 "choices":[{"index":0,"message":{"role":"assistant","content":"proxy-ok"},...}],"usage":{...}}

Install

# Custom provider only (no proxy server)
pip install blockrun-litellm

# Custom provider + local proxy (includes FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

Requires Python ≥ 3.9.


Configure your wallet (one-time)

The blockrun-llm SDK signs each request locally with an EVM (Base chain) private key. The key never leaves your machine. Three ways to provide it:

# Option A — environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY

# Option B — auto-create + fund a new wallet (interactive, shows QR for funding)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

# Option C — pass per-call (Python lib mode), see examples below

💡 To validate without spending real USDC, use a free model like nvidia/deepseek-v4-flash — same code path, same wallet flow, $0 settlement.


Mode 1 — Custom provider (Python library)

The shortest path if your app already calls litellm.completion() directly.

1a. Register once at startup

import litellm
from blockrun_litellm import register

register()  # idempotent; adds "blockrun" to litellm.custom_provider_map

1b. Call with a blockrun/ model prefix

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",        # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=128,
    temperature=0.7,
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens / completion_tokens / total_tokens

The blockrun/ prefix is stripped before being sent to the BlockRun gateway, so openai/gpt-5.5, anthropic/claude-opus-4-5, google/gemini-3-pro, etc. all work — anything in BlockRun's catalog.
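
For example, the same calling code can loop over several catalog entries just by swapping the model string. A quick sketch; the model ids below are the ones mentioned in this README, and actual availability and pricing depend on BlockRun's current catalog:

import litellm
from blockrun_litellm import register

register()

# Same calling code for every upstream provider; only the model string changes.
for model in (
    "blockrun/openai/gpt-5.5",
    "blockrun/anthropic/claude-opus-4-5",
    "blockrun/google/gemini-3-pro",
):
    r = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Reply with one word: hello"}],
        max_tokens=16,
    )
    print(model, "->", r.choices[0].message.content)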

1c. Override the wallet per-call (optional)

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",
    messages=[...],
    api_key="0xANOTHER_PRIVATE_KEY",          # passed to blockrun-llm as wallet
)

1d. Async

import asyncio

async def main():
    response = await litellm.acompletion(
        model="blockrun/openai/gpt-5.5",
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Mode 2 — Local proxy (LiteLLM Proxy Server, langchain, raw curl, …)

If you're running the LiteLLM Proxy Server (litellm --config config.yaml), or any client that just speaks OpenAI HTTP, run our proxy as a sidecar.

2a. Start the proxy

export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001
# → uvicorn running at http://127.0.0.1:4001

Flags:

Flag Default Purpose
--host 127.0.0.1 Bind interface. Keep loopback unless you set BLOCKRUN_PROXY_TOKEN.
--port 4001 Bind port
--api-url https://blockrun.ai/api Override BlockRun gateway endpoint
--log-level info critical/error/warning/info/debug/trace

Optional shared-secret guard:

export BLOCKRUN_PROXY_TOKEN=$(openssl rand -hex 32)
# clients must now send:  Authorization: Bearer $BLOCKRUN_PROXY_TOKEN
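
With the token set, any OpenAI-compatible client can pass it as its API key, since the OpenAI SDK transmits api_key as "Authorization: Bearer <key>", which is exactly what the guard expects. A minimal sketch, assuming the proxy is running locally and BLOCKRUN_PROXY_TOKEN is exported in both shells:

import os
from openai import OpenAI

# The OpenAI SDK sends api_key as "Authorization: Bearer <key>", matching the
# proxy's shared-secret guard.
client = OpenAI(
    api_key=os.environ["BLOCKRUN_PROXY_TOKEN"],
    base_url="http://127.0.0.1:4001/v1",
)
resp = client.chat.completions.create(
    model="nvidia/deepseek-v4-flash",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)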

2b. Point LiteLLM Proxy at it

Drop this into your config.yaml:

model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5   # first 'openai/' = LiteLLM provider; rest = BlockRun model id
      api_base: http://localhost:4001/v1
      api_key: "dummy"                # ignored if BLOCKRUN_PROXY_TOKEN is unset

  - model_name: claude-opus-4-5
    litellm_params:
      model: openai/anthropic/claude-opus-4-5
      api_base: http://localhost:4001/v1
      api_key: "dummy"

litellm_settings:
  drop_params: True   # silently drop OpenAI params BlockRun doesn't support

Run LiteLLM Proxy as usual:

litellm --config config.yaml --port 4000

Then call it like any OpenAI endpoint:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2c. Or skip LiteLLM entirely

The proxy speaks OpenAI HTTP, so anything that takes an api_base works:

# OpenAI Python SDK pointed straight at the BlockRun proxy
from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

# Plain curl
curl http://localhost:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.5", "messages": [{"role":"user","content":"Hi"}]}'

2d. Endpoints exposed

Method Path Notes
POST /v1/chat/completions OpenAI Chat Completions. stream=True returns text/event-stream; otherwise JSON.
GET /v1/models BlockRun model catalog
GET /healthz Liveness probe (no upstream call)
GET /docs Auto-generated Swagger UI
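
A quick smoke test of the non-chat endpoints from Python. This is a sketch that assumes the proxy is on its default port and that /v1/models returns an OpenAI-style {"data": [...]} list; inspect the raw JSON if the shape differs:

import requests

BASE = "http://127.0.0.1:4001"

# Liveness probe: answered by the proxy itself, no upstream call.
health = requests.get(f"{BASE}/healthz", timeout=5)
print(health.status_code, health.text)

# Model catalog proxied from BlockRun. The {"data": [...]} key is an
# assumption here, matching the usual OpenAI /v1/models shape.
models = requests.get(f"{BASE}/v1/models", timeout=30).json()
print([m.get("id") for m in models.get("data", [])][:5])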

Supported parameters

All of these are forwarded to BlockRun unchanged:

OpenAI param Supported Notes
model ✅ Any BlockRun model id, e.g. openai/gpt-5.5
messages ✅ Full role/content/tool_calls schema
max_tokens ✅ Defaults to 1024 if omitted
temperature ✅ 0–2
top_p ✅
tools / tool_choice ✅ Function calling
stream ✅ OpenAI-style SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk objects; proxy mode emits data: <json>\n\n events terminated by data: [DONE]. Free models stream directly; paid models stream after the in-band 402-sign-retry dance.
frequency_penalty / presence_penalty / logprobs / n ⚠️ Silently dropped — enable litellm_settings.drop_params: True to suppress LiteLLM warnings

BlockRun-specific extras (also accepted):

Param Purpose
search: True Enable xAI Live Search (for search-enabled models)
search_parameters: {...} Full Live Search config
fallback_models: ["..."] Auto-retry on transient upstream errors
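
Because the proxy forwards the JSON body to BlockRun unchanged, these extras can be attached from any OpenAI client via extra_body. A sketch (the values are illustrative, and search / search_parameters only make sense on search-enabled models):

from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")

resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
    # extra_body is merged into the JSON payload, so BlockRun-specific keys
    # travel through the proxy untouched. search / search_parameters go the
    # same way when you target a search-enabled model.
    extra_body={"fallback_models": ["anthropic/claude-opus-4-5"]},
)
print(resp.choices[0].message.content)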

Examples

The examples/ directory in the repository contains copy-paste-ready snippets.


How it works (under the hood)

┌─────────────────┐    OpenAI dict     ┌──────────────────────┐    POST /v1/chat/completions  ┌────────────────┐
│ Your app /      │ ─────────────────▶ │  blockrun-litellm    │ ────────────────────────────▶ │  blockrun.ai   │
│ LiteLLM /       │                    │  (provider OR proxy) │ ◀──── 402 + payment-required ─│  gateway       │
│ OpenAI SDK      │                    │  ↓                   │                               │                │
└─────────────────┘                    │  blockrun-llm SDK    │ ───── EIP-712 signed retry ──▶│                │
                                       │  (local signing)     │ ◀──── 200 + chat response ────│                │
                                       └──────────────────────┘                               └────────────────┘
                                                ▲
                                                │ private key (stays local, signs only)
                                       ┌──────────────────────┐
                                       │ BLOCKRUN_WALLET_KEY  │
                                       │   or ~/.blockrun/    │
                                       └──────────────────────┘
  1. Caller sends an OpenAI Chat Completions dict.
  2. blockrun-litellm whitelists the params and dispatches through blockrun-llm.
  3. blockrun-llm posts to BlockRun, receives a 402 with payment requirements, signs an EIP-712 payment locally with your wallet, and retries.
  4. BlockRun verifies the signature on-chain, settles the USDC micropayment, runs the inference, and returns the response.
  5. blockrun-litellm returns the result as a plain OpenAI-style dict (proxy mode) or a litellm.ModelResponse (provider mode).
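
For intuition, the loop in steps 3–4 looks roughly like the sketch below with the SDK stripped away. It is purely illustrative: the real payload format and signing live inside blockrun-llm, sign_x402_payment is a hypothetical stand-in rather than an SDK function, and the header name is borrowed from the PAYMENT-SIGNATURE mention in the FAQ.

import os

import httpx


def sign_x402_payment(payment_requirements: dict, private_key: str) -> str:
    """Hypothetical stand-in: in reality blockrun-llm builds and signs the
    EIP-712 payment locally from the 402 response body."""
    raise NotImplementedError("handled by the blockrun-llm SDK")


def call_blockrun(body: dict) -> dict:
    url = "https://blockrun.ai/api/v1/chat/completions"
    with httpx.Client(timeout=60) as http:
        first = http.post(url, json=body)      # 1) first attempt, no payment attached
        if first.status_code != 402:
            return first.json()                # free model: done
        # 2) 402 -> sign the quoted payment locally; the key never leaves the machine
        sig = sign_x402_payment(first.json(), os.environ["BLOCKRUN_WALLET_KEY"])
        # 3) retry with the signed payment attached (header name illustrative)
        retry = http.post(url, json=body, headers={"PAYMENT-SIGNATURE": sig})
        return retry.json()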

FAQ

Q: Does this support streaming? Yes, as of v0.2.0. Pass stream=True and the adapter routes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0). The 402 → sign-locally → retry-with-PAYMENT-SIGNATURE dance happens before the first chunk; once the upstream switches to text/event-stream, chunks are forwarded straight through (provider mode → litellm.GenericStreamingChunk, proxy mode → OpenAI-style data: <json>\n\n SSE). Caveats inherited from the gateway: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.
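
In provider mode that looks like any other LiteLLM streaming loop; a minimal sketch using the free model mentioned above:

import litellm
from blockrun_litellm import register

register()

stream = litellm.completion(
    model="blockrun/nvidia/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,   # the 402/sign/retry exchange completes before the first chunk
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()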

Q: Where does my private key live? On your machine only — BLOCKRUN_WALLET_KEY env var, or ~/.blockrun/.session if you used setup_agent_wallet(). The proxy and provider both read from those sources via blockrun-llm. Only EIP-712 signatures are transmitted.

Q: How do I switch between Base and Solana? Today this adapter wires to BlockRun's Base gateway (USDC on Base). Solana support tracks the blockrun-llm SolanaLLMClient and will be added in a follow-up release.

Q: Can I run the proxy in Docker / k8s? Yes — it's a vanilla FastAPI app. Pass the wallet key via secret (env var), bind to 0.0.0.0 only inside a private network, and set BLOCKRUN_PROXY_TOKEN for an additional auth layer.

Q: Is this affiliated with LiteLLM (BerriAI)? No — this is an independent adapter built by the BlockRun team. LiteLLM is a great project; we're just plugging into its custom-provider hooks.


Development

git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

License

MIT. See LICENSE.


Condensed quick-start guide

LiteLLM adapter for BlockRun: call AI models on BlockRun through LiteLLM with zero changes to your code.

In one sentence: BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level; the only difference is authentication. BlockRun uses x402 wallet signatures (per-request, non-custodial USDC micropayments) instead of a Bearer API key, and this package bridges that gap.

Two ways to integrate

Mode Best for What it looks like
1. Custom provider (in-process) Apps using the LiteLLM Python library litellm.completion(model="blockrun/openai/gpt-5.5", ...)
2. Local proxy (sidecar) Apps using the LiteLLM Proxy Server, or any OpenAI client api_base="http://localhost:4001/v1"

Both modes use the blockrun-llm SDK underneath for signing and x402 payment, so they behave identically. Pick whichever fits your deployment.

Quick start

Install

# Custom provider only
pip install blockrun-litellm

# Custom provider plus the local proxy (includes FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

Configure your wallet (one-time)

# Option A: environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY

# Option B: auto-create a wallet and fund it via QR code (interactive)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

The private key is used only for local EIP-712 signing and never leaves your machine.

💡 Want to try it end to end at zero cost? Use the free model nvidia/deepseek-v4-flash: identical code, identical wallet flow, $0 settlement.

Mode 1: Custom provider

import litellm
from blockrun_litellm import register

register()  # call once at startup

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",   # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=128,
)
print(response.choices[0].message.content)

Async works the same way via await litellm.acompletion(...).

Mode 2: Local proxy

# 1) Start the sidecar
export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001

# 2) LiteLLM Proxy config (config.yaml)
model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5
      api_base: http://localhost:4001/v1
      api_key: "dummy"

litellm_settings:
  drop_params: True

Or point any OpenAI client at it directly:

from openai import OpenAI
client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
)

Supported parameters

OpenAI param Supported Notes
model / messages / max_tokens / temperature / top_p ✅
tools / tool_choice ✅ Function calling
stream ✅ Standard OpenAI SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk; proxy mode emits data: <json>\n\n events terminated by data: [DONE]. Free models stream immediately; paid models stream after the in-band 402, sign, retry exchange.
frequency_penalty / presence_penalty / logprobs / n ⚠️ Silently dropped; set drop_params: True in LiteLLM to suppress the warnings

BlockRun-specific extras:

Param Purpose
search: True Enable xAI Live Search (search-enabled models)
search_parameters: {...} Full Live Search config
fallback_models: ["..."] Auto-retry list for transient upstream errors

FAQ

Q: Is streaming supported? Fully, as of v0.2.0. With stream=True the adapter goes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0); the 402, local signing, retry-with-PAYMENT-SIGNATURE chain completes before the first chunk, and once the upstream switches to text/event-stream the chunks are forwarded straight through (provider mode: litellm.GenericStreamingChunk; proxy mode: standard OpenAI data: <json>\n\n). Limitations inherited from the gateway: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.

Q: Where does the private key live? Locally only: the BLOCKRUN_WALLET_KEY environment variable, or the ~/.blockrun/.session created by setup_agent_wallet(). Both the provider and the proxy read it through blockrun-llm. Only signatures are ever visible on-chain, never the private key.

Q: Docker / k8s deployment? The proxy is a plain FastAPI app. Inject the key as a secret, expose it only inside a private network, and optionally set BLOCKRUN_PROXY_TOKEN for an extra layer of Bearer auth.

Q: What is the relationship with BerriAI? None. This is an adapter maintained independently by the BlockRun team, plugged into LiteLLM's custom provider hooks.

Development

git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

License

MIT
