Fastest OpenAI client — Rust core with Python bindings
openai-oxide
A high-performance, feature-complete OpenAI client for Rust and Python.
Designed for agentic workflows, low-latency streaming, and WebAssembly.
openai-oxide implements the full Responses API, Chat Completions, and 20+ other endpoints. It introduces performance primitives like persistent WebSockets, hedged requests, and early parsing of streamed function calls, features previously unavailable in the Rust ecosystem.
Why openai-oxide?
We built openai-oxide to squeeze every millisecond out of the OpenAI API.
- Zero-Overhead Streaming: Uses a custom zero-copy SSE parser. By enforcing strict Accept: text/event-stream and Cache-Control: no-cache headers (sketched just below this list), it prevents reverse-proxy buffering, achieving a Time-To-First-Token (TTFT) of ~670ms.
- WebSocket Mode: Maintains a persistent wss:// connection for the Responses API. By bypassing per-request TLS handshakes, it reduces multi-turn agent loop latency by up to 37%.
- Hedged Requests: Send redundant requests and cancel the slower ones. This costs 2-7% extra tokens but reliably cuts P99 tail latency by 50-96% (inspired by Google's "The Tail at Scale").
- Stream FC Early Parse: Yields a function call the moment arguments.done is emitted, letting you start executing local tools ~400ms before the overall response finishes.
- WASM First-Class: Compiles to wasm32-unknown-unknown without dropping features. Unlike other clients, streaming, retries, and early parsing all work in Cloudflare Workers and browsers.
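To make the anti-buffering discipline concrete, here is a rough sketch of the same two headers applied to a hand-rolled reqwest streaming call. This is illustrative only, not the crate's internals; the exact payload shape is an assumption.

use reqwest::header::{ACCEPT, CACHE_CONTROL};

// Illustrative sketch: the two headers that keep reverse proxies from buffering SSE.
let resp = reqwest::Client::new()
    .post("https://api.openai.com/v1/responses")
    .bearer_auth(std::env::var("OPENAI_API_KEY")?)
    .header(ACCEPT, "text/event-stream") // declare the response as an SSE stream
    .header(CACHE_CONTROL, "no-cache")   // ask intermediaries not to buffer it
    .json(&serde_json::json!({ "model": "gpt-5.4", "input": "hi", "stream": true }))
    .send()
    .await?;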
Quick Start
Add the crate to your Cargo.toml:
[dependencies]
openai-oxide = "0.9"
tokio = { version = "1", features = ["full"] }
Then make your first request:
use openai_oxide::{OpenAI, types::responses::*};
#[tokio::main]
async fn main() -> Result<(), openai_oxide::OpenAIError> {
let client = OpenAI::from_env()?; // Uses OPENAI_API_KEY
let response = client.responses().create(
ResponseCreateRequest::new("gpt-5.4")
.input("Explain quantum computing in one sentence.")
.max_output_tokens(100)
).await?;
println!("{}", response.output_text());
Ok(())
}
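If you want tokens as they arrive, the streaming variant looks roughly like this. This is a sketch: we assume the handle returned by create_stream exposes the same recv() loop as the create_stream_fc handle shown later in this README.

let mut stream = client.responses().create_stream(
    ResponseCreateRequest::new("gpt-5.4")
        .input("Stream a one-line haiku about Rust.")
).await?;
// Events are yielded as soon as the zero-copy SSE parser produces them.
while let Some(event) = stream.recv().await {
    println!("{event:?}");
}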
Benchmarks
All benchmarks were run under identical conditions to ensure a fair, real-world comparison of the clients:
- Environment: macOS (M-series), native compilation.
- Model: gpt-5.4 via the official OpenAI API.
- Protocol: TLS + HTTP/2 multiplexing with connection pooling (warm connections).
- Execution: 5 iterations per test; the reported value is the median time (see the sketch after the Rust table).
- Rust APIs: openai-oxide provides first-class support for both the traditional Chat Completions API (/v1/chat/completions) and the newer Responses API (/v1/responses). The Responses API has slightly higher backend orchestration latency on OpenAI's side for non-streamed requests, so we report the two separately for fairness.
Rust Ecosystem (openai-oxide vs async-openai vs genai)
| Test | openai-oxide (WebSockets) | openai-oxide (Responses API) | async-openai (Responses API) | genai (Responses API) | openai-oxide (Chat API) | genai (Chat API) |
|---|---|---|---|---|---|---|
| Plain text | 710ms (-29%) | 1000ms | 960ms | 835ms | 753ms | 722ms |
| Structured output | ~1000ms | 1352ms | N/A | 1197ms | 1304ms | N/A |
| Function calling | ~850ms | 1164ms | 1748ms | 1030ms | 1252ms | N/A |
| Streaming TTFT | ~400ms | 670ms | 685ms | 670ms | 695ms | N/A |
| Multi-turn (2 reqs) | 1425ms (-35%) | 2219ms | 3275ms | 1641ms | 2011ms | 1560ms |
| Rapid-fire (5 calls) | 3227ms (-37%) | 5147ms | 5166ms | 3807ms | 4671ms | 3540ms |
| Parallel 3x (fan-out) | N/A (sync) | 1081ms | 1053ms | 866ms | 978ms | 801ms |
Reproduce: cargo run --example benchmark --features responses --release
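Conceptually, the harness is nothing more than median-of-5 wall-clock timing. A minimal sketch of that measurement loop (not the benchmark's actual code):

use std::time::{Duration, Instant};

// Run an async request closure 5 times and report the median, as in the tables above.
async fn median_of_5<F, Fut>(mut run: F) -> Duration
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = ()>,
{
    let mut samples = Vec::with_capacity(5);
    for _ in 0..5 {
        let start = Instant::now();
        run().await;
        samples.push(start.elapsed());
    }
    samples.sort();
    samples[2] // the middle of 5 sorted samples is the median
}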
Python Ecosystem (openai-oxide-python vs openai)
openai-oxide ships native Python bindings via PyO3, exposing a drop-in async interface that outperforms the official Python SDK (openai + httpx) on most workloads.
Run uv run python examples/bench_python.py from the openai-oxide-python directory to test locally (Python 3.13).
| Test | openai-oxide-python | openai (httpx) | Winner |
|---|---|---|---|
| Plain text | 894ms | 990ms | OXIDE (+9%) |
| Structured output | 1354ms | 1391ms | OXIDE (+2%) |
| Function calling | 1089ms | 1125ms | OXIDE (+3%) |
| Multi-turn (2 reqs) | 2057ms | 2232ms | OXIDE (+7%) |
| Web search | 3276ms | 3039ms | python (+7%) |
| Nested structured output | 4811ms | 4186ms | python (+14%) |
| Agent loop (2-step) | 3408ms | 3984ms | OXIDE (+14%) |
| Rapid-fire (5 sequential calls) | 4835ms | 5075ms | OXIDE (+4%) |
| Prompt-cached | 4511ms | 4327ms | python (+4%) |
| Streaming TTFT | 709ms | 769ms | OXIDE (+7%) |
| Parallel 3x (fan-out) | 961ms | 994ms | OXIDE (+3%) |
| Hedged (2x race) | 1082ms | 1001ms | python (+8%) |
Python Usage
Install via uv or pip:
cd openai-oxide-python
uv sync
uv run maturin develop --release
import asyncio
from openai_oxide_python import Client

async def main():
    client = Client()

    # 1. Standard request
    res = await client.create("gpt-5.4", "Hello!")
    print(res["text"])

    # 2. Streaming (async iterator)
    stream = await client.create_stream("gpt-5.4", "Explain quantum computing...", max_output_tokens=200)
    async for event in stream:
        print(event)

asyncio.run(main())
Advanced Features Guide
WebSocket Mode
Persistent connections bypass the TLS handshake penalty for every request. Ideal for high-speed agent loops.
let client = OpenAI::from_env()?;
let mut session = client.ws_session().await?;
// All calls route through the same wss:// connection
let r1 = session.send(
ResponseCreateRequest::new("gpt-5.4").input("My name is Rustam.").store(true)
).await?;
let r2 = session.send(
ResponseCreateRequest::new("gpt-5.4").input("What's my name?").previous_response_id(&r1.id)
).await?;
session.close().await?;
Streaming FC Early Parse
Start executing your local functions instantly when the model finishes generating the arguments, rather than waiting for the entire stream to close.
let mut handle = client.responses().create_stream_fc(request).await?;
while let Some(fc) = handle.recv().await {
// Fires immediately on `arguments.done`
let result = execute_tool(&fc.name, &fc.arguments).await;
}
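The execute_tool call above is your own code. A hypothetical dispatcher (the tool names and argument schema are made up for illustration) might look like:

use serde_json::Value;

// Hypothetical local tool dispatcher; `arguments` arrives as a JSON string.
async fn execute_tool(name: &str, arguments: &str) -> String {
    let args: Value = serde_json::from_str(arguments).unwrap_or_default();
    match name {
        "get_weather" => format!("22°C in {}", args["city"]),
        other => format!("unknown tool: {other}"),
    }
}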
Hedged Requests
Protect your application against random network latency spikes.
use openai_oxide::hedged_request;
use std::time::Duration;
// Sends 2 identical requests with a 2s hedge delay. Returns whichever finishes first.
let response = hedged_request(&client, request, Some(Duration::from_secs(2))).await?;
Parallel Fan-Out
Leverage HTTP/2 multiplexing natively. Send 3 concurrent requests over a single connection; the total wall time is roughly that of the slowest single request.
let (c1, c2, c3) = (client.clone(), client.clone(), client.clone());
let (r1, r2, r3) = tokio::join!(
async { c1.responses().create(req1).await },
async { c2.responses().create(req2).await },
async { c3.responses().create(req3).await },
);
Implemented APIs
| API | Method |
|---|---|
| Chat Completions | client.chat().completions().create() / create_stream() |
| Responses | client.responses().create() / create_stream() / create_stream_fc() |
| Responses Tools | Function, WebSearch, FileSearch, CodeInterpreter, ComputerUse, Mcp, ImageGeneration |
| WebSocket | client.ws_session() — send / send_stream / warmup / close |
| Hedged | hedged_request() / hedged_request_n() / speculative() |
| Embeddings | client.embeddings().create() |
| Models | client.models().list() / retrieve() / delete() |
| Images | client.images().generate() / edit() / create_variation() |
| Audio | client.audio().transcriptions() / translations() / speech() |
| Files | client.files().create() / list() / retrieve() / delete() / content() |
| Fine-tuning | client.fine_tuning().jobs().create() / list() / cancel() / list_events() |
| Moderations | client.moderations().create() |
| Batches | client.batches().create() / list() / retrieve() / cancel() |
| Uploads | client.uploads().create() / cancel() / complete() |
| Pagination | list_page() / list_auto() — cursor-based, async stream (see the sketch below) |
| Assistants (beta) | Full CRUD + threads + runs + vector stores |
| Realtime (beta) | client.beta().realtime().sessions().create() |
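As a sketch of the pagination row above: assuming list_auto() returns an async Stream of items that transparently follows cursors, draining it might look like this (the files() usage and item shape are assumptions):

use futures_util::StreamExt;

// Assumed usage: list_auto() follows `after` cursors for you and yields one item at a time.
let mut files = client.files().list_auto();
while let Some(file) = files.next().await {
    let file = file?; // each item is a Result in this sketch
    println!("{}", file.id);
}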
Cargo Features & WASM Optimization
Every endpoint is gated behind a Cargo feature. If you are building for WebAssembly (WASM) (e.g., Cloudflare Workers, Dioxus, Leptos), you can significantly reduce your .wasm binary size and compilation time by disabling default features and only compiling what you need.
[dependencies]
# Example: Compile ONLY the Responses API (removes Audio, Images, Assistants, etc.)
openai-oxide = { version = "0.9", default-features = false, features = ["responses"] }
Available API Features:
- chat — Chat Completions
- responses — Responses API (supports WebSocket)
- embeddings — Text Embeddings
- images — Image Generation (DALL-E)
- audio — TTS and Transcription
- files — File management
- fine-tuning — Model Fine-tuning
- models — Model listing
- moderations — Moderation API
- batches — Batch API
- uploads — Upload API
- beta — Assistants, Threads, Vector Stores, Realtime API
Ecosystem Features:
- websocket — Enables Realtime API over WebSockets (native: tokio-tungstenite)
- websocket-wasm — Enables Realtime API over WebSockets (WASM: gloo-net/web-sys)
- simd — Enables simd-json for ultra-fast JSON deserialization (requires nightly Rust)
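For example, a lean Cloudflare Workers build that only needs the Responses API over WebSockets might combine the flags above like this (the exact feature combination is an assumption; check your build's needs):

[dependencies]
openai-oxide = { version = "0.9", default-features = false, features = ["responses", "websocket-wasm"] }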
Check out our Cloudflare Worker Examples showcasing a Full-Stack Rust app with a Dioxus frontend and a Cloudflare Worker Durable Object backend holding a WebSocket connection to OpenAI.
Configuration
use openai_oxide::{OpenAI, config::ClientConfig};
use openai_oxide::azure::AzureConfig;
let client = OpenAI::new("sk-..."); // Explicit key
let client = OpenAI::with_config( // Custom config
ClientConfig::new("sk-...").base_url("https://...").timeout_secs(30).max_retries(3)
);
let client = OpenAI::azure(AzureConfig::new() // Azure OpenAI
.azure_endpoint("https://my.openai.azure.com").azure_deployment("gpt-4").api_key("...")
)?;
Keeping up with OpenAI
OpenAI moves fast. To ensure openai-oxide never falls behind, we built an automated architecture synchronization pipeline.
Types are strictly validated against the official OpenAPI spec and cross-checked directly with the official Python SDK's AST.
make sync # downloads latest spec, diffs against local schema, runs coverage
make sync automatically:
- Downloads the latest OpenAPI schema from OpenAI.
- Displays a precise git diff of newly added endpoints, struct fields, and enums.
- Runs the openapi_coverage test suite to statically verify our Rust types against the spec.
Coverage is enforced on every commit via pre-commit hooks. Current field coverage for all implemented typed schemas is 100%. This guarantees 1:1 feature parity with the Python SDK, ensuring you can adopt new OpenAI models and features on day one.
Used In
- sgr-agent — LLM agent framework with structured output, function calling, and agent loops. openai-oxide is the default backend.
- rust-code — AI-powered TUI coding agent.
See Also
- openai-python — Official Python SDK (our benchmark baseline)
- async-openai — Alternative Rust client (mature, 1800+ stars)
- genai — Multi-provider Rust client (Gemini, Anthropic, OpenAI)
License