Fastest OpenAI client — Rust core with Python bindings
openai-oxide
A high-performance, feature-complete OpenAI client for Rust and Python.
Designed for agentic workflows, low-latency streaming, and WebAssembly.
openai-oxide implements the full Responses API, Chat Completions, and 20+ other endpoints. It introduces performance primitives like persistent WebSockets, hedged requests, and early parsing of streamed function calls, features previously unavailable in the Rust ecosystem.
Why openai-oxide?
We built openai-oxide to squeeze every millisecond out of the OpenAI API.
- Zero-Overhead Streaming: Uses a custom zero-copy SSE parser. By enforcing strict Accept: text/event-stream and Cache-Control: no-cache headers (sketched just below this list), it prevents reverse-proxy buffering, achieving a Time-To-First-Token (TTFT) of ~670ms.
- WebSocket Mode: Maintains a persistent wss:// connection for the Responses API. By bypassing per-request TLS handshakes, it reduces multi-turn agent loop latency by up to 37%.
- Hedged Requests: Send redundant requests and cancel the slower ones. This costs 2-7% extra tokens but reliably cuts P99 tail latency by 50-96% (inspired by Google's "The Tail at Scale").
- Stream FC Early Parse: Yields a function call the moment arguments.done is emitted, letting you start executing local tools ~400ms before the overall response finishes.
- WASM First-Class: Compiles to wasm32-unknown-unknown without dropping features. Unlike other clients, streaming, retries, and early parsing all work in Cloudflare Workers and browsers.
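To make the anti-buffering discipline concrete, here is a rough sketch of the same two headers applied to a hand-rolled reqwest streaming call. This is illustrative only, not the crate's internals; the exact payload shape is an assumption.

use reqwest::header::{ACCEPT, CACHE_CONTROL};

// Illustrative sketch: the two headers that keep reverse proxies from buffering SSE.
let resp = reqwest::Client::new()
    .post("https://api.openai.com/v1/responses")
    .bearer_auth(std::env::var("OPENAI_API_KEY")?)
    .header(ACCEPT, "text/event-stream") // declare the response as an SSE stream
    .header(CACHE_CONTROL, "no-cache")   // ask intermediaries not to buffer it
    .json(&serde_json::json!({ "model": "gpt-5.4", "input": "hi", "stream": true }))
    .send()
    .await?;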
Quick Start
Add the crate to your Cargo.toml:
[dependencies]
openai-oxide = "0.9"
tokio = { version = "1", features = ["full"] }
Then make your first request:
use openai_oxide::{OpenAI, types::responses::*};
#[tokio::main]
async fn main() -> Result<(), openai_oxide::OpenAIError> {
let client = OpenAI::from_env()?; // Uses OPENAI_API_KEY
let response = client.responses().create(
ResponseCreateRequest::new("gpt-5.4")
.input("Explain quantum computing in one sentence.")
.max_output_tokens(100)
).await?;
println!("{}", response.output_text());
Ok(())
}
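If you want tokens as they arrive, the streaming variant looks roughly like this. This is a sketch: we assume the handle returned by create_stream exposes the same recv() loop as the create_stream_fc handle shown later in this README.

let mut stream = client.responses().create_stream(
    ResponseCreateRequest::new("gpt-5.4")
        .input("Stream a one-line haiku about Rust.")
).await?;
// Events are yielded as soon as the zero-copy SSE parser produces them.
while let Some(event) = stream.recv().await {
    println!("{event:?}");
}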
Benchmarks
All benchmarks were run under identical conditions to ensure a fair, real-world comparison of the clients:
- Environment: macOS (M-series), native compilation.
- Model: gpt-5.4 via the official OpenAI API.
- Protocol: TLS + HTTP/2 multiplexing with connection pooling (warm connections).
- Execution: 5 iterations per test; the reported value is the median time (see the sketch after the Rust table).
- Rust APIs: openai-oxide provides first-class support for both the traditional Chat Completions API (/v1/chat/completions) and the newer Responses API (/v1/responses). The Responses API has slightly higher backend orchestration latency on OpenAI's side for non-streamed requests, so we report the two separately for fairness.
Rust Ecosystem (openai-oxide vs async-openai vs genai)
| Test | openai-oxide (WebSockets) | openai-oxide (Responses API) | async-openai (Responses API) | genai (Responses API) | openai-oxide (Chat API) | genai (Chat API) |
|---|---|---|---|---|---|---|
| Plain text | 710ms (-29%) | 1000ms | 960ms | 835ms | 753ms | 722ms |
| Structured output | ~1000ms | 1352ms | N/A | 1197ms | 1304ms | N/A |
| Function calling | ~850ms | 1164ms | 1748ms | 1030ms | 1252ms | N/A |
| Streaming TTFT | ~400ms | 670ms | 685ms | 670ms | 695ms | N/A |
| Multi-turn (2 reqs) | 1425ms (-35%) | 2219ms | 3275ms | 1641ms | 2011ms | 1560ms |
| Rapid-fire (5 calls) | 3227ms (-37%) | 5147ms | 5166ms | 3807ms | 4671ms | 3540ms |
| Parallel 3x (fan-out) | N/A (sync) | 1081ms | 1053ms | 866ms | 978ms | 801ms |
Reproduce: cargo run --example benchmark --features responses --release
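Conceptually, the harness is nothing more than median-of-5 wall-clock timing. A minimal sketch of that measurement loop (not the benchmark's actual code):

use std::time::{Duration, Instant};

// Run an async request closure 5 times and report the median, as in the tables above.
async fn median_of_5<F, Fut>(mut run: F) -> Duration
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = ()>,
{
    let mut samples = Vec::with_capacity(5);
    for _ in 0..5 {
        let start = Instant::now();
        run().await;
        samples.push(start.elapsed());
    }
    samples.sort();
    samples[2] // the middle of 5 sorted samples is the median
}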
Python Ecosystem (openai-oxide-python vs openai)
openai-oxide ships native Python bindings via PyO3, exposing a drop-in async interface that outperforms the official Python SDK (openai + httpx) on most workloads.
Run uv run python examples/bench_python.py from the openai-oxide-python directory to test locally (Python 3.13).
| Test | openai-oxide-python | openai (httpx) | Winner |
|---|---|---|---|
| Plain text | 894ms | 990ms | OXIDE (+9%) |
| Structured output | 1354ms | 1391ms | OXIDE (+2%) |
| Function calling | 1089ms | 1125ms | OXIDE (+3%) |
| Multi-turn (2 reqs) | 2057ms | 2232ms | OXIDE (+7%) |
| Web search | 3276ms | 3039ms | python (+7%) |
| Nested structured output | 4811ms | 4186ms | python (+14%) |
| Agent loop (2-step) | 3408ms | 3984ms | OXIDE (+14%) |
| Rapid-fire (5 sequential calls) | 4835ms | 5075ms | OXIDE (+4%) |
| Prompt-cached | 4511ms | 4327ms | python (+4%) |
| Streaming TTFT | 709ms | 769ms | OXIDE (+7%) |
| Parallel 3x (fan-out) | 961ms | 994ms | OXIDE (+3%) |
| Hedged (2x race) | 1082ms | 1001ms | python (+8%) |
Python Usage
Install via uv or pip:
cd openai-oxide-python
uv sync
uv run maturin develop --release
import asyncio
from openai_oxide_python import Client

async def main():
    client = Client()

    # 1. Standard request
    res = await client.create("gpt-5.4", "Hello!")
    print(res["text"])

    # 2. Streaming (async iterator)
    stream = await client.create_stream("gpt-5.4", "Explain quantum computing...", max_output_tokens=200)
    async for event in stream:
        print(event)

asyncio.run(main())
Advanced Features Guide
WebSocket Mode
Persistent connections bypass the TLS handshake penalty for every request. Ideal for high-speed agent loops.
let client = OpenAI::from_env()?;
let mut session = client.ws_session().await?;
// All calls route through the same wss:// connection
let r1 = session.send(
ResponseCreateRequest::new("gpt-5.4").input("My name is Rustam.").store(true)
).await?;
let r2 = session.send(
ResponseCreateRequest::new("gpt-5.4").input("What's my name?").previous_response_id(&r1.id)
).await?;
session.close().await?;
Streaming FC Early Parse
Start executing your local functions instantly when the model finishes generating the arguments, rather than waiting for the entire stream to close.
let mut handle = client.responses().create_stream_fc(request).await?;
while let Some(fc) = handle.recv().await {
// Fires immediately on `arguments.done`
let result = execute_tool(&fc.name, &fc.arguments).await;
}
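The execute_tool call above is your own code. A hypothetical dispatcher (the tool names and argument schema are made up for illustration) might look like:

use serde_json::Value;

// Hypothetical local tool dispatcher; `arguments` arrives as a JSON string.
async fn execute_tool(name: &str, arguments: &str) -> String {
    let args: Value = serde_json::from_str(arguments).unwrap_or_default();
    match name {
        "get_weather" => format!("22°C in {}", args["city"]),
        other => format!("unknown tool: {other}"),
    }
}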
Hedged Requests
Protect your application against random network latency spikes.
use openai_oxide::hedged_request;
use std::time::Duration;
// Sends 2 identical requests with a 2s hedge delay. Returns whichever finishes first.
let response = hedged_request(&client, request, Some(Duration::from_secs(2))).await?;
Parallel Fan-Out
Leverage HTTP/2 multiplexing natively. Send 3 concurrent requests over a single connection; the total wall time is roughly that of the slowest single request.
let (c1, c2, c3) = (client.clone(), client.clone(), client.clone());
let (r1, r2, r3) = tokio::join!(
async { c1.responses().create(req1).await },
async { c2.responses().create(req2).await },
async { c3.responses().create(req3).await },
);
Implemented APIs
| API | Method |
|---|---|
| Chat Completions | client.chat().completions().create() / create_stream() |
| Responses | client.responses().create() / create_stream() / create_stream_fc() |
| Responses Tools | Function, WebSearch, FileSearch, CodeInterpreter, ComputerUse, Mcp, ImageGeneration |
| WebSocket | client.ws_session() — send / send_stream / warmup / close |
| Hedged | hedged_request() / hedged_request_n() / speculative() |
| Embeddings | client.embeddings().create() |
| Models | client.models().list() / retrieve() / delete() |
| Images | client.images().generate() / edit() / create_variation() |
| Audio | client.audio().transcriptions() / translations() / speech() |
| Files | client.files().create() / list() / retrieve() / delete() / content() |
| Fine-tuning | client.fine_tuning().jobs().create() / list() / cancel() / list_events() |
| Moderations | client.moderations().create() |
| Batches | client.batches().create() / list() / retrieve() / cancel() |
| Uploads | client.uploads().create() / cancel() / complete() |
| Pagination | list_page() / list_auto() — cursor-based, async stream (see the sketch below) |
| Assistants (beta) | Full CRUD + threads + runs + vector stores |
| Realtime (beta) | client.beta().realtime().sessions().create() |
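As a sketch of the pagination row above: assuming list_auto() returns an async Stream of items that transparently follows cursors, draining it might look like this (the files() usage and item shape are assumptions):

use futures_util::StreamExt;

// Assumed usage: list_auto() follows `after` cursors for you and yields one item at a time.
let mut files = client.files().list_auto();
while let Some(file) = files.next().await {
    let file = file?; // each item is a Result in this sketch
    println!("{}", file.id);
}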
Cargo Features & WASM Optimization
Every endpoint is gated behind a Cargo feature. If you are building for WebAssembly (WASM) (e.g., Cloudflare Workers, Dioxus, Leptos), you can significantly reduce your .wasm binary size and compilation time by disabling default features and only compiling what you need.
[dependencies]
# Example: Compile ONLY the Responses API (removes Audio, Images, Assistants, etc.)
openai-oxide = { version = "0.9", default-features = false, features = ["responses"] }
Available API Features:
- chat — Chat Completions
- responses — Responses API (supports WebSocket)
- embeddings — Text Embeddings
- images — Image Generation (DALL-E)
- audio — TTS and Transcription
- files — File management
- fine-tuning — Model Fine-tuning
- models — Model listing
- moderations — Moderation API
- batches — Batch API
- uploads — Upload API
- beta — Assistants, Threads, Vector Stores, Realtime API
Ecosystem Features:
- websocket — Enables Realtime API over WebSockets (native: tokio-tungstenite)
- websocket-wasm — Enables Realtime API over WebSockets (WASM: gloo-net/web-sys)
- simd — Enables simd-json for ultra-fast JSON deserialization (requires nightly Rust)
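For example, a lean Cloudflare Workers build that only needs the Responses API over WebSockets might combine the flags above like this (the exact feature combination is an assumption; check your build's needs):

[dependencies]
openai-oxide = { version = "0.9", default-features = false, features = ["responses", "websocket-wasm"] }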
Check out our Cloudflare Worker Examples showcasing a Full-Stack Rust app with a Dioxus frontend and a Cloudflare Worker Durable Object backend holding a WebSocket connection to OpenAI.
Configuration
use openai_oxide::{OpenAI, config::ClientConfig};
use openai_oxide::azure::AzureConfig;
let client = OpenAI::new("sk-..."); // Explicit key
let client = OpenAI::with_config( // Custom config
ClientConfig::new("sk-...").base_url("https://...").timeout_secs(30).max_retries(3)
);
let client = OpenAI::azure(AzureConfig::new() // Azure OpenAI
.azure_endpoint("https://my.openai.azure.com").azure_deployment("gpt-4").api_key("...")
)?;
Keeping up with OpenAI
OpenAI moves fast. To ensure openai-oxide never falls behind, we built an automated architecture synchronization pipeline.
Types are strictly validated against the official OpenAPI spec and cross-checked directly with the official Python SDK's AST.
make sync # downloads latest spec, diffs against local schema, runs coverage
make sync automatically:
- Downloads the latest OpenAPI schema from OpenAI.
- Displays a precise git diff of newly added endpoints, struct fields, and enums.
- Runs the openapi_coverage test suite to statically verify our Rust types against the spec.
Coverage is enforced on every commit via pre-commit hooks. Current field coverage for all implemented typed schemas is 100%. This guarantees 1:1 feature parity with the Python SDK, ensuring you can adopt new OpenAI models and features on day one.
Used In
- sgr-agent — LLM agent framework with structured output, function calling, and agent loops. openai-oxide is the default backend.
- rust-code — AI-powered TUI coding agent.
See Also
- openai-python — Official Python SDK (our benchmark baseline)
- async-openai — Alternative Rust client (mature, 1800+ stars)
- genai — Multi-provider Rust client (Gemini, Anthropic, OpenAI)
License