OpenAI-compatible client + worker that bridge inference requests over a Redis queue (Redis Streams), for running LLMs across heavily restricted networks.

openai-rq

📊 Overview deck: https://allen2c.github.io/openai-rq/

Use the OpenAI SDK from behind a locked-down network where the only reachable outbound endpoint is Redis. openai-rq ships each OpenAI HTTP request over Redis Streams to a worker that replays it against a local OpenAI-compatible server (e.g. vLLM) and streams the response back — your client code stays identical to normal OpenAI usage.

  your client ──(Redis Streams)──▶  openai-rq worker ──▶  http://localhost:8000/v1
   OpenAIRQ    ◀─(Redis Streams)──                         (vLLM / OpenAI-compatible)

Both sides connect only to Redis. No direct HTTP between client and the inference box.
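
To make the idea of shipping an HTTP request through a stream concrete, here is a minimal sketch of how one request could be flattened into string fields and restored on the other side. The field names (`id`, `method`, `path`, `headers`, `body`) and both helper functions are hypothetical illustrations, not openai-rq's actual wire format.

```python
import base64
import json
import uuid


def encode_request(method: str, path: str, headers: dict[str, str], body: bytes) -> dict[str, str]:
    """Flatten one HTTP request into string fields suitable for a stream entry."""
    return {
        "id": uuid.uuid4().hex,  # correlation id so a reply can be matched later
        "method": method.upper(),
        "path": path,
        "headers": json.dumps(headers),
        "body": base64.b64encode(body).decode("ascii"),  # keep raw bytes text-safe
    }


def decode_request(fields: dict[str, str]) -> tuple[str, str, dict[str, str], bytes]:
    """Inverse of encode_request, as a worker might apply before replaying."""
    return (
        fields["method"],
        fields["path"],
        json.loads(fields["headers"]),
        base64.b64decode(fields["body"]),
    )


payload = json.dumps({"model": "openai/gpt-oss-120b", "messages": []}).encode()
entry = encode_request("post", "/chat/completions", {"content-type": "application/json"}, payload)
print(decode_request(entry)[:2])
```

Every value is a plain string, so a dict like `entry` could be passed as the fields of a stream entry; the correlation id is what would let a client pick its own response out of the return stream.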

Install

pip install openai-rq

Client — a drop-in openai.OpenAI

Swap openai.OpenAI for openai_rq.OpenAIRQ and point it at Redis. Everything else — parameters, response objects, streaming, error handling — works unchanged.

from openai_rq import OpenAIRQ

client = OpenAIRQ(redis_url="redis://localhost:6379/0")

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
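
If you want the full text rather than incremental printing, a small helper can accumulate the deltas. The sketch below is an illustration, not part of openai-rq: `collect_text` and `_fake` are hypothetical names, and the fake chunks only mimic the attribute shape of streamed chunks so the example runs without a server. It also skips chunks whose `choices` list is empty, which the OpenAI API can emit (for example a usage-only final chunk).

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_text(stream: Iterable[Any]) -> str:
    """Join the text deltas of a chat-completion stream into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a usage-only chunk carries no delta
            continue
        delta = chunk.choices[0].delta
        if delta.content:  # content may be None on role or tool-call deltas
            parts.append(delta.content)
    return "".join(parts)


def _fake(content):
    """Build a stand-in object with the same attribute path as a real chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


demo = [_fake("Hel"), _fake(None), SimpleNamespace(choices=[]), _fake("lo")]
print(collect_text(demo))
```

With a real client, the `stream` object returned by `create(..., stream=True)` would be passed in place of `demo`.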

Async

import asyncio

from openai_rq import AsyncOpenAIRQ


async def main() -> None:
    client = AsyncOpenAIRQ(redis_url="redis://localhost:6379/0")
    resp = await client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)


asyncio.run(main())

extra_headers and extra_body pass through verbatim, so server-specific options (guided decoding, etc.) just work:

client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[...],
    extra_body={"guided_json": schema},
)

Worker — run it next to the inference server

On the inference box, run a worker that relays jobs to your local server:

openai-rq worker \
  --redis-url redis://localhost:6379/0 \
  --openai-base-url http://localhost:8000/v1 \
  --concurrency 16

Run as many workers as you like against the same Redis — jobs are load-balanced across them via a Redis consumer group.
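
For readers unfamiliar with consumer groups, the following sketch shows the general pattern that makes this kind of load balancing possible: each entry in the stream is delivered to exactly one consumer in the group, and is acknowledged once handled. It is an assumption-laden illustration, not openai-rq's implementation. The stream and group names are hypothetical, and `r` is expected to be a redis-py client (for example `redis.Redis.from_url(...)`), whose `xgroup_create`, `xreadgroup`, and `xack` methods are used here.

```python
from typing import Any, Callable

STREAM = "jobs"      # hypothetical stream name
GROUP = "workers"    # hypothetical consumer group name


def ensure_group(r: Any) -> None:
    """Create the consumer group (and the stream) if it does not exist yet."""
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except Exception as exc:  # redis-py reports an existing group as BUSYGROUP
        if "BUSYGROUP" not in str(exc):
            raise


def poll_once(r: Any, consumer: str, handle: Callable[[dict], None], count: int = 16) -> int:
    """Read up to `count` new entries for this consumer, then handle and ack each."""
    # ">" asks for entries never delivered to any consumer in the group.
    batches = r.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=count, block=1000)
    done = 0
    for _stream, entries in batches or []:
        for entry_id, fields in entries:
            handle(fields)
            r.xack(STREAM, GROUP, entry_id)  # ack only after handling succeeded
            done += 1
    return done
```

Two processes calling `poll_once` with different `consumer` names against the same group would each receive a disjoint share of the entries.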

Backend needs an API key?

The credential lives only on the worker — it never transits Redis or the client.

# Bearer style → Authorization: Bearer <key>
export OPENAI_API_KEY=<key>
openai-rq worker --redis-url redis://localhost:6379/0 --openai-base-url http://localhost:8000/v1

# Server that expects a custom auth header instead of Bearer (repeatable)
openai-rq worker --redis-url redis://localhost:6379/0 \
  --openai-base-url http://localhost:8000/v1 \
  --openai-header api-key=<key>
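
As a rough picture of what these two options amount to, the sketch below assembles backend headers from an API key and a list of `KEY=VALUE` strings. `backend_headers` is a hypothetical helper written for illustration; the precedence between the key and explicit headers is an assumption, and openai-rq's own parsing may differ.

```python
import os
from typing import Optional


def backend_headers(header_args: list[str], api_key: Optional[str] = None) -> dict[str, str]:
    """Build the headers a worker could attach when calling the backend."""
    headers: dict[str, str] = {}
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if key:
        headers["Authorization"] = f"Bearer {key}"
    for item in header_args:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        headers[name.strip()] = value  # assumption: explicit headers win on conflict
    return headers


print(backend_headers(["api-key=secret"], api_key=None))
```

Because the headers are built only inside the worker process, nothing in this step needs to be written to Redis.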

Embedding the worker

import asyncio

from openai_rq.worker import Worker


async def main() -> None:
    worker = Worker(
        redis_url="redis://localhost:6379/0",
        openai_base_url="http://localhost:8000/v1",
        openai_api_key="<key>",         # optional; → Authorization: Bearer
        concurrency=16,
    )
    await worker.run()


asyncio.run(main())

Worker options

Option               Default                     Description
--redis-url          (required)                  Redis URL; use rediss:// for TLS
--openai-base-url    http://localhost:8000/v1    local OpenAI-compatible server
--openai-api-key     env OPENAI_API_KEY          injected as Authorization: Bearer
--openai-header                                  extra backend header KEY=VALUE (repeatable)
--concurrency        16                          in-flight jobs per worker
--stream-flush-ms    50                          streaming coalesce window
--result-ttl-s       600                         TTL on result/stream keys
--max-retries        3                           queue retries before dead-letter
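
To illustrate what a streaming coalesce window can mean in practice, here is a minimal sketch that merges chunks arriving within a time window into a single write, trading a little latency for fewer round trips. `coalesce` is a hypothetical function, not the worker's actual code, and it is simplified: it only flushes when a new chunk arrives or the input ends, whereas a real implementation would likely also flush on a timer.

```python
import time
from typing import Callable, Iterable, Iterator


def coalesce(chunks: Iterable[str], window_ms: float = 50,
             clock: Callable[[], float] = time.monotonic) -> Iterator[str]:
    """Merge chunks that arrive within `window_ms` of the first buffered chunk."""
    buf: list[str] = []
    deadline = 0.0
    for chunk in chunks:
        if not buf:
            deadline = clock() + window_ms / 1000.0  # window opens at first chunk
        buf.append(chunk)
        if clock() >= deadline:
            yield "".join(buf)
            buf = []
    if buf:
        yield "".join(buf)  # flush whatever is left when the stream ends


# With a zero-length window every chunk is flushed on its own.
print(list(coalesce(["a", "b", "c"], window_ms=0)))
```

The `clock` parameter exists only so the behaviour can be exercised deterministically with a fake clock instead of real sleeps.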

License

Apache-2.0
