A Python module that provides rate limiting for the OpenAI API, using Redis as the caching backend. It helps manage API usage so that OpenAI's rate limits are not exceeded.
openai-ratelimiter
openai-ratelimiter is a simple and efficient rate limiter for the OpenAI API. It is designed to help prevent the API rate limit from being reached when using the OpenAI library. Currently, it supports only Redis as the caching service.
Installation
To install the openai-ratelimiter library, use pip:
pip install openai-ratelimiter
Redis Setup
This library uses Redis for caching. If you don't have a Redis server set up, you can pull the Redis Docker image and run a container as follows:
# Pull the Redis image
docker pull redis
# Run the Redis container
docker run --name some-redis -p 6379:6379 -d redis
This will set up a Redis server accessible at localhost on port 6379.
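If you want to confirm the server is reachable before configuring the limiter, a quick ping works. The snippet below is an optional sketch using the redis-py client, which is not otherwise required by the examples that follow.
import redis  # pip install redis

# Optional sanity check: ping the server the limiter will use.
r = redis.Redis(host="localhost", port=6379)
print(r.ping())  # prints True if Redis is reachable at localhost:6379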
Usage
The library provides two classes, ChatCompletionLimiter and TextCompletionLimiter, for rate-limiting API calls. Both take RPM (requests per minute) and TPM (tokens per minute) limits and track usage in Redis.
ChatCompletionLimiter
from openai_ratelimiter import ChatCompletionLimiter
import openai

openai.api_key = "{your API key}"
model_name = "gpt-3.5-turbo-16k"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Morocco?"},
]
max_tokens = 200

# Limiter backed by the local Redis instance, with per-minute request and token budgets.
chatlimiter = ChatCompletionLimiter(
    model_name=model_name,
    RPM=3_000,
    TPM=250_000,
    redis_host="localhost",
    redis_port=6379,
)

# The context manager accounts for this request against the RPM/TPM budget
# before the API call is made.
with chatlimiter.limit(messages=messages, max_tokens=max_tokens):
    response = openai.ChatCompletion.create(
        model=model_name, messages=messages, max_tokens=max_tokens
    )
    ...
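Since limit() only needs the messages and max_tokens for a given request, the same limiter instance can sit in front of every chat call. The helper below is an illustrative sketch (the ask function and the question list are not part of the library) that reuses the chatlimiter, model_name, and max_tokens defined above.
# Hypothetical helper: every chat request goes through the same limiter.
def ask(question: str) -> str:
    msgs = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ]
    with chatlimiter.limit(messages=msgs, max_tokens=max_tokens):
        resp = openai.ChatCompletion.create(
            model=model_name, messages=msgs, max_tokens=max_tokens
        )
    return resp["choices"][0]["message"]["content"]

for q in ["What is the capital of Morocco?", "What is the capital of Spain?"]:
    print(ask(q))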
TextCompletionLimiter
from openai_ratelimiter import TextCompletionLimiter
import openai

openai.api_key = "{your API key}"
model_name = "text-davinci-003"
prompt = "What is the capital of Morocco?"
max_tokens = 200

textlimiter = TextCompletionLimiter(
    model_name=model_name,
    RPM=3_000,
    TPM=250_000,
    redis_host="localhost",
    redis_port=6379,
)

# Same pattern as above: reserve capacity for the prompt, then call the API.
with textlimiter.limit(prompt=prompt, max_tokens=max_tokens):
    response = openai.Completion.create(
        model=model_name, prompt=prompt, max_tokens=max_tokens
    )
    ...
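The pattern extends naturally to a batch of prompts. The loop below is a sketch that reuses the textlimiter, model_name, and max_tokens from the example above; the prompt list is illustrative only.
# Hypothetical batch: each prompt is accounted for before its API call.
prompts = ["What is the capital of Morocco?", "What is the capital of Spain?"]
for p in prompts:
    with textlimiter.limit(prompt=p, max_tokens=max_tokens):
        completion = openai.Completion.create(
            model=model_name, prompt=p, max_tokens=max_tokens
        )
    print(completion["choices"][0]["text"].strip())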
Future Plans
- In-memory caching
- Limiting for embeddings
- Limiting for DALL·E image model
- More functions that report the current state of the limiter