Rotates models to avoid hitting rate limits.
Model Rotator
Model Rotator is a Python library for managing multiple LLM (Large Language Model) instances with rate limits and priorities. It dynamically schedules requests to models based on their rate limits, usage, and priority levels, ensuring optimal utilization of available resources.
Features
- Rate-Limit Management: Automatically tracks and enforces rate limits per model.
- Priority-Based Scheduling: Prioritizes high-priority models over medium and low-priority ones.
- Dynamic Updates: Tracks model usage in real-time and prunes stale usage data.
- Stateful: Maintains state for each model's usage across calls.
- Customizable: Easily configure models with different rate limits and priorities.
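The rate-limit tracking and pruning of stale usage data described above can be pictured as a sliding-window counter. The sketch below is only an illustration of that idea, not the library's actual implementation: the class name `SlidingWindowCounter`, its parameters, and the 60-second window are assumptions made for this example.

```python
import time
from collections import deque


class SlidingWindowCounter:
    """Hypothetical helper: counts requests inside a rolling time window."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the window can be tested
        self._timestamps = deque()   # request times, oldest first

    def _prune(self):
        # Drop timestamps that have aged out of the window.
        cutoff = self._clock() - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    @property
    def current_usage(self):
        self._prune()
        return len(self._timestamps)

    def try_acquire(self):
        """Record one request if under the limit; return True on success."""
        self._prune()
        if len(self._timestamps) >= self.limit:
            return False
        self._timestamps.append(self._clock())
        return True
```

Keeping one such counter per model would give the per-model usage figure without any background cleanup, since stale entries are discarded lazily on each call.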
Installation
Install the package from PyPI:
pip install model-rotator
Usage
- Define Your Models: Provide a list of model configurations:
Note: When several models share the same priority level, those listed first are preferred.
from model_rotator import ModelRotator
models = [
    {"name": "groq/llama-3.1-70b-versatile", "priority": "high", "limit": 30},
    {"name": "groq/llama-3.1-70b-specdec", "priority": "high", "limit": 30},
    {"name": "groq/llama-3.1-8b-instant", "priority": "medium", "limit": 30},
    {"name": "groq/llama-3.2-1b-preview", "priority": "low", "limit": 30},
    {"name": "gemini/gemini-1.5-flash", "priority": "medium", "limit": 30},
    {"name": "gemini/gemini-1.5-pro", "priority": "high", "limit": 15},
    {"name": "gemini/gemini-exp-1114", "priority": "high", "limit": 2},
]
- Initialize the Scheduler
rotator = ModelRotator(models)
- Schedule Requests: Use get_next_model() to get the next available model for processing:
for i in range(1, 51):  # Simulate 50 requests
    model = rotator.get_next_model()
    if model:
        print(f"Request {i}: Using model: {model}")
    else:
        print(f"Request {i}: All models exhausted, retry later.")
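When every model is exhausted, a caller will usually want to wait and try again rather than give up. The helper below is a minimal sketch of that pattern. `wait_for_model` and its parameters are hypothetical names, not part of the library; it relies only on the rotator exposing `get_next_model()`, which returns `None` when nothing is available.

```python
import time


def wait_for_model(rotator, max_wait=60.0, poll_interval=1.0, sleep=time.sleep):
    """Poll rotator.get_next_model() until a model is free or max_wait elapses.

    Returns the model name, or None if no model became available in time.
    """
    waited = 0.0
    while True:
        model = rotator.get_next_model()
        if model:
            return model
        if waited >= max_wait:
            return None
        sleep(poll_interval)
        waited += poll_interval
```

A call such as `wait_for_model(rotator, max_wait=90)` would then block for up to 90 seconds. Since limits are expressed per minute, a wait of about a minute should normally be enough for some capacity to free up.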
- Check Model States: Inspect the current state of all models:
print(rotator.get_state())
Example Output
Request 1: Using model: groq/llama-3.1-70b-versatile
Request 2: Using model: groq/llama-3.1-70b-specdec
...
Request 50: All models exhausted, retry later.
Model States:
[
{"name": "groq/llama-3.1-70b-versatile", "priority": "high", "limit": 30, "current_usage": 30},
{"name": "groq/llama-3.1-70b-specdec", "priority": "high", "limit": 30, "current_usage": 30},
...
]
API
ModelRotator(models:Model)
Initializes the scheduler.
- models: A list of dictionaries. Each dictionary must include:
  - name (str): The model name.
  - priority (str): Priority level (high, medium, low).
  - limit (int): Maximum allowed requests per minute.
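Because each configuration is a plain dictionary, it can be worth checking the three required fields before handing the list to the rotator. The sketch below shows one way to do that. `validate_model_config` and `VALID_PRIORITIES` are hypothetical names for this example, and whether ModelRotator performs similar checks itself is not documented here.

```python
VALID_PRIORITIES = ("high", "medium", "low")


def validate_model_config(config):
    """Raise ValueError if a model dictionary is missing or misusing a field."""
    for key in ("name", "priority", "limit"):
        if key not in config:
            raise ValueError(f"missing required field: {key!r}")
    if not isinstance(config["name"], str) or not config["name"]:
        raise ValueError("name must be a non-empty string")
    if config["priority"] not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}")
    limit = config["limit"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return config
```

Running `[validate_model_config(m) for m in models]` before constructing the rotator surfaces typos such as "hihg" early.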
get_next_model()
Returns the name of the next available model based on priority and rate limits.
- Returns: str: The model name, or None if no models are available.
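To make "based on priority and rate limits" concrete, here is a minimal sketch of one possible selection rule: prefer the highest priority level, fall back to list order within a level, and skip any model that has reached its limit. `pick_model` and `PRIORITY_RANK` are hypothetical names, and the library may break ties differently, for example by rotating among models of equal priority.

```python
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def pick_model(models, usage):
    """Return the name of the best model with spare capacity, or None.

    `models` is a list of config dicts; `usage` maps a model name to the
    number of requests it has already served in the current window.
    """
    # sorted() is stable, so models sharing a priority keep their list order.
    for model in sorted(models, key=lambda m: PRIORITY_RANK[m["priority"]]):
        if usage.get(model["name"], 0) < model["limit"]:
            return model["name"]
    return None
```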
get_state()
Returns the current state of all models, including their usage.
- Returns: list: A list of dictionaries with the following fields:
  - name (str): Model name.
  - priority (str): Priority level.
  - limit (int): Rate limit.
  - current_usage (int): Current number of requests within the last minute.
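The state list is convenient for monitoring. As a small illustration, the helper below derives the remaining capacity per model from the documented `limit` and `current_usage` fields. `remaining_capacity` is a hypothetical function written for this example, not part of the library.

```python
def remaining_capacity(state):
    """Map each model name to the requests it can still accept this minute."""
    return {
        entry["name"]: max(entry["limit"] - entry["current_usage"], 0)
        for entry in state
    }
```

It would be called as `remaining_capacity(rotator.get_state())`.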
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests.