Async LLM Handler

An asynchronous handler for multiple LLM APIs.
Async LLM Handler is a Python package that provides a unified interface for interacting with multiple large language model (LLM) APIs asynchronously. It currently supports the Gemini, Claude, and OpenAI APIs.
Features
- Asynchronous API calls
- Support for multiple LLM providers:
  - Gemini (model: gemini_flash)
  - Claude (models: claude_3_5_sonnet, claude_3_haiku)
  - OpenAI (models: gpt_4o, gpt_4o_mini)
- Automatic rate limiting for each API
- Token counting and prompt clipping utilities
Installation
Install the Async LLM Handler using pip:
```
pip install async-llm-handler
```
Configuration
Before using the package, set up your environment variables in a .env file in your project's root directory:

```
GEMINI_API_KEY=your_gemini_api_key
CLAUDE_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
```
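If your application does not pick up the .env file automatically, you can load it yourself before creating the handler. A minimal sketch using the python-dotenv package (the handler may already do this internally):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load variables from .env into the process environment.
load_dotenv()

# Sanity-check that the keys are visible before making any API calls.
for key in ("GEMINI_API_KEY", "CLAUDE_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing environment variable: {key}")
```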
Usage
Basic Usage
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    # Using the default model
    response = await handler.query("What is the capital of France?")
    print(response)

    # Specifying a model
    response = await handler.query("Explain quantum computing", model="claude_3_5_sonnet")
    print(response)

asyncio.run(main())
```
Advanced Usage
Using Multiple Models Concurrently
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()
    prompt = "Explain the theory of relativity"

    tasks = [
        handler.query(prompt, model='gemini_flash'),
        handler.query(prompt, model='gpt_4o'),
        handler.query(prompt, model='claude_3_5_sonnet')
    ]
    responses = await asyncio.gather(*tasks)

    for model, response in zip(['Gemini Flash', 'GPT-4o', 'Claude 3.5 Sonnet'], responses):
        print(f"Response from {model}:")
        print(response)
        print()

asyncio.run(main())
```
Limiting Input and Output Tokens
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()
    long_prompt = "Provide a detailed explanation of the entire history of artificial intelligence, including all major milestones and breakthroughs."

    # Clip the prompt to 1000 tokens and cap the response at 500 tokens.
    response = await handler.query(long_prompt, model="gpt_4o", max_input_tokens=1000, max_output_tokens=500)
    print(response)

asyncio.run(main())
```
Supported Models
The package supports the following models:
- Gemini:
  - gemini_flash
- Claude:
  - claude_3_5_sonnet
  - claude_3_haiku
- OpenAI:
  - gpt_4o
  - gpt_4o_mini

You can specify these models using the model parameter in the query method.
Error Handling
The package uses custom exceptions for error handling. Wrap your API calls in try-except blocks to handle potential errors:
```python
import asyncio
from async_llm_handler import Handler
from async_llm_handler.exceptions import LLMAPIError, RateLimitTimeoutError

async def main():
    handler = Handler()
    try:
        response = await handler.query("What is the meaning of life?", model="gpt_4o")
        print(response)
    # Catch the more specific rate-limit error before the general API error,
    # in case RateLimitTimeoutError is a subclass of LLMAPIError.
    except RateLimitTimeoutError as e:
        print(f"Rate limit exceeded: {e}")
    except LLMAPIError as e:
        print(f"An API error occurred: {e}")

asyncio.run(main())
```
Rate Limiting
The package automatically handles rate limiting for each API. The current rate limits are:
- Gemini Flash: 30 requests per minute
- Claude 3.5 Sonnet: 5 requests per minute
- Claude 3 Haiku: 5 requests per minute
- GPT-4o: 5 requests per minute
- GPT-4o mini: 5 requests per minute
If you exceed these limits, the package will automatically wait before making the next request.
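For example, since Gemini Flash allows 30 requests per minute, a burst of concurrent calls larger than that is queued rather than failed. A minimal sketch of what this looks like from the caller's side (the pacing itself is internal to the package):

```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    # Fire 40 concurrent requests against a 30-requests-per-minute limit.
    # The handler throttles internally, so calls beyond the limit wait
    # for the rate window to refresh instead of raising immediately.
    tasks = [
        handler.query(f"Give me fun fact #{i}", model="gemini_flash")
        for i in range(40)
    ]
    responses = await asyncio.gather(*tasks)
    print(f"Received {len(responses)} responses")

asyncio.run(main())
```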
Utility Functions
The package includes utility functions for token counting and prompt clipping:
```python
from async_llm_handler.utils import count_tokens, clip_prompt

text = "This is a sample text for token counting."
token_count = count_tokens(text)
print(f"Token count: {token_count}")

long_text = "This is a very long text that needs to be clipped..." * 100
clipped_text = clip_prompt(long_text, max_tokens=50)
print(f"Clipped text: {clipped_text}")
```
These utilities use the cl100k_base encoding by default, which is a reasonable approximation for most modern language models, though each provider's tokenizer may count slightly differently.
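Counting against cl100k_base is typically done with the tiktoken library. The sketch below shows the general approach, assuming count_tokens and clip_prompt behave like thin wrappers around it; the package's actual implementation may differ:

```python
import tiktoken

# cl100k_base is the tiktoken encoding used by GPT-4 and GPT-3.5-turbo.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens_sketch(text: str) -> int:
    """Count tokens the way a cl100k_base-based utility would."""
    return len(enc.encode(text))

def clip_prompt_sketch(text: str, max_tokens: int) -> str:
    """Truncate text to at most max_tokens tokens, then decode back."""
    tokens = enc.encode(text)
    return enc.decode(tokens[:max_tokens])

print(count_tokens_sketch("This is a sample text for token counting."))
```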
Logging
The package uses Python's built-in logging module. You can configure logging in your application to see debug information, warnings, and errors from the Async LLM Handler:
```python
import logging

logging.basicConfig(level=logging.INFO)
```
This will display INFO level logs and above from the Async LLM Handler.
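To adjust verbosity for this package alone rather than your whole application, you can target its logger by name. This assumes the logger is named after the package module, which is the usual convention but is not confirmed here:

```python
import logging

# Keep everything else quiet by default.
logging.basicConfig(level=logging.WARNING)

# Assumed logger name: packages conventionally log under their module name.
logging.getLogger("async_llm_handler").setLevel(logging.DEBUG)
```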
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.