Async LLM Handler

An asynchronous handler for multiple LLM APIs.
Async LLM Handler is a Python package that provides a unified interface for interacting with multiple large language model (LLM) APIs asynchronously. It currently supports the Gemini, Claude, and OpenAI APIs.
Features
- Asynchronous API calls
- Support for multiple LLM providers:
  - Gemini (model: gemini_flash)
  - Claude (models: claude_3_5_sonnet, claude_3_haiku)
  - OpenAI (models: gpt_4o, gpt_4o_mini)
- Automatic rate limiting for each API
- Token counting and prompt clipping utilities
Installation
Install the Async LLM Handler using pip:
```
pip install async-llm-handler
```
Configuration
Before using the package, set up your environment variables in a .env file in your project's root directory:

```
GEMINI_API_KEY=your_gemini_api_key
CLAUDE_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
```
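If your application does not pick up the .env file automatically, you can load it yourself before creating the handler. A minimal sketch using the python-dotenv package (the handler may already do this internally):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load variables from .env into the process environment.
load_dotenv()

# Sanity-check that the keys are visible before making any API calls.
for key in ("GEMINI_API_KEY", "CLAUDE_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing environment variable: {key}")
```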
Usage
Basic Usage
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    # Using the default model
    response = await handler.query("What is the capital of France?")
    print(response)

    # Specifying a model
    response = await handler.query("Explain quantum computing", model="claude_3_5_sonnet")
    print(response)

asyncio.run(main())
```
Advanced Usage
Using Multiple Models Concurrently
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()
    prompt = "Explain the theory of relativity"

    tasks = [
        handler.query(prompt, model='gemini_flash'),
        handler.query(prompt, model='gpt_4o'),
        handler.query(prompt, model='claude_3_5_sonnet')
    ]
    responses = await asyncio.gather(*tasks)

    for model, response in zip(['Gemini Flash', 'GPT-4o', 'Claude 3.5 Sonnet'], responses):
        print(f"Response from {model}:")
        print(response)
        print()

asyncio.run(main())
```
Limiting Input and Output Tokens
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()
    long_prompt = "Provide a detailed explanation of the entire history of artificial intelligence, including all major milestones and breakthroughs."

    # Clip the prompt to 1000 tokens and cap the response at 500 tokens.
    response = await handler.query(long_prompt, model="gpt_4o", max_input_tokens=1000, max_output_tokens=500)
    print(response)

asyncio.run(main())
```
Supported Models
The package supports the following models:
- Gemini:
  - gemini_flash
- Claude:
  - claude_3_5_sonnet
  - claude_3_haiku
- OpenAI:
  - gpt_4o
  - gpt_4o_mini

You can specify these models using the model parameter in the query method.
Error Handling
The package uses custom exceptions for error handling. Wrap your API calls in try-except blocks to handle potential errors:
```python
import asyncio
from async_llm_handler import Handler
from async_llm_handler.exceptions import LLMAPIError, RateLimitTimeoutError

async def main():
    handler = Handler()
    try:
        response = await handler.query("What is the meaning of life?", model="gpt_4o")
        print(response)
    # Catch the more specific rate-limit error before the general API error,
    # in case RateLimitTimeoutError is a subclass of LLMAPIError.
    except RateLimitTimeoutError as e:
        print(f"Rate limit exceeded: {e}")
    except LLMAPIError as e:
        print(f"An API error occurred: {e}")

asyncio.run(main())
```
Rate Limiting
The package automatically handles rate limiting for each API. The current rate limits are:
- Gemini Flash: 30 requests per minute
- Claude 3.5 Sonnet: 5 requests per minute
- Claude 3 Haiku: 5 requests per minute
- GPT-4o: 5 requests per minute
- GPT-4o mini: 5 requests per minute
If you exceed these limits, the package will automatically wait before making the next request.
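For example, since Gemini Flash allows 30 requests per minute, a burst of concurrent calls larger than that is queued rather than failed. A minimal sketch of what this looks like from the caller's side (the pacing itself is internal to the package):

```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    # Fire 40 concurrent requests against a 30-requests-per-minute limit.
    # The handler throttles internally, so calls beyond the limit wait
    # for the rate window to refresh instead of raising immediately.
    tasks = [
        handler.query(f"Give me fun fact #{i}", model="gemini_flash")
        for i in range(40)
    ]
    responses = await asyncio.gather(*tasks)
    print(f"Received {len(responses)} responses")

asyncio.run(main())
```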
Utility Functions
The package includes utility functions for token counting and prompt clipping:
```python
from async_llm_handler.utils import count_tokens, clip_prompt

text = "This is a sample text for token counting."
token_count = count_tokens(text)
print(f"Token count: {token_count}")

long_text = "This is a very long text that needs to be clipped..." * 100
clipped_text = clip_prompt(long_text, max_tokens=50)
print(f"Clipped text: {clipped_text}")
```
These utilities use the cl100k_base encoding by default, which is a reasonable approximation for most modern language models, though each provider's tokenizer may count slightly differently.
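Counting against cl100k_base is typically done with the tiktoken library. The sketch below shows the general approach, assuming count_tokens and clip_prompt behave like thin wrappers around it; the package's actual implementation may differ:

```python
import tiktoken

# cl100k_base is the tiktoken encoding used by GPT-4 and GPT-3.5-turbo.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens_sketch(text: str) -> int:
    """Count tokens the way a cl100k_base-based utility would."""
    return len(enc.encode(text))

def clip_prompt_sketch(text: str, max_tokens: int) -> str:
    """Truncate text to at most max_tokens tokens, then decode back."""
    tokens = enc.encode(text)
    return enc.decode(tokens[:max_tokens])

print(count_tokens_sketch("This is a sample text for token counting."))
```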
Logging
The package uses Python's built-in logging module. You can configure logging in your application to see debug information, warnings, and errors from the Async LLM Handler:
```python
import logging

logging.basicConfig(level=logging.INFO)
```
This will display INFO level logs and above from the Async LLM Handler.
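To adjust verbosity for this package alone rather than your whole application, you can target its logger by name. This assumes the logger is named after the package module, which is the usual convention but is not confirmed here:

```python
import logging

# Keep everything else quiet by default.
logging.basicConfig(level=logging.WARNING)

# Assumed logger name: packages conventionally log under their module name.
logging.getLogger("async_llm_handler").setLevel(logging.DEBUG)
```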
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.