Scalable multi-provider LLM library with a unified interface for OpenAI, Gemini, Anthropic, DeepSeek, Moonshot, and xAI
LLM Library - Scalable Multi-Provider LLM Library
A production-ready, scalable multi-provider Large Language Model (LLM) library designed for definable.ai. The library provides a unified interface for multiple LLM providers, including OpenAI, Gemini, and Anthropic, with support for chat completions, image generation, file processing, streaming, and session management.
✨ Features
Core Capabilities
- Multi-Provider Support: OpenAI, Gemini, Anthropic (extensible architecture)
- Unified Interface: Consistent API across all providers
- Session Management: Persistent conversation sessions with context
- File Processing: Support for PDF, DOCX, PPTX, XLSX, images, and text files
- Streaming Responses: Real-time streaming for chat completions
- Rate Limiting: Built-in token bucket rate limiting
- Retry Logic: Exponential backoff with circuit breaker patterns
- FastAPI Integration: Production-ready REST API
Advanced Features
- Provider Switching: Change providers mid-conversation
- Image Processing: OCR, analysis, and multimodal support
- Chunking: Smart text chunking for large documents
- Error Handling: Comprehensive exception hierarchy
- Configuration: Environment-based configuration management
- Monitoring: Structured logging and health checks
📦 Installation
```bash
# Clone the repository
git clone <repository-url>
cd llms_lib

# Install dependencies using uv
uv sync

# Or with pip
pip install -e .
```
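To confirm the install, you can import the top-level package (the `definable.llms` name matches the imports used throughout this README):

```bash
python -c "import definable.llms; print('definable-llms imported OK')"
```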
⚙️ Configuration
Create a .env file in your project root:
```env
# API Keys
OPENAI_API_KEY=your_openai_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Provider Settings
DEFAULT_PROVIDER=openai
OPENAI_DEFAULT_MODEL=gpt-4-turbo-preview
OPENAI_TEMPERATURE=0.7

# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS_PER_MINUTE=60
RATE_LIMIT_TOKENS_PER_MINUTE=90000

# Session Management
SESSION_STORE_TYPE=memory  # or redis
SESSION_TTL_SECONDS=3600
REDIS_URL=redis://localhost:6379/0

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
CORS_ENABLED=true
```
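The library reads these values from the environment. If you need to load the `.env` file yourself (for example in a script or a test harness), `python-dotenv` is a common choice; this is a generic sketch, not part of the library's API:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

# Verify a key is present before constructing providers
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```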
🚀 Quick Start
1. Basic Chat Completion
```python
import asyncio

from definable.llms import provider_factory
from definable.llms.base.types import ChatRequest, Message, MessageRole


async def basic_chat():
    # Get an OpenAI provider
    provider = provider_factory.get_provider("openai")

    # Create a chat request
    messages = [
        Message(role=MessageRole.USER, content="Hello, how are you?")
    ]
    request = ChatRequest(messages=messages, model="gpt-4-turbo-preview")

    response = await provider.chat(request)
    print(response.choices[0].message.content)


# Run the example
asyncio.run(basic_chat())
```
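Provider calls can fail (rate limits, network errors, invalid models). The library advertises a comprehensive exception hierarchy, but the specific exception classes aren't documented here, so this sketch catches a broad `Exception` as a placeholder — substitute the library's concrete types once you know them:

```python
import asyncio

from definable.llms import provider_factory
from definable.llms.base.types import ChatRequest, Message, MessageRole


async def chat_or_none():
    provider = provider_factory.get_provider("openai")
    request = ChatRequest(
        messages=[Message(role=MessageRole.USER, content="Hello!")],
        model="gpt-4-turbo-preview",
    )
    try:
        return await provider.chat(request)
    except Exception as exc:  # placeholder: use the library's exception types
        print(f"Chat failed: {exc}")
        return None


asyncio.run(chat_or_none())
```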
2. Session-Based Conversation
```python
import asyncio

from definable.llms import session_manager


async def session_chat():
    # Create a new session
    session = await session_manager.create_session(
        provider="openai",
        model="gpt-4-turbo-preview"
    )

    # Send messages in the session
    response1 = await session_manager.chat(
        session_id=session.session_id,
        message="My name is Alice. Please remember this."
    )
    print("Assistant:", response1.choices[0].message.content)

    response2 = await session_manager.chat(
        session_id=session.session_id,
        message="What's my name?"
    )
    print("Assistant:", response2.choices[0].message.content)


asyncio.run(session_chat())
```
3. File Processing
```python
import asyncio

from definable.llms import file_processor


async def process_document():
    # Process a PDF file
    processed_file = await file_processor.process_file(
        filename="document.pdf",
        file_path="/path/to/document.pdf"
    )

    print(f"Extracted text length: {len(processed_file.processed_text)}")
    print(f"Number of chunks: {len(processed_file.chunks)}")
    print(f"Metadata: {processed_file.metadata}")


asyncio.run(process_document())
```
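A common follow-up is to feed the extracted chunks into a chat request, e.g. for summarization. The sketch below assumes each chunk stringifies to its text via `str()` — the actual chunk type isn't documented here, so adjust accordingly:

```python
import asyncio

from definable.llms import file_processor, session_manager


async def summarize_document():
    processed_file = await file_processor.process_file(
        filename="document.pdf",
        file_path="/path/to/document.pdf"
    )

    session = await session_manager.create_session(
        provider="openai",
        model="gpt-4-turbo-preview"
    )

    # Summarize the first chunk; assumes chunks stringify to their text
    first_chunk = str(processed_file.chunks[0])
    response = await session_manager.chat(
        session_id=session.session_id,
        message=f"Summarize this passage:\n\n{first_chunk}"
    )
    print(response.choices[0].message.content)


asyncio.run(summarize_document())
```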
4. Streaming Responses
```python
import asyncio

from definable.llms import session_manager


async def streaming_chat():
    session = await session_manager.create_session(
        provider="openai",
        model="gpt-4-turbo-preview"
    )

    response_stream = await session_manager.chat(
        session_id=session.session_id,
        message="Tell me a story about AI",
        stream=True
    )

    async for chunk in response_stream:
        if chunk.choices and chunk.choices[0].get("delta", {}).get("content"):
            print(chunk.choices[0]["delta"]["content"], end="")


asyncio.run(streaming_chat())
```
🌐 FastAPI Server
Running the Server
```python
from definable.llms.api import run_server

# Run with default settings
run_server()

# Or with custom settings
run_server(host="0.0.0.0", port=8080, reload=True)
```
API Endpoints
The FastAPI server provides the following endpoints:
- Health: `GET /api/v1/health` - System health check
- Providers: `GET /api/v1/providers` - List available providers
- Sessions: `POST /api/v1/sessions` - Create a conversation session
- Chat: `POST /api/v1/chat` - Send chat messages
- Files: `POST /api/v1/files/process` - Process uploaded files
Example API Usage
```bash
# Create a session
curl -X POST "http://localhost:8000/api/v1/sessions" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4-turbo-preview"
  }'

# Send a chat message
curl -X POST "http://localhost:8000/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello, world!",
    "session_id": "your-session-id"
  }'

# Process a file
curl -X POST "http://localhost:8000/api/v1/files/process" \
  -F "file=@document.pdf"
```
🔌 Adding New Providers
The library is designed for easy extension. Here's how to add a new provider:
```python
from definable.llms.base import BaseProvider, ProviderCapabilities
from definable.llms.base.types import ChatRequest, ChatResponse


class CustomProvider(BaseProvider):
    def _initialize(self, **kwargs):
        # Initialize your provider (clients, credentials, etc.)
        pass

    def get_capabilities(self) -> ProviderCapabilities:
        return ProviderCapabilities(
            chat=True,
            streaming=False,
            # ... other capabilities
        )

    async def chat(self, request: ChatRequest) -> ChatResponse:
        # Implement chat functionality
        ...

    async def validate_model(self, model: str) -> bool:
        # Validate model support
        ...


# Register the provider
from definable.llms import provider_factory

provider_factory.register_provider("custom", CustomProvider)
```
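Once registered, the provider resolves through the same factory call used for the built-ins:

```python
provider = provider_factory.get_provider("custom")
```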
🏗️ Architecture
The library follows a modular, plugin-based architecture:
```text
src/libs/llms/
├── base/          # Base classes and types
├── providers/     # Provider implementations
├── sessions/      # Session management
├── processors/    # File processing
├── utils/         # Utilities (rate limiting, retry, etc.)
├── api/           # FastAPI integration
└── config.py      # Configuration management
```
Key Components
- BaseProvider: Abstract base class for all providers
- SessionManager: Manages conversation sessions
- FileProcessor: Handles document processing
- RateLimiter: Token bucket rate limiting
- RetryStrategy: Exponential backoff retry logic
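For intuition, here is a generic sketch of the two throttling techniques named above — a token bucket and exponential backoff with jitter. This illustrates the patterns only; it is not the library's actual `RateLimiter`/`RetryStrategy` code:

```python
import asyncio
import random
import time


class TokenBucket:
    """Generic token bucket: allows `rate` ops/sec with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.updated = capacity, time.monotonic()

    async def acquire(self, cost: float = 1.0):
        while True:
            now = time.monotonic()
            # Refill tokens in proportion to elapsed time, capped at capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= cost:
                self.tokens -= cost
                return
            # Sleep just long enough for the deficit to refill
            await asyncio.sleep((cost - self.tokens) / self.rate)


async def retry_with_backoff(fn, attempts: int = 5, base: float = 0.5):
    """Retry async `fn` with exponentially growing delays plus random jitter."""
    for attempt in range(attempts):
        try:
            return await fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            await asyncio.sleep(base * 2 ** attempt + random.uniform(0, 0.1))
```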
🧪 Testing
```bash
# Run tests
python -m pytest tests/

# Run with coverage
python -m pytest tests/ --cov=src/libs/llms

# Run specific test categories
python -m pytest tests/unit/
python -m pytest tests/integration/
```
📊 Monitoring and Observability
The library includes comprehensive logging and monitoring:
```python
# Configure structured logging
from definable.llms.utils import configure_logging

configure_logging(log_level="INFO", json_logs=True)

# Health checks (call from within an async context)
from definable.llms.api.routes.health import health_check

health_status = await health_check()
```
🔒 Security Considerations
- API Keys: Stored securely in environment variables
- Rate Limiting: Prevents abuse and quota exhaustion
- Input Validation: All inputs are validated and sanitized
- Error Handling: Sensitive information is not exposed in errors
🚢 Production Deployment
Docker Deployment
```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY . .

RUN pip install uv && uv sync

EXPOSE 8000

CMD ["python", "-m", "definable.llms.api.main"]
```
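To build and run the image (the tag name here is just an example):

```bash
docker build -t definable-llms .
docker run --env-file .env -p 8000:8000 definable-llms
```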
Environment Configuration
For production, ensure you set:
- `DEBUG=false`
- `LOG_LEVEL=INFO`
- Appropriate rate limits
- Redis for session storage
- Proper CORS origins
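A sample production override, reusing keys from the Configuration section (the `CORS_ORIGINS` variable name is an assumption — check the configuration reference for the exact key):

```env
DEBUG=false
LOG_LEVEL=INFO
SESSION_STORE_TYPE=redis
REDIS_URL=redis://your-redis-host:6379/0
RATE_LIMIT_REQUESTS_PER_MINUTE=120
CORS_ORIGINS=https://app.example.com  # assumed variable name
```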
📚 Documentation
- API Documentation: Available at `/docs` when running the server
- Provider Guide: See `docs/providers.md`
- Configuration Reference: See `docs/configuration.md`
- Deployment Guide: See `docs/deployment.md`
🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
📄 License
This project is proprietary to definable.ai.
💬 Support
For support and questions, please contact the definable.ai team.
Built with ❤️ for scalable AI applications.
File details
Details for the file definable_llms-0.1.10.tar.gz.
File metadata
- Download URL: definable_llms-0.1.10.tar.gz
- Upload date:
- Size: 372.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a7bcd7ffd99b5acbe067f1e1b6ae8fdf2d8777baa16c0c7b87576e8ceffd4689` |
| MD5 | `dbc266f9f7347976fb7256ddab85d6ed` |
| BLAKE2b-256 | `5a2a82e9b6ee531e326cc85f4c185eacc4e2e13a10924f4b790f8df633255670` |
File details
Details for the file definable_llms-0.1.10-py3-none-any.whl.
File metadata
- Download URL: definable_llms-0.1.10-py3-none-any.whl
- Upload date:
- Size: 140.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7a2effc4b98ae38fba1208f07e38f1202a23ea96057052b1e3b879d26be89262` |
| MD5 | `4806fcaab86c04b1988c85d97197ee75` |
| BLAKE2b-256 | `ad8cde5c542fc2b56fc0f82801403f1f2f89a8d4a7ac1ab134a17303f16a1a4f` |