LLM Library - Scalable Multi-Provider LLM Library

A production-ready, scalable multi-provider Large Language Model (LLM) library designed for definable.ai. This library provides a unified interface for multiple LLM providers, including OpenAI, Gemini, and Anthropic, with support for chat completions, image generation, file processing, streaming, and session management.

✨ Features

Core Capabilities

  • Multi-Provider Support: OpenAI, Gemini, Anthropic (extensible architecture)
  • Unified Interface: Consistent API across all providers
  • Session Management: Persistent conversation sessions with context
  • File Processing: Support for PDF, DOCX, PPTX, XLSX, images, and text files
  • Streaming Responses: Real-time streaming for chat completions
  • Rate Limiting: Built-in token bucket rate limiting
  • Retry Logic: Exponential backoff with circuit breaker patterns
  • FastAPI Integration: Production-ready REST API

Advanced Features

  • Provider Switching: Change providers mid-conversation
  • Image Processing: OCR, analysis, and multimodal support
  • Chunking: Smart text chunking for large documents (a minimal sketch follows this list)
  • Error Handling: Comprehensive exception hierarchy
  • Configuration: Environment-based configuration management
  • Monitoring: Structured logging and health checks
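
The library's chunker is smarter than this, but the core idea of overlap-based chunking is easy to show. A minimal, generic sketch (illustrative only, not the library's implementation; chunk_size and overlap are made-up parameters):

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    # Slide a fixed-size window across the text; consecutive chunks share
    # `overlap` characters so content cut at a boundary survives intact.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]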

📦 Installation

# Clone the repository
git clone <repository-url>
cd llms_lib

# Install dependencies using uv
uv sync

# Or with pip
pip install -e .

⚙️ Configuration

Create a .env file in your project root:

# API Keys
OPENAI_API_KEY=your_openai_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Provider Settings
DEFAULT_PROVIDER=openai
OPENAI_DEFAULT_MODEL=gpt-4-turbo-preview
OPENAI_TEMPERATURE=0.7

# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS_PER_MINUTE=60
RATE_LIMIT_TOKENS_PER_MINUTE=90000

# Session Management
SESSION_STORE_TYPE=memory  # or redis
SESSION_TTL_SECONDS=3600
REDIS_URL=redis://localhost:6379/0

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
CORS_ENABLED=true
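
The library reads these values from the environment. If you want to sanity-check that your .env is being picked up in a standalone script, python-dotenv (a separate install, assumed here rather than a documented dependency of this library) makes that easy:

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # loads .env from the current working directory

for key in ("OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY"):
    # Report presence only; never print the secrets themselves
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")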

🚀 Quick Start

1. Basic Chat Completion

import asyncio
from definable.llms import provider_factory
from definable.llms.base.types import ChatRequest, Message, MessageRole

async def basic_chat():
    # Get an OpenAI provider
    provider = provider_factory.get_provider("openai")

    # Create a chat request
    messages = [
        Message(role=MessageRole.USER, content="Hello, how are you?")
    ]

    request = ChatRequest(messages=messages, model="gpt-4-turbo-preview")
    response = await provider.chat(request)

    print(response.choices[0].message.content)

# Run the example
asyncio.run(basic_chat())

2. Session-Based Conversation

import asyncio
from definable.llms import session_manager

async def session_chat():
    # Create a new session
    session = await session_manager.create_session(
        provider="openai",
        model="gpt-4-turbo-preview"
    )
    
    # Send messages in the session
    response1 = await session_manager.chat(
        session_id=session.session_id,
        message="My name is Alice. Please remember this."
    )
    print("Assistant:", response1.choices[0].message.content)
    
    response2 = await session_manager.chat(
        session_id=session.session_id,
        message="What's my name?"
    )
    print("Assistant:", response2.choices[0].message.content)

asyncio.run(session_chat())

3. File Processing

import asyncio
from definable.llms import file_processor

async def process_document():
    # Process a PDF file
    processed_file = await file_processor.process_file(
        filename="document.pdf",
        file_path="/path/to/document.pdf"
    )
    
    print(f"Extracted text length: {len(processed_file.processed_text)}")
    print(f"Number of chunks: {len(processed_file.chunks)}")
    print(f"Metadata: {processed_file.metadata}")

asyncio.run(process_document())

4. Streaming Responses

import asyncio
from definable.llms import session_manager

async def streaming_chat():
    session = await session_manager.create_session(
        provider="openai",
        model="gpt-4-turbo-preview"
    )
    
    response_stream = await session_manager.chat(
        session_id=session.session_id,
        message="Tell me a story about AI",
        stream=True
    )
    
    async for chunk in response_stream:
        if chunk.choices and chunk.choices[0].get("delta", {}).get("content"):
            print(chunk.choices[0]["delta"]["content"], end="")

asyncio.run(streaming_chat())

🌐 FastAPI Server

Running the Server

from definable.llms.api import run_server

# Run with default settings
run_server()

# Or with custom settings
run_server(host="0.0.0.0", port=8080, reload=True)

API Endpoints

The FastAPI server provides the following endpoints:

  • Health: GET /api/v1/health - System health check
  • Providers: GET /api/v1/providers - List available providers
  • Sessions: POST /api/v1/sessions - Create conversation session
  • Chat: POST /api/v1/chat - Send chat messages
  • Files: POST /api/v1/files/process - Process uploaded files

Example API Usage

# Create a session
curl -X POST "http://localhost:8000/api/v1/sessions" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4-turbo-preview"
  }'

# Send a chat message
curl -X POST "http://localhost:8000/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello, world!",
    "session_id": "your-session-id"
  }'

# Process a file
curl -X POST "http://localhost:8000/api/v1/files/process" \
  -F "file=@document.pdf"

🔌 Adding New Providers

The library is designed for easy extension. Here's how to add a new provider:

from definable.llms.base import BaseProvider, ProviderCapabilities
from definable.llms.base.types import ChatRequest, ChatResponse

class CustomProvider(BaseProvider):
    def _initialize(self, **kwargs):
        # Initialize your provider
        pass
    
    def get_capabilities(self) -> ProviderCapabilities:
        return ProviderCapabilities(
            chat=True,
            streaming=False,
            # ... other capabilities
        )
    
    async def chat(self, request: ChatRequest) -> ChatResponse:
        # Implement chat functionality
        pass
    
    async def validate_model(self, model: str) -> bool:
        # Validate model support
        pass

# Register the provider
from definable.llms import provider_factory
provider_factory.register_provider("custom", CustomProvider)
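
Once registered, the custom provider resolves through the same factory call used in the Quick Start:

provider = provider_factory.get_provider("custom")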

🏗️ Architecture

The library follows a modular, plugin-based architecture:

src/libs/llms/
├── base/          # Base classes and types
├── providers/     # Provider implementations
├── sessions/      # Session management
├── processors/    # File processing
├── utils/         # Utilities (rate limiting, retry, etc.)
├── api/           # FastAPI integration
└── config.py      # Configuration management

Key Components

  • BaseProvider: Abstract base class for all providers
  • SessionManager: Manages conversation sessions
  • FileProcessor: Handles document processing
  • RateLimiter: Token bucket rate limiting
  • RetryStrategy: Exponential backoff retry logic (both patterns are sketched after this list)
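
RateLimiter and RetryStrategy are the library's own classes; the patterns behind them are standard. A self-contained sketch of both (illustrative only, not the library's code):

import asyncio
import random
import time

class TokenBucket:
    """Allow `rate` operations per second, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    async def acquire(self, tokens: float = 1.0) -> None:
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return
            # Sleep just long enough for the missing tokens to accrue
            await asyncio.sleep((tokens - self.tokens) / self.rate)

async def with_backoff(fn, retries: int = 5, base: float = 0.5):
    """Retry an async callable with jittered exponential backoff."""
    for attempt in range(retries):
        try:
            return await fn()
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.random() * 0.1)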

🧪 Testing

# Run tests
python -m pytest tests/

# Run with coverage
python -m pytest tests/ --cov=src/libs/llms

# Run specific test category
python -m pytest tests/unit/
python -m pytest tests/integration/

📊 Monitoring and Observability

The library includes comprehensive logging and monitoring:

# Configure structured logging
from definable.llms.utils import configure_logging
configure_logging(log_level="INFO", json_logs=True)

# Health checks
from definable.llms.api.routes.health import health_check
health_status = await health_check()

🔒 Security Considerations

  • API Keys: Stored securely in environment variables
  • Rate Limiting: Prevents abuse and quota exhaustion
  • Input Validation: All inputs are validated and sanitized
  • Error Handling: Sensitive information is not exposed in errors

🚢 Production Deployment

Docker Deployment

FROM python:3.10-slim

WORKDIR /app
COPY . .
RUN pip install uv && uv sync

EXPOSE 8000
CMD ["python", "-m", "definable.llms.api.main"]

Environment Configuration

For production, ensure you set the following (an example .env is sketched after this list):

  • DEBUG=false
  • LOG_LEVEL=INFO
  • Appropriate rate limits
  • Redis for session storage
  • Proper CORS origins
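
Putting that together, a production .env might look like this (values are illustrative; all variable names come from the Configuration section above):

DEBUG=false
LOG_LEVEL=INFO
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS_PER_MINUTE=120
RATE_LIMIT_TOKENS_PER_MINUTE=150000
SESSION_STORE_TYPE=redis
REDIS_URL=redis://redis:6379/0
CORS_ENABLED=true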

📚 Documentation

  • API Documentation: Available at /docs when running the server
  • Provider Guide: See docs/providers.md
  • Configuration Reference: See docs/configuration.md
  • Deployment Guide: See docs/deployment.md

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

This project is proprietary to definable.ai.

💬 Support

For support and questions, please contact the definable.ai team.


Built with ❤️ for scalable AI applications.
