A lightweight, extensible framework for working with large language models, focused on pipeline processing and multi-modal capabilities. Built with FastAPI and DSPy.
A Simple LLM Kit
A production-ready Python library for building LLM-powered applications with advanced pipeline processing, multi-modal capabilities, and enterprise-grade reliability features. Built with FastAPI, DSPy, and a composable architecture.
This is a framework/library, not a standalone server. Use it as a dependency in your own FastAPI applications.
🚀 Key Features
Core Architecture
- Pipeline-First Design: Composable, type-safe pipeline steps for complex processing workflows.
- Protocol-Based Framework: Clean interfaces enabling easy extension and testing.
- Multi-Modal Processing: Unified handling of text, images, and structured data.
- Performance Tracking: Comprehensive, per-request metrics collection with step-by-step timing.
Reliability & Observability
- Circuit Breaker Pattern: Built-in failure protection with automatic recovery.
- OpenTelemetry Integration: Vendor-neutral metrics and tracing for any backend (Prometheus, Datadog, etc.).
- Semantic Conventions: Adheres to llm.* OTel conventions for out-of-the-box compatibility with observability tools.
- Structured Logging: JSON-formatted logs with context preservation.
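To make the circuit breaker idea concrete, here is a minimal, self-contained sketch of the general pattern. It is not the library's implementation: the class name, the threshold and timeout parameters, and the injectable clock are all assumptions chosen for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    """Illustrative circuit breaker (hypothetical; not the library's class)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # one trial call is allowed through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # A successful call closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

After `failure_threshold` consecutive failures the breaker rejects calls immediately, and once `reset_timeout` seconds have passed it lets a single trial call decide whether to close again, which is the "automatic recovery" behaviour described above.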
Model & Provider Support
- Multi-Provider: OpenAI, Anthropic, Google Gemini, and Hugging Face.
- Flexible Configuration: YAML-based model configuration with parameter overrides.
- Token Management: Robust and accurate token counting with cost estimation.
- Program Versioning: DSPy program management with optimization tracking.
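As a rough illustration of what cost estimation from token counts involves, the sketch below multiplies token counts by per-million-token prices. The price table, the model id, and the function name are hypothetical and are not taken from the library; real rates must come from your provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD price per one million tokens (hypothetical values)."""

    input_per_million: float
    output_per_million: float


# Hypothetical price table; replace with your provider's published rates.
EXAMPLE_PRICES = {
    "example-model": TokenPrice(input_per_million=1.0, output_per_million=2.0),
}


def estimate_cost_usd(model_id, input_tokens, output_tokens, prices=EXAMPLE_PRICES):
    """Return the estimated cost in USD, or None if the model has no known price."""
    price = prices.get(model_id)
    if price is None:
        return None
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000
```

Returning `None` for unknown models keeps a missing price from being silently reported as a zero cost.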
Specialized Capabilities
- Image Processing: Intelligent resizing, format conversion, and optimization.
- Type Safety: Full Pydantic integration with runtime protocol checking.
- Custom Extensions: Easy-to-implement custom pipeline steps and model backends.
🏗️ Architecture Overview
The framework is built around a composable pipeline architecture where each step implements the PipelineStep protocol:
# Core pipeline concept (sketch: `backend` and `OutputProcessor` are not defined here)
from a_simple_llm_kit.core.pipeline import Pipeline
from a_simple_llm_kit.core.implementations import ImageProcessor, ModelProcessor
from a_simple_llm_kit.core.types import MediaType
Pipeline([
    ImageProcessor(max_size=(800, 800)),         # Resize and optimize images
    ModelProcessor(backend, [MediaType.IMAGE]),  # Send to vision model
    OutputProcessor()                            # Format response
])
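The protocol-based pipeline idea can be sketched without the library at all. The block below defines local stand-ins that reuse the names PipelineStep, Pipeline, PipelineData, and MediaType purely for readability; their bodies are assumptions about how such a design might work, not the framework's actual code. UppercaseStep is a hypothetical step.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Protocol, runtime_checkable


class MediaType(Enum):  # local stand-in, not the library's enum
    TEXT = "text"
    IMAGE = "image"


@dataclass
class PipelineData:  # local stand-in with the fields used in this README
    media_type: MediaType
    content: Any
    metadata: dict = field(default_factory=dict)


@runtime_checkable
class PipelineStep(Protocol):
    @property
    def accepted_media_types(self) -> list[MediaType]: ...

    async def process(self, data: PipelineData) -> PipelineData: ...


class Pipeline:
    def __init__(self, steps):
        # Structural check: any object with the right attributes qualifies.
        for step in steps:
            if not isinstance(step, PipelineStep):
                raise TypeError(f"{type(step).__name__} does not implement PipelineStep")
        self.steps = list(steps)

    async def execute(self, data: PipelineData) -> PipelineData:
        for step in self.steps:
            if data.media_type not in step.accepted_media_types:
                raise ValueError(
                    f"{type(step).__name__} does not accept {data.media_type.value}"
                )
            data = await step.process(data)
        return data


class UppercaseStep:  # hypothetical step; no base class needed
    accepted_media_types = [MediaType.TEXT]

    async def process(self, data: PipelineData) -> PipelineData:
        return PipelineData(
            media_type=MediaType.TEXT,
            content=data.content.upper(),
            metadata={**data.metadata, "uppercased": True},
        )


if __name__ == "__main__":
    result = asyncio.run(
        Pipeline([UppercaseStep()]).execute(PipelineData(MediaType.TEXT, "hello"))
    )
    print(result.content)
```

Because the check is structural, a step only needs the right attributes; it does not have to inherit from anything, which is what makes test doubles easy to write.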
Key Components
- Core Framework (src/a_simple_llm_kit/core/): Protocols, types, and base implementations.
- Model Management (src/a_simple_llm_kit/models/): Provider abstraction and program management.
- Pipeline System: Composable processing steps with automatic validation.
- Metrics & Monitoring: Performance tracking, circuit breakers, and observability.
💻 Installation & Basic Usage
Installation
# Install from PyPI
pip install a-simple-llm-kit
# Or install from source for development
git clone https://github.com/chuckfinca/a-simple-llm-kit
cd a-simple-llm-kit
pip install -e ".[dev]"
Basic Usage Example
Here's how to build a simple FastAPI application that uses the framework to provide a text completion endpoint with full performance tracking.
Your Application's main.py:
import time
from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
from pydantic import BaseModel
import dspy
# 1. Import the framework's core components
from a_simple_llm_kit.core.config import FrameworkSettings
from a_simple_llm_kit.core.metrics_wrappers import PerformanceMetrics
from a_simple_llm_kit.core.pipeline import Pipeline
from a_simple_llm_kit.core.implementations import ModelProcessor
from a_simple_llm_kit.core.output_processors import DefaultOutputProcessor
from a_simple_llm_kit.core.types import MediaType, PipelineData
from a_simple_llm_kit.core.utils import MetadataCollector
from a_simple_llm_kit.defaults import YamlConfigProvider
from a_simple_llm_kit.core.storage import FileSystemStorageAdapter
from a_simple_llm_kit.models.manager import ModelManager
from a_simple_llm_kit.models.predictor import Predictor
from a_simple_llm_kit.models.program_manager import ProgramManager
class PredictionRequest(BaseModel):
    prompt: str
    model_id: str = "gpt-4o-mini"

@asynccontextmanager
async def lifespan(app: FastAPI):
    # 2. Configure and instantiate the framework managers
    settings = FrameworkSettings()
    config_provider = YamlConfigProvider("config/model_config.yml")
    storage_adapter = FileSystemStorageAdapter(base_dir="dspy_programs")
    model_manager = ModelManager(config_provider=config_provider, settings=settings)
    program_manager = ProgramManager(model_manager=model_manager, storage_adapter=storage_adapter)
    # 3. Register your application's DSPy programs
    program_manager.register_program(program_class=Predictor, name="Text Completion")
    # 4. Make managers available to your API routes
    app.state.program_manager = program_manager
    yield
    app.state.program_manager = None

app = FastAPI(lifespan=lifespan)

@app.post("/predict")
async def predict_text(request: PredictionRequest):
    """A modern endpoint using the pipeline and automatic metadata."""
    try:
        program_manager: ProgramManager = app.state.program_manager
        metrics = PerformanceMetrics()  # 1. Start performance tracking
        # 2. Define the processing pipeline
        pipeline = Pipeline([
            ModelProcessor(
                model_manager=program_manager.model_manager,
                model_id=request.model_id,
                signature_class=Predictor,
                input_key="input",
                output_processor=DefaultOutputProcessor(),
                accepted_types=[MediaType.TEXT],
                output_type=MediaType.TEXT,
            )
        ])
        # 3. Execute the pipeline
        result_data = await pipeline.execute(
            PipelineData(media_type=MediaType.TEXT, content=request.prompt)
        )
        metrics.mark_checkpoint("pipeline_complete")
        # 4. Automatically collect all consistent metadata
        model_info = program_manager.model_info.get(request.model_id)
        final_metadata = MetadataCollector.collect_response_metadata(
            model_id=request.model_id,
            program_metadata=program_manager.registry.get_program_metadata("predictor"),
            performance_metrics=metrics.get_summary(),
            # Fall back to an empty dict when the model has no recorded info
            model_info=model_info.model_dump(by_alias=True) if model_info else {},
        )
        return {
            "success": True,
            "data": {"response": result_data.content},
            "metadata": final_metadata,
        }
    except Exception as e:
        # Surface any failure as an HTTP 500, keeping the original cause
        raise HTTPException(status_code=500, detail=str(e)) from e
Required Configuration
config/model_config.yml:
models:
  gpt-4o-mini:
    model_name: "openai/gpt-4o-mini"
    max_tokens: 3000
    additional_params:
      timeout: 60
  claude-3-5-sonnet:
    model_name: "anthropic/claude-3-5-sonnet-20241022"
    max_tokens: 4000
    additional_params:
      timeout: 60
  Meta-Llama-3.1-8B-Instruct:
    model_name: "huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct"
    max_tokens: 3000
    additional_params:
      timeout: 60
  gemini-2.0-flash:
    model_name: "gemini/gemini-2.0-flash"
    max_tokens: 2048
    additional_params:
      timeout: 60
Environment Variables (.env):
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
HUGGINGFACE_API_KEY=your_hf_key_here
GEMINI_API_KEY=your_gemini_key_here
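A missing key usually only surfaces on the first model call, so a startup check can help. The sketch below assumes that the prefix before the first slash in model_name identifies the provider and that each provider uses the environment variable listed above; the function name and the mapping are illustrative and not part of the library.

```python
import os

# Assumed mapping from the provider prefix in `model_name` to the
# environment variable names shown in this README.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_api_keys(model_names, environ=None):
    """Return the sorted env var names that are needed but unset or empty."""
    environ = os.environ if environ is None else environ
    missing = set()
    for model_name in model_names:
        provider = model_name.split("/", 1)[0]
        env_var = PROVIDER_ENV_VARS.get(provider)
        if env_var and not environ.get(env_var):
            missing.add(env_var)
    return sorted(missing)


if __name__ == "__main__":
    configured = ["openai/gpt-4o-mini", "gemini/gemini-2.0-flash"]
    print(missing_api_keys(configured))
```

Calling this from your lifespan handler with the model_name values from the YAML file lets the application fail fast with a clear message.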
Running Your Application
# Run your FastAPI application
uvicorn main:app --reload
# Test your custom endpoint
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, world!", "model_id": "gpt-4o-mini"}'
🔭 Observability with OpenTelemetry
The framework is deeply instrumented with OpenTelemetry to provide vendor-neutral metrics and traces, giving you immediate insight into your application's performance.
Enabling Observability
1. Install the optional dependencies:
pip install "a-simple-llm-kit[opentelemetry]"
2. Enable via Environment Variables:
Create a .env file in your application's root directory:
# --- Enable OTel ---
OTEL_ENABLED=true
OTEL_SERVICE_NAME="MyLLMApp"
OTEL_SERVICE_VERSION="1.0.0"
# --- Your API Keys ---
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
HUGGINGFACE_API_KEY=your_hf_key_here
GEMINI_API_KEY=your_gemini_key_here
3. Configure the OTel SDK in Your Application:
The library emits signals, but your application is responsible for configuring an "exporter" to send them to a backend. Here is an example of setting up a Prometheus exporter in your main.py:
# In your main.py
from fastapi import FastAPI
from contextlib import asynccontextmanager
# --- OTel Imports ---
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import make_asgi_app
# --- End OTel Imports ---
# Import your settings to get service name
from a_simple_llm_kit.core.config import FrameworkSettings
@asynccontextmanager
async def lifespan(app: FastAPI):
    settings = FrameworkSettings()
    # --- OTel SDK Setup ---
    if settings.otel_enabled:
        resource = Resource.create({
            "service.name": settings.otel_service_name,
            "service.version": settings.otel_service_version
        })
        reader = PrometheusMetricReader()
        provider = MeterProvider(resource=resource, metric_readers=[reader])
        metrics.set_meter_provider(provider)
    # --- End OTel SDK Setup ---
    # ... your existing lifespan logic for managers ...
    yield

# Create the main FastAPI app
app = FastAPI(lifespan=lifespan)
# Create and mount the Prometheus metrics endpoint
settings = FrameworkSettings()
if settings.otel_enabled:
    metrics_app = make_asgi_app()
    app.mount("/metrics", metrics_app)
# ... your API routes (@app.post("/predict"), etc.) ...
Available Instrumentation
- Automatic Tracing: Key methods like ModelBackend.predict and each step in a Pipeline are automatically wrapped in trace spans with rich, LLM-specific attributes.
- Automatic Metrics: The framework emits metrics for model calls, circuit breaker failures and state changes, and overall request latency.
- Rich Per-Request Data: The PerformanceMetrics object, accessible in your API response, provides a detailed breakdown of timing and token usage for debugging individual requests.
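To show what per-request checkpoint timing can look like, here is a small stand-in written with the standard library only. It mirrors the mark_checkpoint and get_summary method names used earlier in this README, but the class itself, its output keys, and the injectable clock are assumptions rather than the library's PerformanceMetrics.

```python
import time


class CheckpointTimer:
    """Stand-in per-request timer (hypothetical; not PerformanceMetrics)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = clock()
        self._checkpoints = {}

    def mark_checkpoint(self, name):
        # Store elapsed milliseconds since the timer was created.
        self._checkpoints[name] = (self._clock() - self._start) * 1000.0

    def get_summary(self):
        timing = {"totalMs": (self._clock() - self._start) * 1000.0}
        for name, elapsed_ms in self._checkpoints.items():
            timing[f"{name}Ms"] = elapsed_ms
        return {"timing": timing}


if __name__ == "__main__":
    timer = CheckpointTimer()
    time.sleep(0.01)  # simulate a model call
    timer.mark_checkpoint("modelComplete")
    print(timer.get_summary())
```

Each checkpoint records elapsed time since the request started, so the difference between two checkpoints gives the duration of the work between them.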
📊 Response Format
All API responses follow a consistent envelope with comprehensive metadata:
{
  "success": true,
  "data": {
    "response": "Model response content"
  },
  "metadata": {
    "program": {
      "id": "predictor",
      "version": "1.0.0",
      "name": "Predictor"
    },
    "model": {
      "id": "gpt-4o-mini",
      "provider": "openai",
      "baseModel": "gpt-4o-mini"
    },
    "performance": {
      "timing": {
        "totalMs": 750.25,
        "modelCompleteMs": 738.91
      },
      "tokens": {
        "inputTokens": 50,
        "outputTokens": 150,
        "totalTokens": 200,
        "costUsd": 0.0001
      },
      "traceId": "unique-trace-identifier"
    },
    "executionId": "unique-execution-id",
    "timestamp": "2025-08-01T10:30:00.123456Z"
  }
}
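A client can pull the useful parts out of this envelope with the standard library. The helper below is a sketch: its name and return shape are hypothetical, and treating totalMs minus modelCompleteMs as time spent after the model finished is an interpretation of the field names, not something the format guarantees.

```python
import json


def summarize_envelope(raw: str) -> dict:
    """Extract a few fields from a response envelope shaped like the example."""
    envelope = json.loads(raw)
    if not envelope.get("success"):
        raise ValueError("request did not succeed")
    metadata = envelope["metadata"]
    timing = metadata["performance"]["timing"]
    tokens = metadata["performance"]["tokens"]
    return {
        "response": envelope["data"]["response"],
        "model_id": metadata["model"]["id"],
        # Assumed meaning: time between model completion and end of request.
        "post_model_ms": round(timing["totalMs"] - timing["modelCompleteMs"], 2),
        "total_tokens": tokens["totalTokens"],
    }
```

Checking success before reading metadata avoids a confusing KeyError when the server returns an error body instead.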
🔧 Building Custom Pipelines
The framework's strength lies in its composable pipeline architecture. Create custom processing steps by implementing the PipelineStep protocol:
Custom Pipeline Step
from a_simple_llm_kit.core.protocols import PipelineStep
from a_simple_llm_kit.core.types import MediaType, PipelineData
class TextSummarizerStep(PipelineStep):
    def __init__(self, max_length: int = 100):
        self.max_length = max_length

    @property
    def accepted_media_types(self) -> list[MediaType]:
        return [MediaType.TEXT]

    async def process(self, data: PipelineData) -> PipelineData:
        # Custom processing logic
        text = data.content
        if len(text) > self.max_length:
            summary = text[:self.max_length] + "..."
        else:
            summary = text
        return PipelineData(
            media_type=MediaType.TEXT,
            content=summary,
            metadata={
                **data.metadata,
                "original_length": len(text),
                "summarized": True
            }
        )
Custom Model Backend
from a_simple_llm_kit.core.protocols import ModelBackend
from a_simple_llm_kit.core.model_interfaces import ModelOutput
from a_simple_llm_kit.core.types import PipelineData
from typing import Any
class CustomModelBackend(ModelBackend):
    def __init__(self, model_id: str):
        self.model_id = model_id
        self.program_metadata = None
        self.last_prompt_tokens = None
        self.last_completion_tokens = None

    async def predict(self, input: Any, pipeline_data: PipelineData) -> ModelOutput:
        # Your custom model logic here; `your_model_call` is a placeholder
        # for your own async client call.
        result = await your_model_call(input)
        return result

    def get_lm_history(self) -> list[Any]:
        return []  # Return model interaction history
Combining Custom Components
from a_simple_llm_kit.core.pipeline import Pipeline
from a_simple_llm_kit.core.implementations import ModelProcessor
from a_simple_llm_kit.core.output_processors import DefaultOutputProcessor
from a_simple_llm_kit.core.types import MediaType
# Create a custom pipeline
# (`your_model_manager`, `YourSignature`, and `initial_data` come from your application)
custom_pipeline = Pipeline([
    TextSummarizerStep(max_length=200),
    ModelProcessor(
        model_manager=your_model_manager,
        model_id="my-model",
        signature_class=YourSignature,
        input_key="input",
        output_processor=DefaultOutputProcessor(),
        accepted_types=[MediaType.TEXT],
        output_type=MediaType.TEXT
    )
])
# Execute the pipeline (from inside an async function)
result = await custom_pipeline.execute(initial_data)
⚙️ Configuration
Model Configuration (config/model_config.yml)
models:
  gpt-4o-mini:
    model_name: "openai/gpt-4o-mini"
    max_tokens: 3000
    additional_params:
      temperature: 0.7
      top_p: 1.0
  claude-3-5-sonnet:
    model_name: "anthropic/claude-3-5-sonnet-20241022"
    max_tokens: 4000
    additional_params:
      temperature: 0.8
  Meta-Llama-3.1-8B-Instruct:
    model_name: "huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct"
    max_tokens: 3000
    additional_params: {}
  gemini-2.0-flash:
    model_name: "gemini/gemini-2.0-flash"
    max_tokens: 2048
    additional_params:
      temperature: 0.9
Environment Variables
# Required API Keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
HUGGINGFACE_API_KEY=your_hf_key
GEMINI_API_KEY=your_gemini_key
# Optional: Custom config path
LLM_CONFIG_PATH=config/model_config.yml
# OpenTelemetry Configuration
OTEL_ENABLED=true
OTEL_SERVICE_NAME="MyLLMApp"
OTEL_SERVICE_VERSION="1.0.0"
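For scripts that run outside the framework, these variables can be read with the standard library alone. The sketch below is not FrameworkSettings: the dataclass, the function name, the default values, and the set of strings accepted as true are all assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical settings holder; not the library's FrameworkSettings."""

    config_path: str
    otel_enabled: bool
    otel_service_name: str
    otel_service_version: str


def load_env_settings(environ=None) -> EnvSettings:
    environ = os.environ if environ is None else environ
    enabled = environ.get("OTEL_ENABLED", "false").strip().lower()
    return EnvSettings(
        # Defaults below are assumptions chosen for this sketch.
        config_path=environ.get("LLM_CONFIG_PATH", "config/model_config.yml"),
        otel_enabled=enabled in {"1", "true", "yes"},
        otel_service_name=environ.get("OTEL_SERVICE_NAME", "unknown_service"),
        otel_service_version=environ.get("OTEL_SERVICE_VERSION", "0.0.0"),
    )
```

Passing a plain dict as `environ` makes the function easy to test without touching the real process environment.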
🧪 Testing
Run All Tests
# Install test dependencies
pip install -e ".[dev]"
# Run the full test suite
pytest tests/
# Run with coverage
pytest tests/ --cov=a_simple_llm_kit --cov-report=html
Test Categories
- Unit Tests: Core protocol and implementation testing
- Integration Tests: End-to-end pipeline validation
- Performance Tests: Circuit breaker and metrics validation
Example Test
import pytest
from a_simple_llm_kit.core.types import MediaType, PipelineData
from a_simple_llm_kit.core.pipeline import Pipeline
# TextSummarizerStep is the custom step defined earlier in this README;
# CustomProcessorStep stands for any additional step from your application.
@pytest.mark.anyio
async def test_custom_pipeline():
    """Test custom pipeline with multiple steps"""
    text_data = PipelineData(
        media_type=MediaType.TEXT,
        content="test content",
        metadata={}
    )
    pipeline = Pipeline([
        TextSummarizerStep(max_length=50),
        CustomProcessorStep()
    ])
    result = await pipeline.execute(text_data)
    assert result.metadata["summarized"] is True
🤝 Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Run tests: pytest tests/
- Run linting: ruff format . && ruff check .
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
Development Guidelines
- Follow the protocol-based architecture patterns
- Add comprehensive tests for new features
- Update documentation for API changes
- Use type hints and maintain type safety
- Follow the existing code style (Ruff configuration)
📄 License
MIT License - see LICENSE file for details.
🆘 Support
- Documentation: Check the inline code documentation
- Issues: Open GitHub issues for bugs and feature requests
- Discussions: Use GitHub Discussions for questions and ideas
Built with ❤️ using FastAPI, DSPy, and modern Python patterns