minima-llm
Minimal async LLM backend with caching and batch execution.
Features
- Zero Dependencies: Core package uses only Python stdlib (asyncio, urllib, sqlite3)
- SQLite Cache: Automatic prompt caching with WAL mode for multi-process safety
- Batch Execution: Worker pool pattern with heartbeat, failure tracking, and early abort
- Rate Limiting: RPM pacing with server-learned limits from rate limit headers
- Retry Logic: Exponential backoff with jitter and a cooldown after overload (see the sketch after this list)
- OpenAI Compatible: Works with any OpenAI-compatible endpoint
- DSPy Integration: Optional adapter for the DSPy framework (requires the [dspy] extra)
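The retry behavior mentioned above can be pictured as standard exponential backoff with full jitter. The sketch below illustrates the general pattern only; it is not the library's internal code, and the helper name and defaults are made up:

import asyncio
import random

async def call_with_retry(do_request, max_attempts=6, base_delay=1.0, cap=30.0):
    # Generic exponential backoff with full jitter (illustrative only).
    for attempt in range(1, max_attempts + 1):
        try:
            return await do_request()
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay grows exponentially (capped), with random jitter so that
            # many concurrent workers do not retry in lockstep.
            delay = random.uniform(0, min(cap, base_delay * 2 ** (attempt - 1)))
            await asyncio.sleep(delay)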
Installation
# Core only (no dependencies)
pip install minima-llm
# With DSPy support
pip install minima-llm[dspy]
# With YAML config support
pip install minima-llm[yaml]
# Development
pip install minima-llm[dev]
Quick Start
Basic Usage
import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    # Configure from environment or explicit values
    config = MinimaLlmConfig(
        base_url="https://api.openai.com/v1",
        model="gpt-4",
        api_key="sk-...",
        cache_dir="./cache",
    )
    backend = OpenAIMinimaLlm(config)

    # Single request
    request = MinimaLlmRequest(
        request_id="q1",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        temperature=0.0,
    )
    result = await backend.generate(request)
    print(result.text)

    await backend.aclose()

asyncio.run(main())
Batch Execution
import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)

    requests = [
        MinimaLlmRequest(
            request_id=f"q{i}",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(100)
    ]

    # Run batch with progress heartbeat
    results = await backend.run_batched(requests)

    for r in results:
        if hasattr(r, 'text'):
            print(f"{r.request_id}: {r.text[:50]}...")

    await backend.aclose()

asyncio.run(main())
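The hasattr check above hints that failed requests come back as error objects without a text attribute rather than raising mid-batch (consistent with the failure-tracking feature). Under that assumption, a run could be partitioned after the fact; this is an illustrative sketch, not library API:

# Partition batch results, assuming failures lack a `.text` attribute
# (as the hasattr check in the example above suggests).
ok = [r for r in results if hasattr(r, "text")]
failed = [r for r in results if not hasattr(r, "text")]
print(f"{len(ok)} succeeded, {len(failed)} failed")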
With DSPy
import asyncio
import dspy
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm
from minima_llm.dspy_adapter import MinimaLlmDSPyLM

class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)
    lm = MinimaLlmDSPyLM(backend)
    dspy.configure(lm=lm)

    predictor = dspy.ChainOfThought(QA)
    result = await predictor.acall(question="What is the capital of France?")
    print(result.answer)

    await backend.aclose()

asyncio.run(main())
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| OPENAI_BASE_URL | API endpoint URL | (required) |
| OPENAI_MODEL | Model identifier | (required) |
| OPENAI_API_KEY | API key | None |
| CACHE_DIR | SQLite cache directory | None (disabled) |
| BATCH_NUM_WORKERS | Concurrent workers | 64 |
| MAX_OUTSTANDING | Max in-flight HTTP requests | 32 |
| RPM | Requests per minute (0 = unlimited) | 600 |
| TIMEOUT_S | Per-request timeout (seconds) | 60.0 |
| MAX_ATTEMPTS | Max retry attempts (0 = infinite) | 6 |
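As a concrete example, the same settings can be placed in the process environment before calling MinimaLlmConfig.from_env() (as used in the Quick Start). The values below are placeholders, and it is assumed from this table that from_env() picks up each listed variable:

import os
from minima_llm import MinimaLlmConfig

# Placeholder values; normally these would be exported in the shell or a .env file.
os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
os.environ["OPENAI_MODEL"] = "gpt-4"
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["CACHE_DIR"] = "./cache"
os.environ["RPM"] = "300"            # pace requests at 300 per minute
os.environ["MAX_ATTEMPTS"] = "4"     # retry each request at most 4 times

config = MinimaLlmConfig.from_env()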
YAML Configuration
base_url: "https://api.openai.com/v1"
model: "gpt-4"
api_key: "sk-..."
cache_dir: "./cache"
# Optional batch settings
batch:
  num_workers: 64
  max_failures: 25
  heartbeat_s: 10.0
Load with:
config = MinimaLlmConfig.from_yaml("config.yml")
Architecture
minima_llm/
├── protocol.py # AsyncMinimaLlmBackend protocol, Request/Response types
├── config.py # MinimaLlmConfig, BatchConfig, ParasailBatchConfig
├── backend.py # OpenAIMinimaLlm - full async backend with cache
├── batch.py # run_batched_callable, Parasail batch support
└── dspy_adapter.py # MinimaLlmDSPyLM, TolerantChatAdapter (optional)
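For orientation, the usage examples above suggest roughly the following shape for the backend protocol defined in protocol.py. This is an inferred sketch, not the actual source; anything beyond the three methods shown in the examples is an assumption:

from typing import Any, Protocol

class AsyncMinimaLlmBackend(Protocol):
    # Inferred from the Quick Start examples; signatures are approximate.

    async def generate(self, request: Any) -> Any:
        """Run one request and return a response exposing .text."""
        ...

    async def run_batched(self, requests: list) -> list:
        """Run many requests through the worker pool; results keep request_id."""
        ...

    async def aclose(self) -> None:
        """Release HTTP and cache resources."""
        ...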
Multi-Loop Support
The backend is designed to be reused across multiple asyncio.run() calls:
backend = OpenAIMinimaLlm(config)
# First asyncio.run()
asyncio.run(batch1(backend))
# Second asyncio.run() - works correctly
asyncio.run(batch2(backend))
This is achieved through lazy per-loop initialization of async primitives.
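The usual way to make an object survive multiple event loops is to defer creating asyncio primitives until they are first used inside the current loop, and to rebuild them when the loop changes. The sketch below illustrates that general pattern; it is not the library's actual implementation, and the class name is made up:

import asyncio

class PerLoopState:
    # Rebuilds asyncio primitives whenever the running event loop changes,
    # so the owning object can be reused across multiple asyncio.run() calls.

    def __init__(self) -> None:
        self._loop = None
        self._semaphore = None

    def semaphore(self, limit: int = 32) -> asyncio.Semaphore:
        loop = asyncio.get_running_loop()
        if self._loop is not loop:
            # First use in this loop (or a brand-new loop): recreate primitives.
            self._loop = loop
            self._semaphore = asyncio.Semaphore(limit)
        return self._semaphore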
License
MIT
Download files
Source Distribution
minima_llm-0.1.0.tar.gz (66.0 kB)
Built Distribution
minima_llm-0.1.0-py3-none-any.whl (39.7 kB)
File details
Details for the file minima_llm-0.1.0.tar.gz.
File metadata
- Download URL: minima_llm-0.1.0.tar.gz
- Upload date:
- Size: 66.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 530913ba82bafee083601c2f17570f6391a5388adc882f6ad452a6b52ee8309a |
| MD5 | fe2f81b54fcaa88973d879773a6eb7c2 |
| BLAKE2b-256 | 91dc277dab3873800541fd4a022fb791c649d43e1ec1d526b3923a25efb06cd7 |
File details
Details for the file minima_llm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: minima_llm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 39.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1c7dec1c2914e20d6e4ff1ebca264195e3de642673b2bc733f0748e7ee5d3f9 |
| MD5 | 17edb016fc51ca034c57ef083d1d6d82 |
| BLAKE2b-256 | 3da519c1bc5dca2f2ca91cb777e24afaf85936fb3f1d3da85ed3c49d676e9451 |