# minima-llm

Minimal async LLM backend with caching and batch execution.
## Features

- **Zero Dependencies**: Core package uses only the Python stdlib (`asyncio`, `urllib`, `sqlite3`)
- **SQLite Cache**: Automatic prompt caching with WAL mode for multi-process safety
- **Batch Execution**: Worker pool pattern with heartbeat, failure tracking, and early abort
- **Rate Limiting**: RPM pacing with server-learned limits from rate-limit headers
- **Retry Logic**: Exponential backoff with jitter, cooldown after overload (see the sketch after this list)
- **OpenAI Compatible**: Works with any OpenAI-compatible endpoint
- **DSPy Integration**: Optional adapter for the DSPy framework (requires the `[dspy]` extra)
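The retry schedule can be pictured with the standard exponential-backoff-with-full-jitter pattern. The sketch below is a generic illustration of that technique, not minima-llm's actual internals; the parameter names `base_delay_s` and `cap_s` are assumptions:

```python
import random

def backoff_delay(attempt: int, base_delay_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): exponential growth,
    capped at `cap_s`, with full jitter so concurrent workers
    don't retry in lockstep. Illustrative only."""
    return random.uniform(0.0, min(cap_s, base_delay_s * (2 ** attempt)))

# attempt 0 -> up to 1s, attempt 1 -> up to 2s, ..., capped at 60s
for attempt in range(7):
    print(f"attempt {attempt}: sleep up to {min(60.0, 2 ** attempt):.0f}s")
```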
## Installation

```bash
# Core only (no dependencies)
pip install minima-llm

# With DSPy support
pip install minima-llm[dspy]

# With YAML config support
pip install minima-llm[yaml]

# Development
pip install minima-llm[dev]
```
## Quick Start

### Basic Usage

```python
import asyncio

from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest


async def main():
    # Configure from environment or explicit values
    config = MinimaLlmConfig(
        base_url="https://api.openai.com/v1",
        model="gpt-4",
        api_key="sk-...",
        cache_dir="./cache",
    )
    backend = OpenAIMinimaLlm(config)

    # Single request
    request = MinimaLlmRequest(
        request_id="q1",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        temperature=0.0,
    )
    result = await backend.generate(request)
    print(result.text)

    await backend.aclose()

asyncio.run(main())
```
### Batch Execution

```python
import asyncio

from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest


async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)

    requests = [
        MinimaLlmRequest(
            request_id=f"q{i}",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(100)
    ]

    # Run batch with progress heartbeat
    results = await backend.run_batched(requests)

    for r in results:
        if hasattr(r, 'text'):
            print(f"{r.request_id}: {r.text[:50]}...")

    await backend.aclose()

asyncio.run(main())
```
### With DSPy

```python
import asyncio

import dspy

from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm
from minima_llm.dspy_adapter import MinimaLlmDSPyLM


class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()


async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)
    lm = MinimaLlmDSPyLM(backend)
    dspy.configure(lm=lm)

    predictor = dspy.ChainOfThought(QA)
    result = await predictor.acall(question="What is the capital of France?")
    print(result.answer)

    await backend.aclose()

asyncio.run(main())
```
## Batch Management

For long-running batch jobs using the OpenAI batch API, minima-llm provides batch state management with local state files, so jobs can be resumed after an interruption.
### Configuration

Enable Parasail batch mode in your config:

```yaml
parasail:
  llm_batch_prefix: "my-project"  # Prefix for batch state files
  state_dir: "./batch-state"      # Directory for state files (defaults to cache_dir)
  poll_interval_s: 30             # How often to poll for completion
  max_poll_hours: 24              # Maximum time to wait
```
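As an illustration of how `poll_interval_s` and `max_poll_hours` interact, here is a hedged sketch of a polling loop. `check_remote_status` is a hypothetical stand-in for a batch-API status call; minima-llm's actual loop may differ:

```python
import asyncio
import time

async def wait_for_batch(check_remote_status, poll_interval_s=30, max_poll_hours=24):
    """Poll until the batch reaches a terminal state or the deadline passes.
    Illustrative sketch, not minima-llm's implementation."""
    deadline = time.monotonic() + max_poll_hours * 3600
    while time.monotonic() < deadline:
        status = await check_remote_status()  # hypothetical status call
        if status in ("completed", "failed", "cancelled"):
            return status
        await asyncio.sleep(poll_interval_s)
    raise TimeoutError("batch did not finish within max_poll_hours")
```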
### Batch Management Functions

These functions are available for programmatic batch management:

```python
from minima_llm import (
    batch_status_overview,
    cancel_batch,
    cancel_all_batches,
    cancel_all_local_batches,
    MinimaLlmConfig,
)

config = MinimaLlmConfig.from_yaml("config.yml")

# Show status of all local batch state files
batch_status_overview(config)

# Cancel a specific batch by remote batch ID
cancel_batch("batch_abc123", config)

# Cancel all batches matching a prefix
cancel_all_batches(config, prefix="my-project")

# Cancel ALL local batches
cancel_all_local_batches(config)
```
### Command Line Interface

minima-llm provides a standalone CLI for batch management:

```bash
# Show status of all batches (uses CACHE_DIR from environment)
minima-llm batch-status

# With explicit config file
minima-llm batch-status --config config.yml

# Cancel batches matching a prefix
minima-llm batch-status --cancel my-prefix

# Cancel a specific remote batch by ID
minima-llm batch-status --cancel-remote batch_abc123

# Cancel ALL local batches
minima-llm batch-status --cancel-all
```

When calling from a different directory, use absolute paths or set environment variables:

```bash
# Absolute path to config
minima-llm batch-status --config /path/to/project/config.yml

# Or set CACHE_DIR to find batch state files
CACHE_DIR=/path/to/project/cache minima-llm batch-status
```
## Configuration

### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `OPENAI_BASE_URL` | API endpoint URL | (required) |
| `OPENAI_MODEL` | Model identifier | (required) |
| `OPENAI_API_KEY` | API key | None |
| `CACHE_DIR` | SQLite cache directory | None (disabled) |
| `BATCH_NUM_WORKERS` | Concurrent workers | 64 |
| `MAX_OUTSTANDING` | Max in-flight HTTP requests | 32 |
| `RPM` | Requests per minute (0 = unlimited) | 600 |
| `TIMEOUT_S` | Per-request timeout (seconds) | 60.0 |
| `MAX_ATTEMPTS` | Max retry attempts (0 = infinite) | 6 |
| `CACHE_FORCE_REFRESH` | Skip cache reads, still write | 0 (disabled) |
| `MINIMA_TRACE_FILE` | Cache key debug log (JSONL) | None (disabled) |
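For example, the same setup as in Quick Start can be supplied entirely through the environment and read back with `MinimaLlmConfig.from_env()`. Setting the variables in-process here is purely for illustration; normally you would export them in the shell:

```python
import os

# Values mirror the table above; "sk-..." is a placeholder as in Quick Start.
os.environ.update({
    "OPENAI_BASE_URL": "https://api.openai.com/v1",
    "OPENAI_MODEL": "gpt-4",
    "OPENAI_API_KEY": "sk-...",
    "CACHE_DIR": "./cache",
    "RPM": "300",          # pace requests to 300/minute
    "MAX_ATTEMPTS": "6",   # retry up to 6 times
})

from minima_llm import MinimaLlmConfig

config = MinimaLlmConfig.from_env()
```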
### YAML Configuration

```yaml
base_url: "https://api.openai.com/v1"
model: "gpt-4"
api_key: "sk-..."
cache_dir: "./cache"

# Optional batch settings
batch:
  num_workers: 64
  max_failures: 25
  heartbeat_s: 10.0
```

Load with:

```python
config = MinimaLlmConfig.from_yaml("config.yml")
```
## Prompt Caching

minima-llm includes an SQLite-backed prompt cache that stores LLM responses keyed by a SHA-256 hash of the request parameters (model, messages, temperature, max_tokens, extras). The database uses WAL mode for multi-process safety.
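As an illustration of the scheme, a key of this kind can be derived by hashing canonical JSON over the documented fields. This is a sketch of the general technique; the exact canonicalization minima-llm uses may differ:

```python
import hashlib
import json

def cache_key(model, messages, temperature, max_tokens=None, extras=None):
    """Sketch of a SHA-256 cache key over the documented request fields."""
    canonical = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "extras": extras,
        },
        sort_keys=True,         # stable key order
        separators=(",", ":"),  # no whitespace variance
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(cache_key("gpt-4", [{"role": "user", "content": "What is 2+2?"}], 0.0))
```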
### Enable / Disable

- **Enable**: Set `cache_dir` to a directory path via environment variable, YAML, or code. The cache database is created at `{cache_dir}/minima_llm.db`.
- **Disable**: Leave `cache_dir` unset (default). No cache files are created.

```yaml
cache_dir: "./my-cache"
```
### Force Refresh

Force refresh bypasses cache reads but still writes new responses to the cache. This is useful for regenerating stale entries.

- **Config-wide**: Set the `CACHE_FORCE_REFRESH=1` env var, or `force_refresh: true` in YAML.
- **Per-request**: Pass `force_refresh=True` to `generate()`:

```python
result = await backend.generate(request, force_refresh=True)
```
### Debug Tracing

To diagnose cache misses, set `MINIMA_TRACE_FILE` to a file path. Every cache key computation is logged as a JSONL line containing the canonical JSON used for hashing and the resulting SHA-256 key:

```bash
MINIMA_TRACE_FILE=trace.jsonl python my_script.py
```

Each line has the form `{"key": "<sha256>", "canonical": "<json>"}`. Compare the canonical JSON between runs to spot the differences causing cache misses.
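A small helper script (not part of minima-llm) can diff two trace files and print the canonical JSON of keys that appear in only one run, i.e. the requests that missed the cache:

```python
import json

def load_trace(path):
    """Map each SHA-256 key to the canonical JSON that produced it."""
    with open(path) as f:
        return {rec["key"]: rec["canonical"] for rec in map(json.loads, f)}

run_a = load_trace("trace_a.jsonl")
run_b = load_trace("trace_b.jsonl")

# Symmetric difference: keys present in one run but not the other.
for key in sorted(set(run_a) ^ set(run_b)):
    src = run_a if key in run_a else run_b
    print(key[:12], src[key])
```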
## Architecture

```
minima_llm/
├── protocol.py      # AsyncMinimaLlmBackend protocol, Request/Response types
├── config.py        # MinimaLlmConfig, BatchConfig, ParasailBatchConfig
├── backend.py       # OpenAIMinimaLlm - full async backend with cache
├── batch.py         # run_batched_callable, Parasail batch support, batch management
├── cli.py           # Command-line interface (minima-llm command)
└── dspy_adapter.py  # MinimaLlmDSPyLM, TolerantChatAdapter (optional)
```
## Multi-Loop Support

The backend is designed to be reused across multiple `asyncio.run()` calls:

```python
backend = OpenAIMinimaLlm(config)

# First asyncio.run()
asyncio.run(batch1(backend))

# Second asyncio.run() - works correctly
asyncio.run(batch2(backend))
```

This is achieved through lazy per-loop initialization of async primitives.
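A minimal sketch of that pattern, assuming a semaphore as the per-loop primitive (illustrative, not minima-llm's actual code): asyncio primitives are bound to the loop they were created on, so they are recreated whenever a different running loop is observed.

```python
import asyncio

class PerLoopState:
    """Lazily (re)creates async primitives for whichever loop is running."""

    def __init__(self, limit: int = 32):
        self._limit = limit
        self._loop = None
        self._sem = None

    def semaphore(self) -> asyncio.Semaphore:
        loop = asyncio.get_running_loop()
        if loop is not self._loop:  # new asyncio.run() -> new loop
            self._loop = loop
            self._sem = asyncio.Semaphore(self._limit)
        return self._sem
```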
## License

MIT