
minima-llm

Minimal async LLM backend with caching and batch execution.

Features

  • Zero Dependencies: Core package uses only Python stdlib (asyncio, urllib, sqlite3)
  • SQLite Cache: Automatic prompt caching with WAL mode for multi-process safety
  • Batch Execution: Worker pool pattern with heartbeat, failure tracking, and early abort
  • Rate Limiting: RPM pacing with server-learned limits from rate limit headers
  • Retry Logic: Exponential backoff with jitter, cooldown after overload (see the sketch after this list)
  • OpenAI Compatible: Works with any OpenAI-compatible endpoint
  • DSPy Integration: Optional adapter for DSPy framework (requires [dspy] extra)
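To make the retry behaviour concrete, here is a minimal sketch of exponential backoff with full jitter. It only illustrates the general pattern; the parameter names (max_attempts, base_delay, max_delay) are placeholders and this is not the library's actual implementation.

import asyncio
import random

async def call_with_backoff(fn, *, max_attempts=6, base_delay=1.0, max_delay=60.0):
    # Retry an async callable, doubling the delay each attempt and adding jitter.
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))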

Installation

# Core only (no dependencies)
pip install minima-llm

# With DSPy support
pip install minima-llm[dspy]

# With YAML config support
pip install minima-llm[yaml]

# Development
pip install minima-llm[dev]

Quick Start

Basic Usage

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    # Configure from environment or explicit values
    config = MinimaLlmConfig(
        base_url="https://api.openai.com/v1",
        model="gpt-4",
        api_key="sk-...",
        cache_dir="./cache",
    )

    backend = OpenAIMinimaLlm(config)

    # Single request
    request = MinimaLlmRequest(
        request_id="q1",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        temperature=0.0,
    )

    result = await backend.generate(request)
    print(result.text)

    await backend.aclose()

asyncio.run(main())
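Because cache_dir is set above, re-issuing an identical request should be served from the SQLite cache rather than the API. A small sketch reusing the backend and request from the example above; the cache-hit behaviour is an assumption based on the caching feature described earlier, and the timing print is purely illustrative.

import time

async def generate_twice(backend, request):
    # First call goes to the API and populates the SQLite cache.
    first = await backend.generate(request)
    # Repeating the exact same request should now be answered from the cache.
    start = time.perf_counter()
    second = await backend.generate(request)
    print(f"first:  {first.text[:50]}")
    print(f"second: {second.text[:50]} (took {time.perf_counter() - start:.3f}s)")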

Batch Execution

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)

    requests = [
        MinimaLlmRequest(
            request_id=f"q{i}",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(100)
    ]

    # Run batch with progress heartbeat
    results = await backend.run_batched(requests)

    for r in results:
        # Entries without a .text attribute represent failed requests; skip them
        if hasattr(r, 'text'):
            print(f"{r.request_id}: {r.text[:50]}...")

    await backend.aclose()

asyncio.run(main())
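Results from run_batched may include failed entries alongside successful responses, which is why the hasattr check above is needed. Below is a small helper sketch for partitioning the results, using only the attributes shown in these examples; the helper name is mine, not part of the package.

def split_results(results):
    # Entries with a .text attribute are successful responses; the rest failed.
    successes = [r for r in results if hasattr(r, "text")]
    failures = [r for r in results if not hasattr(r, "text")]
    return successes, failures

ok, failed = split_results(results)  # results from run_batched above
print(f"{len(ok)} succeeded, {len(failed)} failed")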

With DSPy

import asyncio
import dspy
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm
from minima_llm.dspy_adapter import MinimaLlmDSPyLM

class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)
    lm = MinimaLlmDSPyLM(backend)

    dspy.configure(lm=lm)

    predictor = dspy.ChainOfThought(QA)
    result = await predictor.acall(question="What is the capital of France?")
    print(result.answer)

    await backend.aclose()

asyncio.run(main())

Configuration

Environment Variables

Variable            Description                           Default
OPENAI_BASE_URL     API endpoint URL                      (required)
OPENAI_MODEL        Model identifier                      (required)
OPENAI_API_KEY      API key                               None
CACHE_DIR           SQLite cache directory                None (disabled)
BATCH_NUM_WORKERS   Concurrent workers                    64
MAX_OUTSTANDING     Max in-flight HTTP requests           32
RPM                 Requests per minute (0 = unlimited)   600
TIMEOUT_S           Per-request timeout in seconds        60.0
MAX_ATTEMPTS        Max retry attempts (0 = infinite)     6
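A minimal sketch of driving the same configuration through environment variables and from_env(); the variable names come from the table above, but how from_env() resolves unset or conflicting values is not documented here.

import os
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm

os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
os.environ["OPENAI_MODEL"] = "gpt-4"
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["CACHE_DIR"] = "./cache"
os.environ["RPM"] = "300"  # pace requests to 300 per minute

config = MinimaLlmConfig.from_env()
backend = OpenAIMinimaLlm(config)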

YAML Configuration

base_url: "https://api.openai.com/v1"
model: "gpt-4"
api_key: "sk-..."
cache_dir: "./cache"

# Optional batch settings
batch:
  num_workers: 64
  max_failures: 25
  heartbeat_s: 10.0

Load it with (requires the [yaml] extra):

config = MinimaLlmConfig.from_yaml("config.yml")

Architecture

minima_llm/
├── protocol.py      # AsyncMinimaLlmBackend protocol, Request/Response types
├── config.py        # MinimaLlmConfig, BatchConfig, ParasailBatchConfig
├── backend.py       # OpenAIMinimaLlm - full async backend with cache
├── batch.py         # run_batched_callable, Parasail batch support
└── dspy_adapter.py  # MinimaLlmDSPyLM, TolerantChatAdapter (optional)
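Based solely on the calls used in the examples above, the backend protocol defined in protocol.py presumably looks roughly like the sketch below; the real field names and return types may differ.

from typing import Any, Protocol

class AsyncMinimaLlmBackend(Protocol):
    # Shape inferred from how generate/run_batched/aclose are used in this README.

    async def generate(self, request: Any) -> Any: ...

    async def run_batched(self, requests: list[Any]) -> list[Any]: ...

    async def aclose(self) -> None: ...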

Multi-Loop Support

The backend is designed to be reused across multiple asyncio.run() calls:

backend = OpenAIMinimaLlm(config)

# batch1 and batch2 stand in for any async functions that use the backend

# First asyncio.run()
asyncio.run(batch1(backend))

# Second asyncio.run() - works correctly
asyncio.run(batch2(backend))

This is achieved through lazy per-loop initialization of async primitives.
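In general terms, the pattern is to avoid binding asyncio primitives (locks, semaphores, queues) to an event loop at construction time and instead create them on demand for whichever loop is currently running. A generic sketch of the idea, not the library's actual code:

import asyncio

class PerLoopLock:
    # Hands out a separate asyncio.Lock for each running event loop.

    def __init__(self):
        self._locks: dict[asyncio.AbstractEventLoop, asyncio.Lock] = {}

    def get(self) -> asyncio.Lock:
        loop = asyncio.get_running_loop()
        if loop not in self._locks:
            # Created inside the running loop, so it is only used from that loop.
            self._locks[loop] = asyncio.Lock()
        return self._locks[loop]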

License

MIT

