
minima-llm

Minimal async LLM backend with caching and batch execution.

Features

  • Zero Dependencies: Core package uses only Python stdlib (asyncio, urllib, sqlite3)
  • SQLite Cache: Automatic prompt caching with WAL mode for multi-process safety
  • Batch Execution: Worker pool pattern with heartbeat, failure tracking, and early abort
  • Rate Limiting: RPM pacing with server-learned limits from rate limit headers
  • Retry Logic: Exponential backoff with jitter, cooldown after overload (see the sketch after this list)
  • OpenAI Compatible: Works with any OpenAI-compatible endpoint
  • DSPy Integration: Optional adapter for DSPy framework (requires [dspy] extra)
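To make the retry behaviour concrete, here is a minimal sketch of exponential backoff with full jitter. It only illustrates the general pattern; the parameter names (max_attempts, base_delay, max_delay) are placeholders and this is not the library's actual implementation.

import asyncio
import random

async def call_with_backoff(fn, *, max_attempts=6, base_delay=1.0, max_delay=60.0):
    # Retry an async callable, doubling the delay each attempt and adding jitter.
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))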

Installation

# Core only (no dependencies)
pip install minima-llm

# With DSPy support
pip install minima-llm[dspy]

# With YAML config support
pip install minima-llm[yaml]

# Development
pip install minima-llm[dev]

Quick Start

Basic Usage

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    # Configure from environment or explicit values
    config = MinimaLlmConfig(
        base_url="https://api.openai.com/v1",
        model="gpt-4",
        api_key="sk-...",
        cache_dir="./cache",
    )

    backend = OpenAIMinimaLlm(config)

    # Single request
    request = MinimaLlmRequest(
        request_id="q1",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        temperature=0.0,
    )

    result = await backend.generate(request)
    print(result.text)

    await backend.aclose()

asyncio.run(main())
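Because cache_dir is set above, re-issuing an identical request should be served from the SQLite cache rather than the API. A small sketch reusing the backend and request from the example above; the cache-hit behaviour is an assumption based on the caching feature described earlier, and the timing print is purely illustrative.

import time

async def generate_twice(backend, request):
    # First call goes to the API and populates the SQLite cache.
    first = await backend.generate(request)
    # Repeating the exact same request should now be answered from the cache.
    start = time.perf_counter()
    second = await backend.generate(request)
    print(f"first:  {first.text[:50]}")
    print(f"second: {second.text[:50]} (took {time.perf_counter() - start:.3f}s)")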

Batch Execution

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)

    requests = [
        MinimaLlmRequest(
            request_id=f"q{i}",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(100)
    ]

    # Run batch with progress heartbeat
    results = await backend.run_batched(requests)

    for r in results:
        # Entries without a .text attribute represent failed requests; skip them
        if hasattr(r, 'text'):
            print(f"{r.request_id}: {r.text[:50]}...")

    await backend.aclose()

asyncio.run(main())
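Results from run_batched may include failed entries alongside successful responses, which is why the hasattr check above is needed. Below is a small helper sketch for partitioning the results, using only the attributes shown in these examples; the helper name is mine, not part of the package.

def split_results(results):
    # Entries with a .text attribute are successful responses; the rest failed.
    successes = [r for r in results if hasattr(r, "text")]
    failures = [r for r in results if not hasattr(r, "text")]
    return successes, failures

ok, failed = split_results(results)  # results from run_batched above
print(f"{len(ok)} succeeded, {len(failed)} failed")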

With DSPy

import asyncio
import dspy
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm
from minima_llm.dspy_adapter import MinimaLlmDSPyLM

class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)
    lm = MinimaLlmDSPyLM(backend)

    dspy.configure(lm=lm)

    predictor = dspy.ChainOfThought(QA)
    result = await predictor.acall(question="What is the capital of France?")
    print(result.answer)

    await backend.aclose()

asyncio.run(main())

Configuration

Environment Variables

Variable            Description                           Default
OPENAI_BASE_URL     API endpoint URL                      (required)
OPENAI_MODEL        Model identifier                      (required)
OPENAI_API_KEY      API key                               None
CACHE_DIR           SQLite cache directory                None (disabled)
BATCH_NUM_WORKERS   Concurrent workers                    64
MAX_OUTSTANDING     Max in-flight HTTP requests           32
RPM                 Requests per minute (0 = unlimited)   600
TIMEOUT_S           Per-request timeout in seconds        60.0
MAX_ATTEMPTS        Max retry attempts (0 = infinite)     6
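A minimal sketch of driving the same configuration through environment variables and from_env(); the variable names come from the table above, but how from_env() resolves unset or conflicting values is not documented here.

import os
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm

os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
os.environ["OPENAI_MODEL"] = "gpt-4"
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["CACHE_DIR"] = "./cache"
os.environ["RPM"] = "300"  # pace requests to 300 per minute

config = MinimaLlmConfig.from_env()
backend = OpenAIMinimaLlm(config)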

YAML Configuration

base_url: "https://api.openai.com/v1"
model: "gpt-4"
api_key: "sk-..."
cache_dir: "./cache"

# Optional batch settings
batch:
  num_workers: 64
  max_failures: 25
  heartbeat_s: 10.0

Load it with (requires the [yaml] extra):

config = MinimaLlmConfig.from_yaml("config.yml")

Architecture

minima_llm/
├── protocol.py      # AsyncMinimaLlmBackend protocol, Request/Response types
├── config.py        # MinimaLlmConfig, BatchConfig, ParasailBatchConfig
├── backend.py       # OpenAIMinimaLlm - full async backend with cache
├── batch.py         # run_batched_callable, Parasail batch support
└── dspy_adapter.py  # MinimaLlmDSPyLM, TolerantChatAdapter (optional)
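Based solely on the calls used in the examples above, the backend protocol defined in protocol.py presumably looks roughly like the sketch below; the real field names and return types may differ.

from typing import Any, Protocol

class AsyncMinimaLlmBackend(Protocol):
    # Shape inferred from how generate/run_batched/aclose are used in this README.

    async def generate(self, request: Any) -> Any: ...

    async def run_batched(self, requests: list[Any]) -> list[Any]: ...

    async def aclose(self) -> None: ...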

Multi-Loop Support

The backend is designed to be reused across multiple asyncio.run() calls:

backend = OpenAIMinimaLlm(config)

# batch1 and batch2 stand in for any async functions that use the backend

# First asyncio.run()
asyncio.run(batch1(backend))

# Second asyncio.run() - works correctly
asyncio.run(batch2(backend))

This is achieved through lazy per-loop initialization of async primitives.
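In general terms, the pattern is to avoid binding asyncio primitives (locks, semaphores, queues) to an event loop at construction time and instead create them on demand for whichever loop is currently running. A generic sketch of the idea, not the library's actual code:

import asyncio

class PerLoopLock:
    # Hands out a separate asyncio.Lock for each running event loop.

    def __init__(self):
        self._locks: dict[asyncio.AbstractEventLoop, asyncio.Lock] = {}

    def get(self) -> asyncio.Lock:
        loop = asyncio.get_running_loop()
        if loop not in self._locks:
            # Created inside the running loop, so it is only used from that loop.
            self._locks[loop] = asyncio.Lock()
        return self._locks[loop]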

License

MIT

