
minima-llm

Minimal async LLM backend with caching and batch execution.

Features

  • Zero Dependencies: Core package uses only Python stdlib (asyncio, urllib, sqlite3)
  • SQLite Cache: Automatic prompt caching with WAL mode for multi-process safety
  • Batch Execution: Worker pool pattern with heartbeat, failure tracking, and early abort
  • Rate Limiting: RPM pacing with server-learned limits from rate limit headers
  • Retry Logic: Exponential backoff with jitter, cooldown after overload
  • OpenAI Compatible: Works with any OpenAI-compatible endpoint
  • DSPy Integration: Optional adapter for DSPy framework (requires [dspy] extra)
  • Proxy Mode: OpenAI-compatible HTTP proxy server so any application (DSPy, LangChain, curl) gets caching and rate limiting

Installation

# Core only (no dependencies)
pip install minima-llm

# With DSPy support
pip install minima-llm[dspy]

# With YAML config support
pip install minima-llm[yaml]

# Development
pip install minima-llm[dev]

Quick Start

Basic Usage

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    # Configure from environment or explicit values
    config = MinimaLlmConfig(
        base_url="https://api.openai.com/v1",
        model="gpt-4",
        api_key="sk-...",
        cache_dir="./cache",
    )

    backend = OpenAIMinimaLlm(config)

    # Single request
    request = MinimaLlmRequest(
        request_id="q1",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        temperature=0.0,
    )

    result = await backend.generate(request)
    print(result.text)

    await backend.aclose()

asyncio.run(main())

Batch Execution

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)

    requests = [
        MinimaLlmRequest(
            request_id=f"q{i}",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(100)
    ]

    # Run batch with progress heartbeat
    results = await backend.run_batched(requests)

    for r in results:
        # only successful responses expose a .text attribute
        if hasattr(r, 'text'):
            print(f"{r.request_id}: {r.text[:50]}...")

    await backend.aclose()

asyncio.run(main())

With DSPy

import asyncio
import dspy
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm
from minima_llm.dspy_adapter import MinimaLlmDSPyLM

class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)
    lm = MinimaLlmDSPyLM(backend)

    dspy.configure(lm=lm)

    predictor = dspy.ChainOfThought(QA)
    result = await predictor.acall(question="What is the capital of France?")
    print(result.answer)

    await backend.aclose()

asyncio.run(main())

Proxy Mode

minimallm-proxy starts a localhost HTTP server with an OpenAI-compatible API. Any application that speaks the OpenAI protocol can point to it and automatically benefit from minima-llm's prompt caching, rate limiting, backpressure, and retry logic.

Start the proxy

# Using environment variables (OPENAI_BASE_URL, OPENAI_MODEL, CACHE_DIR, etc.)
minimallm-proxy --port 8990

# With a YAML config file
minimallm-proxy --port 8990 --config config.yml

# Force all requests to use the configured OPENAI_MODEL (ignore client's model field)
minimallm-proxy --port 8990 --force-model

Send requests

Point any OpenAI-compatible client to http://localhost:8990/v1:

curl -X POST http://localhost:8990/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

# With litellm / DSPy
import os
os.environ["OPENAI_API_BASE"] = "http://localhost:8990/v1"

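The official OpenAI Python client can also talk to the proxy directly, as in this sketch (the openai package is assumed to be installed separately; it is not a minima-llm dependency, and the placeholder API key assumes the proxy supplies the upstream credentials):

# Illustrative sketch: point the OpenAI Python client at the local proxy.
# Assumes `pip install openai`; the key below is a placeholder since the proxy
# is configured with the upstream credentials.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8990/v1", api_key="unused")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)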
Options

Flag            Default     Description
--host          127.0.0.1   Bind address
--port          8990        Listen port
--config / -c   (env vars)  YAML config file
--force-model   off         Ignore the client's model field, use OPENAI_MODEL

Supported endpoints

Endpoint                Method   Description
/v1/chat/completions    POST     Chat completions (non-streaming only)
/v1/models              GET      List the configured model

Streaming ("stream": true) is not supported and returns HTTP 400.

Batch Management

For long-running batch jobs using the OpenAI batch API, minima-llm provides batch state management with local state files for resumption after interruption.

Configuration

Enable Parasail batch mode in your config:

parasail:
  llm_batch_prefix: "my-project"  # Prefix for batch state files
  state_dir: "./batch-state"      # Directory for state files (defaults to cache_dir)
  poll_interval_s: 30             # How often to poll for completion
  max_poll_hours: 24              # Maximum time to wait

Batch Management Functions

These functions are available for programmatic batch management:

from minima_llm import (
    batch_status_overview,
    cancel_batch,
    cancel_all_batches,
    cancel_all_local_batches,
    MinimaLlmConfig,
)

config = MinimaLlmConfig.from_yaml("config.yml")

# Show status of all local batch state files
batch_status_overview(config)

# Cancel a specific batch by remote batch ID
cancel_batch("batch_abc123", config)

# Cancel all batches matching a prefix
cancel_all_batches(config, prefix="my-project")

# Cancel ALL local batches
cancel_all_local_batches(config)

Command Line Interface

minima-llm provides a standalone CLI for batch management:

# Show status of all batches (uses CACHE_DIR from environment)
minima-llm batch-status

# With explicit config file
minima-llm batch-status --config config.yml

# Cancel batches matching a prefix
minima-llm batch-status --cancel my-prefix

# Cancel a specific remote batch by ID
minima-llm batch-status --cancel-remote batch_abc123

# Cancel ALL local batches
minima-llm batch-status --cancel-all

When calling from a different directory, use absolute paths or set environment variables:

# Absolute path to config
minima-llm batch-status --config /path/to/project/config.yml

# Or set CACHE_DIR to find batch state files
CACHE_DIR=/path/to/project/cache minima-llm batch-status

Configuration

Environment Variables

Variable              Description                           Default
OPENAI_BASE_URL       API endpoint URL                      (required)
OPENAI_MODEL          Model identifier                      (required)
OPENAI_API_KEY        API key                               None
CACHE_DIR             SQLite cache directory                None (disabled)
BATCH_NUM_WORKERS     Concurrent workers                    64
MAX_OUTSTANDING       Max in-flight HTTP requests           32
RPM                   Requests per minute (0 = unlimited)   600
TIMEOUT_S             Per-request timeout (seconds)         60.0
MAX_ATTEMPTS          Max retry attempts (0 = infinite)     6
CACHE_FORCE_REFRESH   Skip cache reads, still write         0 (disabled)
MINIMA_TRACE_FILE     Cache key debug log (JSONL)           None (disabled)
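For example, a minimal environment-driven setup might look like this sketch (the values are placeholders; in practice you would export the variables in your shell or deployment rather than setting them in code):

# Illustrative sketch: configure minima-llm entirely from environment variables.
import os
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm

os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
os.environ["OPENAI_MODEL"] = "gpt-4"
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["CACHE_DIR"] = "./cache"
os.environ["RPM"] = "300"  # throttle to 300 requests per minute

config = MinimaLlmConfig.from_env()
backend = OpenAIMinimaLlm(config)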

YAML Configuration

base_url: "https://api.openai.com/v1"
model: "gpt-4"
api_key: "sk-..."
cache_dir: "./cache"

# Optional batch settings
batch:
  num_workers: 64
  max_failures: 25
  heartbeat_s: 10.0

Load with:

config = MinimaLlmConfig.from_yaml("config.yml")

Prompt Caching

minima-llm includes an SQLite-backed prompt cache that stores LLM responses keyed by a SHA-256 hash of the request parameters (model, messages, temperature, max_tokens, extras). The database uses WAL mode for multi-process safety.
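The exact canonicalization is internal to minima-llm, but conceptually the key derivation looks something like this sketch (field names and ordering here are illustrative assumptions, not the library's actual code):

# Illustrative sketch of content-addressed cache keys (not the real implementation).
# The library hashes a canonical JSON form of the request parameters with SHA-256.
import hashlib
import json

def cache_key(model, messages, temperature, max_tokens, extras=None):
    canonical = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "extras": extras or {},
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

Because the key covers every request parameter, any change to the messages, sampling settings, or extras produces a different key and therefore a cache miss.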

Enable / Disable

  • Enable: Set cache_dir to a directory path via environment variable, YAML, or code. The cache database is created at {cache_dir}/minima_llm.db.
  • Disable: Leave cache_dir unset (default). No cache files are created.

cache_dir: "./my-cache"

Force Refresh

Force refresh bypasses cache reads but still writes new responses to the cache, useful for regenerating stale entries.

  • Config-wide: Set CACHE_FORCE_REFRESH=1 env var, or force_refresh: true in YAML.
  • Per-request: Pass force_refresh=True to generate():

result = await backend.generate(request, force_refresh=True)

Debug Tracing

To diagnose cache misses, set MINIMA_TRACE_FILE to a file path. Every cache key computation is logged as a JSONL line containing the canonical JSON used for hashing and the resulting SHA-256 key:

MINIMA_TRACE_FILE=trace.jsonl python my_script.py

Each line has the form {"key": "<sha256>", "canonical": "<json>"}. Compare the canonical JSON between runs to spot the differences that cause cache misses.
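As an illustrative sketch (assuming only the JSONL format described above), you can diff the trace files of two runs to find requests whose canonical payload, and therefore cache key, changed:

# Illustrative sketch: compare two MINIMA_TRACE_FILE logs to diagnose cache misses.
import json

def load_trace(path):
    entries = {}
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            entries[rec["key"]] = rec["canonical"]
    return entries

run_a = load_trace("trace_run_a.jsonl")
run_b = load_trace("trace_run_b.jsonl")

# Keys present in only one run correspond to requests whose canonical JSON changed.
only_a = set(run_a) - set(run_b)
only_b = set(run_b) - set(run_a)
print(f"{len(only_a)} keys only in run A, {len(only_b)} keys only in run B")

# Inspect a few canonical payloads from run A to spot the differing field.
for key in list(only_a)[:3]:
    print(run_a[key])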

Architecture

minima_llm/
├── protocol.py      # AsyncMinimaLlmBackend protocol, Request/Response types
├── config.py        # MinimaLlmConfig, BatchConfig, ParasailBatchConfig
├── backend.py       # OpenAIMinimaLlm - full async backend with cache
├── batch.py         # run_batched_callable, Parasail batch support, batch management
├── proxy.py         # OpenAI-compatible HTTP proxy server (minimallm-proxy)
├── cli.py           # Command-line interface (minima-llm, minimallm-proxy)
└── dspy_adapter.py  # MinimaLlmDSPyLM, TolerantChatAdapter (optional)

Multi-Loop Support

The backend is designed to be reused across multiple asyncio.run() calls:

backend = OpenAIMinimaLlm(config)

# First asyncio.run() - batch1 is any coroutine that uses the backend
asyncio.run(batch1(backend))

# Second asyncio.run() - reusing the same backend on a new event loop works correctly
asyncio.run(batch2(backend))

This is achieved through lazy per-loop initialization of async primitives.
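The general pattern is sketched below (a minimal illustration of lazy per-loop initialization, not minima-llm's actual internals): loop-bound primitives are recreated whenever a different event loop is running.

# Minimal sketch of lazy per-loop initialization (illustrative, not the library's code).
# asyncio primitives such as locks are bound to the loop they are first used on,
# so they are recreated whenever the running event loop changes.
import asyncio

class PerLoopLock:
    def __init__(self):
        self._lock = None
        self._loop = None

    def _get(self):
        loop = asyncio.get_running_loop()
        if self._lock is None or self._loop is not loop:
            self._lock = asyncio.Lock()
            self._loop = loop
        return self._lock

    async def __aenter__(self):
        await self._get().acquire()

    async def __aexit__(self, *exc):
        self._lock.release()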

License

MIT

