
minima-llm

Minimal async LLM backend with caching and batch execution.

Features

  • Zero Dependencies: Core package uses only Python stdlib (asyncio, urllib, sqlite3)
  • SQLite Cache: Automatic prompt caching with WAL mode for multi-process safety
  • Batch Execution: Worker pool pattern with heartbeat, failure tracking, and early abort
  • Rate Limiting: RPM pacing with server-learned limits from rate limit headers
  • Retry Logic: Exponential backoff with jitter and a cooldown after overload (see the sketch after this list)
  • OpenAI Compatible: Works with any OpenAI-compatible endpoint
  • DSPy Integration: Optional adapter for DSPy framework (requires [dspy] extra)
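
The retry behavior referenced above is the standard exponential-backoff-with-jitter pattern. A minimal sketch of the idea (illustrative only, not the backend's actual retry code):

import asyncio
import random

async def with_retries(call, max_attempts=6, base_delay=1.0, max_delay=60.0):
    """Retry an async callable with exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception:
            # In practice only transient failures (timeouts, 429/5xx) should be retried
            if attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter
            delay = random.uniform(0.0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            await asyncio.sleep(delay)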

Installation

# Core only (no dependencies)
pip install minima-llm

# With DSPy support
pip install "minima-llm[dspy]"

# With YAML config support
pip install "minima-llm[yaml]"

# Development
pip install "minima-llm[dev]"

Quick Start

Basic Usage

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    # Configure from environment or explicit values
    config = MinimaLlmConfig(
        base_url="https://api.openai.com/v1",
        model="gpt-4",
        api_key="sk-...",
        cache_dir="./cache",
    )

    backend = OpenAIMinimaLlm(config)

    # Single request
    request = MinimaLlmRequest(
        request_id="q1",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        temperature=0.0,
    )

    result = await backend.generate(request)
    print(result.text)

    await backend.aclose()

asyncio.run(main())

Batch Execution

import asyncio
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm, MinimaLlmRequest

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)

    requests = [
        MinimaLlmRequest(
            request_id=f"q{i}",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(100)
    ]

    # Run batch with progress heartbeat
    results = await backend.run_batched(requests)

    for r in results:
        # Results can mix successes and failures; only successful responses carry .text
        if hasattr(r, 'text'):
            print(f"{r.request_id}: {r.text[:50]}...")

    await backend.aclose()

asyncio.run(main())

With DSPy

import asyncio
import dspy
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm
from minima_llm.dspy_adapter import MinimaLlmDSPyLM

class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()

async def main():
    config = MinimaLlmConfig.from_env()
    backend = OpenAIMinimaLlm(config)
    lm = MinimaLlmDSPyLM(backend)

    dspy.configure(lm=lm)

    predictor = dspy.ChainOfThought(QA)
    result = await predictor.acall(question="What is the capital of France?")
    print(result.answer)

    await backend.aclose()

asyncio.run(main())

Batch Management

For long-running batch jobs that use the OpenAI batch API, minima-llm provides batch state management with local state files, so interrupted jobs can be resumed.

Configuration

Enable Parasail batch mode in your config:

parasail:
  llm_batch_prefix: "my-project"  # Prefix for batch state files
  state_dir: "./batch-state"      # Directory for state files (defaults to cache_dir)
  poll_interval_s: 30             # How often to poll for completion
  max_poll_hours: 24              # Maximum time to wait

Batch Management Functions

These functions are available for programmatic batch management:

from minima_llm import (
    batch_status_overview,
    cancel_batch,
    cancel_all_batches,
    cancel_all_local_batches,
    MinimaLlmConfig,
)

config = MinimaLlmConfig.from_yaml("config.yml")

# Show status of all local batch state files
batch_status_overview(config)

# Cancel a specific batch by remote batch ID
cancel_batch("batch_abc123", config)

# Cancel all batches matching a prefix
cancel_all_batches(config, prefix="my-project")

# Cancel ALL local batches
cancel_all_local_batches(config)

Command Line Interface

minima-llm provides a standalone CLI for batch management:

# Show status of all batches (uses CACHE_DIR from environment)
minima-llm batch-status

# With explicit config file
minima-llm batch-status --config config.yml

# Cancel batches matching a prefix
minima-llm batch-status --cancel my-prefix

# Cancel a specific remote batch by ID
minima-llm batch-status --cancel-remote batch_abc123

# Cancel ALL local batches
minima-llm batch-status --cancel-all

When running the CLI from a different directory, use absolute paths or set environment variables:

# Absolute path to config
minima-llm batch-status --config /path/to/project/config.yml

# Or set CACHE_DIR to find batch state files
CACHE_DIR=/path/to/project/cache minima-llm batch-status

Configuration

Environment Variables

Variable              Description                           Default
OPENAI_BASE_URL       API endpoint URL                      (required)
OPENAI_MODEL          Model identifier                      (required)
OPENAI_API_KEY        API key                               None
CACHE_DIR             SQLite cache directory                None (disabled)
BATCH_NUM_WORKERS     Concurrent workers                    64
MAX_OUTSTANDING       Max in-flight HTTP requests           32
RPM                   Requests per minute (0 = unlimited)   600
TIMEOUT_S             Per-request timeout                   60.0
MAX_ATTEMPTS          Max retry attempts (0 = infinite)     6
CACHE_FORCE_REFRESH   Skip cache reads but still write      0 (disabled)
MINIMA_TRACE_FILE     Cache key debug log (JSONL)           None (disabled)
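
For example, the environment-driven setup used in the Quick Start can be configured entirely through these variables. The values below are placeholders; normally you would export them in your shell or deployment environment rather than set them in code:

import os
from minima_llm import MinimaLlmConfig, OpenAIMinimaLlm

# Placeholder values for illustration only
os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
os.environ["OPENAI_MODEL"] = "gpt-4"
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["CACHE_DIR"] = "./cache"
os.environ["RPM"] = "600"

config = MinimaLlmConfig.from_env()   # picks up the variables documented above
backend = OpenAIMinimaLlm(config)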

YAML Configuration

base_url: "https://api.openai.com/v1"
model: "gpt-4"
api_key: "sk-..."
cache_dir: "./cache"

# Optional batch settings
batch:
  num_workers: 64
  max_failures: 25
  heartbeat_s: 10.0

Load with:

config = MinimaLlmConfig.from_yaml("config.yml")

Prompt Caching

minima-llm includes an SQLite-backed prompt cache that stores LLM responses keyed by a SHA-256 hash of the request parameters (model, messages, temperature, max_tokens, extras). The database uses WAL mode for multi-process safety.
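
As an illustration of the storage side, a WAL-mode SQLite cache keyed by a request hash could look roughly like this (a sketch of the general approach, not the library's actual schema):

import sqlite3

def open_cache(path: str) -> sqlite3.Connection:
    """Open an SQLite response cache with WAL mode for multi-process safety."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # readers don't block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)"
    )
    return conn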

Enable / Disable

  • Enable: Set cache_dir to a directory path via environment variable, YAML, or code. The cache database is created at {cache_dir}/minima_llm.db.
  • Disable: Leave cache_dir unset (default). No cache files are created.

cache_dir: "./my-cache"

Force Refresh

Force refresh bypasses cache reads but still writes new responses to the cache, which is useful for regenerating stale entries.

  • Config-wide: Set CACHE_FORCE_REFRESH=1 env var, or force_refresh: true in YAML.
  • Per-request: Pass force_refresh=True to generate():

result = await backend.generate(request, force_refresh=True)

Debug Tracing

To diagnose cache misses, set MINIMA_TRACE_FILE to a file path. Every cache key computation is logged as a JSONL line containing the canonical JSON used for hashing and the resulting SHA-256 key:

MINIMA_TRACE_FILE=trace.jsonl python my_script.py

Each line has the form {"key": "<sha256>", "canonical": "<json>"}. Compare canonical JSON between runs to spot differences causing cache misses.
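
Conceptually, the key is a SHA-256 digest of a canonical JSON rendering of the request parameters. The following sketch shows the idea; the exact field set and canonical form are internal to the library:

import hashlib
import json

params = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "temperature": 0.0,
}

# Canonical rendering: sorted keys, no extra whitespace
canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Shape of a trace line
print(json.dumps({"key": key, "canonical": canonical}))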

Architecture

minima_llm/
├── protocol.py      # AsyncMinimaLlmBackend protocol, Request/Response types
├── config.py        # MinimaLlmConfig, BatchConfig, ParasailBatchConfig
├── backend.py       # OpenAIMinimaLlm - full async backend with cache
├── batch.py         # run_batched_callable, Parasail batch support, batch management
├── cli.py           # Command-line interface (minima-llm command)
└── dspy_adapter.py  # MinimaLlmDSPyLM, TolerantChatAdapter (optional)

Multi-Loop Support

The backend is designed to be reused across multiple asyncio.run() calls:

backend = OpenAIMinimaLlm(config)

# First asyncio.run()
asyncio.run(batch1(backend))

# Second asyncio.run() - works correctly
asyncio.run(batch2(backend))

This is achieved through lazy per-loop initialization of async primitives.
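
The general pattern is to create loop-bound primitives (locks, semaphores, HTTP sessions) lazily and key them by the running event loop. A simplified sketch of the idea, not the backend's actual code:

import asyncio

class PerLoopSemaphore:
    """Lazily creates one asyncio.Semaphore per event loop."""

    def __init__(self, value: int):
        self._value = value
        self._by_loop: dict = {}

    def get(self) -> asyncio.Semaphore:
        loop = asyncio.get_running_loop()
        sem = self._by_loop.get(loop)
        if sem is None:
            # Created inside the running loop, so it is bound to this loop
            sem = asyncio.Semaphore(self._value)
            self._by_loop[loop] = sem
        return sem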

License

MIT

