llming-models
LLM execution engine -- multi-provider streaming, MCP tools, and budget management.
llming-models provides the core runtime for building multi-provider LLM applications. It handles model metadata and capabilities, streaming chat sessions with tool support, MCP (Model Context Protocol) integration, per-user configuration, and monetary budget tracking -- all behind a unified API that works across OpenAI, Anthropic, Google, Azure, Mistral, and Together/DeepSeek.
Features
- Model Metadata -- Rich model descriptors with pricing, context windows, capability flags (vision, reasoning), and UI hints (speed/quality ratings)
- Configuration -- Global and per-user model selection with category mappings, provider cascade priority, and include/exclude filters
- Budget Management -- Track and enforce monetary limits per time period with reservation/rollback semantics and pluggable backends (memory, MongoDB)
- Multi-Provider Streaming -- Unified async/sync streaming across OpenAI, Anthropic, Google Gemini, Azure OpenAI, Azure Anthropic, Mistral, and Together AI
- MCP Tools -- First-class Model Context Protocol support with tool registries, toolbox adapters, and built-in MCP servers (math, image generation)
- Chat Sessions -- High-level ChatSession with automatic tool dispatch, conversation history, image support, and reasoning effort control
- Conversation Persistence -- IndexedDB-compatible conversation storage with metadata, avatars, and file references
Chat Playground
Interactive chat UI with model selection, streaming responses, TTS/STT (OpenAI + ElevenLabs), push-to-talk, word-level highlighting, and token cost tracking. Run it with:
cp .env.template .env # fill in API keys
python samples/chat_app.py
# Open http://localhost:8000
Quick Start
Installation
pip install llming-models
Or from source with Poetry:
git clone https://github.com/Alyxion/llming-models.git
cd llming-models
poetry install
Basic Usage
from llming_models import LLMInfo, ModelSize, ReasoningEffort

model = LLMInfo(
    provider="anthropic",
    name="claude_sonnet",
    label="Claude Sonnet 4.6",
    model="claude-sonnet-4-6",
    description="Fast, capable model for most tasks",
    input_token_price=3.0,
    output_token_price=15.0,
    size=ModelSize.MEDIUM,
    max_input_tokens=200_000,
    max_output_tokens=64_000,
    supports_image_input=True,
    reasoning=True,
    default_reasoning_effort=ReasoningEffort.MEDIUM,
    speed=8,
    quality=8,
    best_use="Code & analysis",
    highlights=["Fast", "Code", "Vision"],
)

print(f"{model.label}: ${model.input_token_price}/1M in, ${model.output_token_price}/1M out")
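Because prices are quoted per 1M tokens, the cost of a single request follows from simple arithmetic. The sketch below is plain Python and does not use the library; `estimate_cost` is a hypothetical helper introduced only for illustration, with parameter names borrowed from the descriptor fields above.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_token_price: float,
    output_token_price: float,
) -> float:
    """Return the cost of one request; both prices are quoted per 1M tokens."""
    tokens_per_price_unit = 1_000_000
    total = input_tokens * input_token_price + output_tokens * output_token_price
    return total / tokens_per_price_unit


# Prices taken from the Claude Sonnet descriptor above (3.0 in, 15.0 out).
cost = estimate_cost(1_000, 2_000, input_token_price=3.0, output_token_price=15.0)
print(f"Estimated cost: {cost:.4f}")
```

With 1,000 input tokens and 2,000 output tokens this comes to 0.003 + 0.030 = 0.033 in the same currency as the prices.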
Configuration
from llming_models import LLMGlobalConfig, LLMUserConfig, ModelCategories

global_config = LLMGlobalConfig(
    default_models={
        ModelCategories.SMALL: ["claude_haiku", "gpt-4o-mini"],
        ModelCategories.LARGE: ["claude_sonnet", "gpt-4o"],
    },
    provider_cascade=["anthropic", "openai"],
)

user_config = LLMUserConfig(
    global_config=global_config,
    default_models={ModelCategories.LARGE: "claude_sonnet"},
)

print(user_config.get_default_model(ModelCategories.LARGE))  # "claude_sonnet"
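One way to think about the interaction between per-user overrides, global defaults, and the provider cascade is sketched below. This is an assumption for illustration, not the library's documented resolution logic: `resolve_default`, `MODEL_PROVIDERS`, and the plain string category names are all hypothetical, and the real rules may differ.

```python
from typing import Optional

# Hypothetical data mirroring the configuration above. Plain strings stand in
# for ModelCategories members, and MODEL_PROVIDERS is an invented lookup table.
GLOBAL_DEFAULTS = {
    "small": ["claude_haiku", "gpt-4o-mini"],
    "large": ["claude_sonnet", "gpt-4o"],
}
PROVIDER_CASCADE = ["anthropic", "openai"]
MODEL_PROVIDERS = {
    "claude_haiku": "anthropic",
    "claude_sonnet": "anthropic",
    "gpt-4o-mini": "openai",
    "gpt-4o": "openai",
}


def resolve_default(category: str, user_defaults: dict[str, str]) -> Optional[str]:
    """Pick a model: user override first, then global candidates by provider priority."""
    if category in user_defaults:
        return user_defaults[category]
    candidates = GLOBAL_DEFAULTS.get(category, [])
    if not candidates:
        return None

    def priority(name: str) -> int:
        provider = MODEL_PROVIDERS.get(name)
        # Providers missing from the cascade sort after every listed provider.
        if provider in PROVIDER_CASCADE:
            return PROVIDER_CASCADE.index(provider)
        return len(PROVIDER_CASCADE)

    # min() keeps the first candidate on ties, preserving the configured order.
    return min(candidates, key=priority)


print(resolve_default("large", {"large": "gpt-4o"}))  # user override wins
print(resolve_default("small", {}))  # falls back to the global candidates
```

Under these assumptions a user override always takes precedence, and the cascade only decides among the global candidates when no override exists.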
Budget Management
import asyncio
from llming_models import MemoryBudgetLimit, LLMBudgetManager, LimitPeriod

async def main():
    limits = [
        MemoryBudgetLimit(name="daily", amount=10.0, period=LimitPeriod.DAILY),
        MemoryBudgetLimit(name="monthly", amount=100.0, period=LimitPeriod.MONTHLY),
    ]
    manager = LLMBudgetManager(limits)
    available = await manager.available_budget_async()
    print(f"Available: {available:.2f}")
    await manager.reserve_budget_async(
        input_tokens=1000,
        max_output_tokens=2000,
        input_token_price=3.0,  # per 1M tokens
        output_token_price=15.0,  # per 1M tokens
    )

asyncio.run(main())
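The reservation/rollback idea can be illustrated without the library: reserve the worst-case cost before the call, then return the unused part once the actual token usage is known. The sketch below is a toy model under that assumption; `SimpleBudget` and `worst_case_cost` are hypothetical names and do not reflect how `LLMBudgetManager` is implemented.

```python
class SimpleBudget:
    """Toy in-memory budget with several named limits (hypothetical, not the library API)."""

    def __init__(self, limits: dict[str, float]) -> None:
        self.remaining = dict(limits)

    def available(self) -> float:
        # The tightest limit decides how much can still be spent.
        return min(self.remaining.values())

    def reserve(self, amount: float) -> None:
        if amount > self.available():
            raise RuntimeError("budget exceeded")
        for name in self.remaining:
            self.remaining[name] -= amount

    def rollback(self, amount: float) -> None:
        # Return unused funds to every limit.
        for name in self.remaining:
            self.remaining[name] += amount


def worst_case_cost(input_tokens, max_output_tokens, input_token_price, output_token_price):
    """Upper bound on request cost; prices are per 1M tokens."""
    total = input_tokens * input_token_price + max_output_tokens * output_token_price
    return total / 1_000_000


budget = SimpleBudget({"daily": 10.0, "monthly": 100.0})

# Reserve for the same request as above: 1000 input, up to 2000 output tokens.
reserved = worst_case_cost(1000, 2000, 3.0, 15.0)
budget.reserve(reserved)

# Suppose the model produced only 500 output tokens; refund the difference.
actual = worst_case_cost(1000, 500, 3.0, 15.0)
budget.rollback(reserved - actual)
print(f"Available after settling: {budget.available():.4f}")
```

Reserving up front means concurrent requests cannot jointly overshoot a limit, at the price of temporarily holding more budget than a request ends up using.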
Project Structure
llming-models/
├── llming_models/ # Models, config, sessions, budget, providers, tools
│ ├── budget/ # Cost tracking with time-period limits (memory + MongoDB)
│ ├── providers/ # OpenAI, Anthropic, Google, Azure, Mistral, Together
│ ├── tools/ # Tool system, MCP integration, math server, image gen
│ └── utils/ # Image encoding utilities
├── tests/ # 1255 tests (unit + integration with live APIs)
├── samples/ # Example scripts
└── docs/ # Logo and assets
Development
poetry install
poetry run pytest # 1255 tests
poetry run ruff check # lint
poetry run mypy llming_models # type check
License
This project is licensed under the MIT License. Copyright (c) 2026 Michael Ikemann.