A Python package containing tools for working with various language models and AI services.
Project description
AIMU - AI Model Utilities
AIMU is a Python library for building LLM-powered applications with a consistent interface across local and cloud providers. It separates autonomous agents from code-controlled workflows, and treats agents as composable units that can be used anywhere a simple model is accepted. MCP tool integration is structural (not a plugin), semantic and/or document memory can be dropped in, and prompt management and tuning make it easy to optimize prompts for concrete use cases.
Table of Contents
Features
-
Model Clients: A factory selects the right provider client from the model enum, so you can swap providers without changing call sites. Streaming output is typed by phase (thinking, tool calling, response generation), making it straightforward to build UIs or observability on top. All clients support extended reasoning models (e.g. DeepSeek-R1, Qwen3, GPT-OSS) with reasoning traces available in both streaming and non-streaming modes.
- Ollama (local models, native API)
- Hugging Face Transformers (local models)
- llama-cpp-python (local GGUF models, in-process, no external service required)
- Anthropic Claude models via native
anthropicSDK (AnthropicClient) - native thinking support - Cloud and local servers via the
openaiSDK (aimu[openai_compat]):- OpenAI (
OpenAIClient) - GPT-4o, GPT-4.1, o3, o4-mini, and more - Google Gemini (
GeminiClient) - Gemini 2.0/2.5 via Google's OpenAI-compatible endpoint - LM Studio (
LMStudioOpenAIClient) - Ollama OpenAI-compat endpoint (
OllamaOpenAIClient) - HuggingFace Transformers Serve (
HFOpenAIClient) - vLLM (
VLLMOpenAIClient) - llama.cpp llama-server (
LlamaServerOpenAIClient) - SGLang (
SGLangOpenAIClient) - Any OpenAI-compatible server (
OpenAICompatClient)
- OpenAI (
-
Agents & Workflows: Per Anthropic's taxonomy, AIMU separates autonomous agents from code-controlled workflows. Agents expose the same interface as plain model clients, so they can be used as drop-in replacements anywhere a client is accepted, enabling recursive composition of agents within workflows and workflows within agents.
- Agents:
SimpleAgentruns an autonomous tool-calling loop until the model stops invoking tools.SkillAgentextends it with automatic skill injection.AgenticModelClientwraps aSimpleAgentbehind the standard model client interface, making agentic and single-turn clients interchangeable. - Workflows: Four code-controlled patterns:
Chain(prompt chaining),Router(classify and dispatch),Parallel(concurrent workers with optional aggregation), andEvaluatorOptimizer(generate → evaluate → revise loop). - Example Agents: Ready-to-use orchestrator agents in
aimu.agents.examplesthat demonstrate the orchestrator + worker tools pattern:ResearchReportAgent,CodeReviewAgent, andContentCreationAgent. Each coordinates worker sub-agents via MCP tools, letting the LLM decide how to use them. - Skills: Filesystem-discovered
SKILL.mdfiles that inject instructions and tools intoSkillAgentautomatically. Skills are discovered from project and user directories (.agents/skills/,.claude/skills/).
- Agents:
-
MCP Tools: MCP tool integration is built into the base model client as a first-class attribute, not a plugin. Attach an
MCPClientto any model client and tools are passed to the model automatically. Provides a simpler interface for FastMCP 2.0. -
Prompt Management: Versioned prompt storage and automatic prompt optimization:
- Prompt Storage: Versioned prompt catalog backed by SQLite (SQLAlchemy). Prompts are keyed by
(name, model_id)and auto-versioned on each store. - Prompt Tuning: Hill-climbing
PromptTunerfor automatic prompt optimization against labelled data, without ML machinery. Four concrete tuners:ClassificationPromptTuner(binary YES/NO),MultiClassPromptTuner(N-way),ExtractionPromptTuner(JSON field extraction), andJudgedPromptTuner(open-ended generation rated by a second LLM). SubclassPromptTunerto implement custom task types.
- Prompt Storage: Versioned prompt catalog backed by SQLite (SQLAlchemy). Prompts are keyed by
-
Persistence: Three complementary stores for persisting conversation and knowledge:
- Conversation History (
ConversationManager): Persistent chat message history backed by TinyDB. Load the last conversation on startup and save updates after each turn. - Semantic Memory (
SemanticMemoryStore): Fact storage using ChromaDB vector embeddings. Store natural-language subject-predicate-object strings (e.g."Paul works at Google") and retrieve by semantic topic (e.g."employment","family life"). - Document Memory (
DocumentStore): Path-based document store mirroring Anthropic's Managed Agents Memory API. Supportswrite,read,edit,delete, and full-textsearchon named paths (e.g./preferences.md).
- Conversation History (
Components
In addition to the AIMU package in the 'aimu' directory, the AIMU code repository includes:
-
Jupyter notebooks demonstrating key AIMU features.
-
Example chat clients in the
web/directory, built with Streamlit and Gradio, using AIMU Model Client, MCP tools support, and chat conversation management. -
A full suite of Pytest tests.
Examples
The following Jupyter notebooks demonstrate key AIMU features:
| Notebook | Description |
|---|---|
| 01 - Model Client | Text generation, chat, streaming, and thinking models |
| 02 - MCP Tools | MCP tool integration with model clients |
| 03 - Prompt Management | Versioned prompt storage |
| 04 - Prompt Tuning | ClassificationPromptTuner, MultiClassPromptTuner, ExtractionPromptTuner, JudgedPromptTuner |
| 05 - Conversations | Persistent chat conversation management |
| 06 - Memory | Semantic fact storage and retrieval |
| 07 - Agents | SimpleAgent and AgenticModelClient |
| 08 - Agent Skills | Filesystem-discovered skill injection with SkillAgent |
| 09 - Agent Workflows | Chain, Router, Parallel, and EvaluatorOptimizer patterns |
| 10 - Agent Examples | ResearchReportAgent, CodeReviewAgent, ContentCreationAgent — orchestrator + worker tools pattern |
Installation
For all features, run:
pip install aimu[all]
Or install only what you need:
pip install aimu[ollama] # Ollama (local models, native API)
pip install aimu[hf] # Hugging Face Transformers (local models)
pip install aimu[anthropic] # Anthropic Claude models
pip install aimu[openai_compat] # OpenAI, Google Gemini, and OpenAI-compatible local servers
pip install aimu[llamacpp] # Local GGUF models via llama-cpp-python (no external service)
For gated Hugging Face models, you'll need a Hugging Face Hub access token:
hf auth login
Development
Once you've cloned the repository, run the following command to install all model dependencies:
pip install -e '.[all]'
Additionally, run the following command to install development (testing, linting) and notebook dependencies:
pip install -e '.[dev,notebooks]'
Alternatively, if you have uv installed, you can get all model and development dependencies with:
uv sync --all-extras
Using Pytest, tests can be run for a specific model client and/or model, using optional arguments:
pytest tests\test_models.py --client=ollama --model=GPT_OSS_20B
Usage
Model Clients
All clients implement the BaseModelClient abstract interface. ModelClient is a factory that automatically selects the right client from a model enum, so you can swap providers without changing call sites:
from aimu.models import ModelClient
from aimu.models.ollama.ollama_client import OllamaModel
client = ModelClient(OllamaModel.QWEN_3_8B) # factory: picks OllamaClient automatically
response = client.generate("Summarise this text.", {"temperature": 0.7}) # stateless
response = client.chat("What is the capital of France?") # multi-turn
print(client.messages) # full message history
You can also import the concrete client directly when you prefer explicit control:
from aimu.models.ollama import OllamaClient, OllamaModel
client = OllamaClient(OllamaModel.QWEN_3_8B)
Cloud and local server clients follow the same pattern; only the model enum (and optional kwargs) differ:
| Client | Extra | API key / notes |
|---|---|---|
OllamaClient |
aimu[ollama] |
- |
HuggingFaceClient |
aimu[hf] |
- |
LlamaCppClient |
aimu[llamacpp] |
model_path= (GGUF file); no external service |
OpenAIClient |
aimu[openai_compat] |
OPENAI_API_KEY |
AnthropicClient |
aimu[anthropic] |
ANTHROPIC_API_KEY |
GeminiClient |
aimu[openai_compat] |
GOOGLE_API_KEY |
LMStudioOpenAIClient |
aimu[openai_compat] |
localhost:1234 |
OllamaOpenAIClient |
aimu[openai_compat] |
localhost:11434 |
VLLMOpenAIClient |
aimu[openai_compat] |
localhost:8000 |
LlamaServerOpenAIClient |
aimu[openai_compat] |
localhost:8080 |
SGLangOpenAIClient |
aimu[openai_compat] |
localhost:30000 |
Streaming: chat(..., stream=True) yields StreamChunk objects tagged by phase:
chunk.phase |
chunk.content type |
Description |
|---|---|---|
StreamPhase.THINKING |
str |
Reasoning token (thinking models only) |
StreamPhase.TOOL_CALLING |
dict {"name": str, "response": str} |
Tool call and result |
StreamPhase.GENERATING |
str |
Final response token |
Thinking models: extended reasoning (e.g. DeepSeek-R1, Qwen3, GPT-OSS) is enabled automatically for supported models. The reasoning trace is available in client.last_thinking after generation, or as StreamPhase.THINKING chunks during streaming.
Chat UIs: full-featured UIs with streaming, tool calls, and conversation persistence: streamlit run web/streamlit_chatbot.py (Streamlit) or python web/gradio_chatbot.py (Gradio).
See 01 - Model Client for detailed examples.
Agents & Workflows
SimpleAgent wraps a ModelClient and runs a tool-calling loop until the model stops invoking tools:
from aimu.models.ollama import OllamaClient, OllamaModel
from aimu.tools import MCPClient
from aimu.agents import SimpleAgent
client = OllamaClient(OllamaModel.QWEN_3_8B)
client.mcp_client = MCPClient({"mcpServers": {"mytools": {"command": "python", "args": ["tools.py"]}}})
agent = SimpleAgent.from_config(
{"name": "researcher", "system_message": "Use tools to answer.", "max_iterations": 8},
client,
)
result = agent.run("Find all log files modified today and summarise the errors.")
SkillAgent extends SimpleAgent with automatic discovery and injection of SKILL.md skill files:
from aimu.agents import SkillAgent
agent = SkillAgent(client, name="assistant") # discovers skills from .agents/skills/ and .claude/skills/
result = agent.run("Use the pdf-processing skill to extract pages from report.pdf")
Workflow patterns have code-controlled flow. Chain sequences agents so each step's output becomes the next step's input:
from aimu.agents import Chain
chain = Chain.from_config(
[
{"name": "planner", "system_message": "Break the task into steps.", "max_iterations": 3},
{"name": "executor", "system_message": "Execute each step using tools.", "max_iterations": 10},
{"name": "formatter", "system_message": "Format the results clearly.", "max_iterations": 1},
],
client,
)
result = chain.run("Research the top Python web frameworks.")
Every Runner exposes run(task, stream=False) and .messages. Pass stream=True to get an AgentChunk iterator instead of a string.
Example agents in aimu.agents.examples wire up an orchestrator with worker sub-agents as MCP tools — the LLM coordinates them autonomously:
from aimu.models.ollama import OllamaClient
from aimu.agents.examples import ResearchReportAgent
client = OllamaClient(OllamaClient.MODELS.QWEN_3_8B)
agent = ResearchReportAgent(client)
report = agent.run("What is retrieval-augmented generation?")
for chunk in agent.run("Explain transformer attention", stream=True):
if chunk.phase == StreamPhase.GENERATING:
print(chunk.content, end="", flush=True)
See 07 - Agents, 08 - Agent Skills, 09 - Agent Workflows, and 10 - Agent Examples for the example agents.
MCP Tools
MCPClient wraps a FastMCP 2.0 server and integrates with any ModelClient via model_client.mcp_client:
from aimu.models import ModelClient
from aimu.models.ollama.ollama_client import OllamaModel
from aimu.tools import MCPClient
mcp_client = MCPClient({
"mcpServers": {
"mytools": {"command": "python", "args": ["tools.py"]},
}
})
# Use standalone
mcp_client.call_tool("mytool", {"input": "hello world!"})
# Or attach to a model client; tools are passed to the model automatically
model_client = ModelClient(OllamaModel.QWEN_3_8B)
model_client.mcp_client = mcp_client
model_client.chat("use my tool please")
See 02 - MCP Tools.
Persistence
Conversation history: ConversationManager persists chat message sequences across sessions:
from aimu.models import ModelClient
from aimu.models.ollama.ollama_client import OllamaModel
from aimu.history import ConversationManager
manager = ConversationManager("conversations.json", use_last_conversation=True)
model_client = ModelClient(OllamaModel.QWEN_3_8B)
model_client.messages = manager.messages
model_client.chat("What is the capital of France?")
manager.update_conversation(model_client.messages)
Semantic memory: SemanticMemoryStore stores and retrieves facts by semantic similarity:
from aimu.memory import SemanticMemoryStore
store = SemanticMemoryStore(persist_path="./memory_store")
store.store("Paul works at Google")
store.search("employment") # ["Paul works at Google"]
store.search("employment", max_distance=0.4) # only close matches
Document memory: DocumentStore is a path-keyed document store mirroring Anthropic's Managed Agents Memory API:
from aimu.memory import DocumentStore
store = DocumentStore(persist_path="./doc_store")
store.write("/preferences.md", "Always use concise responses.")
store.edit("/preferences.md", "concise", "detailed")
store.search_full_text("detailed")
See 05 - Conversations and 06 - Memory.
Prompt Management
Prompt catalog: PromptCatalog stores versioned prompts keyed by (name, model_id):
from aimu.prompts import PromptCatalog, Prompt
with PromptCatalog("prompts.db") as catalog:
prompt = Prompt(name="summarizer", prompt="Summarize the following: {content}", model_id="llama3.1:8b")
catalog.store_prompt(prompt) # version and created_at assigned automatically
latest = catalog.retrieve_last("summarizer", "llama3.1:8b")
print(f"v{latest.version}: {latest.prompt}")
Prompt tuning: PromptTuner runs a hill-climbing loop to automatically improve a prompt against labelled data. Pass a DataFrame with content and actual_class columns:
import pandas as pd
from aimu.prompts import ClassificationPromptTuner
tuner = ClassificationPromptTuner(model_client=client)
df = pd.DataFrame({
"content": ["LLMs are transforming AI.", "The recipe calls for flour.", ...],
"actual_class": [True, False, ...],
})
best_prompt = tuner.tune(df, initial_prompt="Is this about AI? Reply [YES] or [NO]. Content: {content}")
MultiClassPromptTuner, ExtractionPromptTuner, and JudgedPromptTuner follow the same pattern. Subclass PromptTuner and implement apply_prompt, evaluate, and mutation_prompt for custom task types.
See 03 - Prompt Management and 04 - Prompt Tuning.
License
This project is licensed under the Apache 2.0 license.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file aimu-0.3.0.tar.gz.
File metadata
- Download URL: aimu-0.3.0.tar.gz
- Upload date:
- Size: 102.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
80fdec98b63e43d2c11ea78f2918c5e57a59f38c2624f78c52e2a8fd47389d56
|
|
| MD5 |
b0c8c872739969b064c5b9732723dba8
|
|
| BLAKE2b-256 |
09d7b26ec55d214d3c1f12c7259c5b074b6c71d19cd0a6b80c830cfef88dc91c
|
File details
Details for the file aimu-0.3.0-py3-none-any.whl.
File metadata
- Download URL: aimu-0.3.0-py3-none-any.whl
- Upload date:
- Size: 97.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e7e4f883434e58293dfef48879f32cfbba4d345994c02fd86397370f7e2cab55
|
|
| MD5 |
1380d2eeaf44261aa19b7260d328bedd
|
|
| BLAKE2b-256 |
6e77f9d311f6facc558c8c5f386bdf7066257289caf4323e94c2d322fe66a91e
|