A Python package containing tools for working with various language models and AI services.
Project description
AIMU - AI Model Utilities
A Python package containing easy to use tools for working with various language models and AI services. AIMU is specifically designed for running models locally, using Ollama, Hugging Face Transformers, or any OpenAI-compatible local serving framework. It can also be used with cloud models (OpenAI, Anthropic, Google, etc.) via aisuite support.
Features
-
Model Clients: Support for multiple AI model providers including:
- Ollama (local models, native API)
- Hugging Face Transformers (local models)
- llama-cpp-python (local GGUF models, in-process, no external service required)
- aisuite supported models (cloud and local models), including OpenAI (others coming)
- OpenAI-compatible local serving frameworks via the
openaiSDK:- LM Studio (
LMStudioOpenAIClient) - Ollama OpenAI-compat endpoint (
OllamaOpenAIClient) - HuggingFace Transformers Serve (
HFOpenAIClient) - vLLM (
VLLMOpenAIClient) - Any OpenAI-compatible server (
OpenAICompatClient)
- LM Studio (
-
Thinking Models: First-class support for extended reasoning models (e.g. DeepSeek-R1, Qwen3, GPT-OSS). Thinking is enabled automatically for supported models, with access to the reasoning traces.
-
Agentic Workflows:
AgentandWorkflowclasses for autonomous, tool-driven task execution. Agents loop over tool calls until the task is complete; workflows chain agents sequentially. Both are configurable from plain dicts with minimal code. -
MCP Tools: Model Context Protocol (MCP) client for enhancing AI capabilities. Provides a simple(r) interface for FastMCP 2.0.
-
Chat Conversation Storage/Management: Chat conversation history management using TinyDB.
-
Semantic Memory Storage: Persistent fact memory using ChromaDB. Facts are stored as natural-language subject-predicate-object strings (e.g.
"Paul works at Google") and retrieved by semantic topic (e.g."employment","family life"). -
Prompt Storage/Management: Prompt catalog for storing and versioning prompts using SQLAlchemy.
Components
In addition to the AIMU package in the 'aimu' directory, the AIMU code repository includes:
-
Jupyter notebooks demonstrating key AIMU features.
-
Example chat clients in the
web/directory, built with Streamlit and Gradio, using AIMU Model Client, MCP tools support, and chat conversation management. -
A full suite of Pytest tests.
Installation
AIMU can be installed with Ollama support, Hugging Face Transformers support, and/or aisuite (cloud models) support.
For all features, run:
pip install aimu[all]
Alternatively, for Ollama-only support:
pip install aimu[ollama]
For Hugging Face Tranformers model support:
pip install aimu[hf]
For aisuite models (e.g. OpenAI):
pip install aimu[aisuite]
For OpenAI-compatible local servers (LM Studio, Ollama, HuggingFace Transformers Serve, vLLM, etc.):
pip install aimu[openai_compat]
For local GGUF models via llama-cpp-python (no external service required):
pip install aimu[llamacpp]
For accessing potentially gated models via Hugging Face, you'll need to get and store (locally) a Hugging Face Hub access token. Once you have a token, you can install it locally with:
hf auth login
Development
Once you've cloned the repository, run the following command to install all model dependencies:
pip install -e '.[all]'
Additionally, run the following command to install development (testing, linting) and notebook dependencies:
pip install -e '.[dev,notebooks]'
Alternatively, if you have uv installed, you can get all model and development dependencies with:
uv sync --all-extras
Using Pytest, tests can be run for a specific model client and/or model, using optional arguments:
pytest tests\test_models.py --client=ollama --model=GPT_OSS_20B
Usage
Text Generation
from aimu.models import OllamaClient as ModelClient ## or HuggingFaceClient, or OpenAiCompatClient
model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
response = model_client.generate("What is the capital of France?", {"temperature": 0.7})
Chat
from aimu.models import OllamaClient as ModelClient
model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
response = model_client.chat("What is the capital of France?")
print(model_client.messages)
Thinking Models
Models with extended reasoning capabilities (e.g. DeepSeek-R1, Qwen3, GPT-OSS) are identified by the THINKING_MODELS list on each client. Thinking is enabled automatically when one of these models is selected.
After generation, the model's reasoning trace is available in last_thinking:
from aimu.models import OllamaClient as ModelClient
model_client = ModelClient(ModelClient.MODELS.DEEPSEEK_R1_8B)
response = model_client.generate("What is the capital of France?")
print(model_client.last_thinking) # reasoning trace
print(response) # final answer
During streamed generation via generate_streamed(), thinking tokens are yielded first followed by the response tokens as a single flat stream. For phase-separated streaming (thinking, tool calls, response), use chat_streamed() instead.
Streamed Chat
chat_streamed() yields StreamChunk objects. Each chunk carries its own type:
chunk.phase |
chunk.content type |
Description |
|---|---|---|
StreamPhase.THINKING |
str |
Reasoning token (thinking models only) |
StreamPhase.TOOL_CALLING |
dict {"name": str, "response": str} |
Tool call and its result |
StreamPhase.GENERATING |
str |
Final response token |
from aimu.models import OllamaClient as ModelClient, StreamPhase
model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
last_phase = None
for chunk in model_client.chat_streamed("What is the capital of France?"):
if last_phase != chunk.phase:
print(f"--- {chunk.phase} ---")
last_phase = chunk.phase
print(chunk.content, end="", flush=True)
OpenAI-Compatible Local Servers
Use LMStudioOpenAIClient, OllamaOpenAIClient, HFOpenAIClient, or VLLMOpenAIClient to connect to any local server that speaks the OpenAI REST API. Each client uses service-appropriate default URLs and model IDs:
from aimu.models import LMStudioOpenAIClient, LMStudioOpenAIModel
# Connects to http://localhost:1234/v1 by default
client = LMStudioOpenAIClient(LMStudioOpenAIModel.QWEN_3_8B)
response = client.chat("What is the capital of France?")
from aimu.models import OllamaOpenAIClient, OllamaOpenAIModel
# Connects to Ollama's OpenAI-compat endpoint at http://localhost:11434/v1
client = OllamaOpenAIClient(OllamaOpenAIModel.QWEN_3_8B)
response = client.chat("What is the capital of France?")
For a custom server or model not in the enum, use OpenAICompatClient directly:
from aimu.models import OpenAICompatClient
from aimu.models.openai_compat import OllamaOpenAIModel
client = OpenAICompatClient(OllamaOpenAIModel.QWEN_3_8B, base_url="http://myserver:8080/v1")
All OpenAI-compatible clients support the full ModelClient API. Streaming, tool calling, thinking models, and MCP tools work identically to the other clients.
Local GGUF Models (llama-cpp-python)
LlamaCppClient runs GGUF models directly in-process. Ollama, LM Studio, or another service are not required. Pass the path to any GGUF file and a LlamaCppModel enum value that describes the model's capabilities:
from aimu.models.llamacpp import LlamaCppClient, LlamaCppModel
client = LlamaCppClient(LlamaCppModel.QWEN_3_4B, model_path="/path/to/qwen3-4b.Q4_K_M.gguf")
response = client.chat("What is the capital of France?")
GPU offloading is enabled by default (n_gpu_layers=-1). To run on CPU only, pass n_gpu_layers=0. The context window defaults to 4096 tokens; increase with n_ctx:
client = LlamaCppClient(
LlamaCppModel.QWEN_3_4B,
model_path="/path/to/model.gguf",
n_ctx=8192,
n_gpu_layers=-1, # offload all layers to GPU
)
All standard ModelClient features work: Streaming, tool calling, thinking models, and MCP tools.
Chat UI (Streamlit)
A full-featured chat UI with model/client selection, streaming, thinking model support, MCP tool calls, and conversation persistence.
streamlit run web/streamlit_chatbot.py
Chat UI (Gradio)
A full-featured chat UI equivalent to the Streamlit example above.
python web/gradio_chatbot.py
Agentic Workflows
An Agent wraps a ModelClient and runs a tool-calling loop until the model produces a response without invoking any tools.
from aimu.models.ollama import OllamaClient, OllamaModel
from aimu.tools import MCPClient
from aimu.agents import Agent
client = OllamaClient(OllamaModel.QWEN_3_8B)
client.mcp_client = MCPClient({"mcpServers": {"mytools": {"command": "python", "args": ["tools.py"]}}})
agent = Agent(client, name="assistant", max_iterations=10)
result = agent.run("Find all log files modified today and summarise the errors.")
Agents are configurable from a plain dict, making them easy to embed in larger systems:
agent = Agent.from_config(
{"name": "researcher", "system_message": "Use tools to answer.", "max_iterations": 8},
client,
)
A Workflow chains agents sequentially. The output of each step becomes the input to the next:
from aimu.agents import Workflow
wf = Workflow.from_config(
[
{"name": "planner", "system_message": "Break the task into steps.", "max_iterations": 3},
{"name": "executor", "system_message": "Execute each step using tools.", "max_iterations": 10},
{"name": "formatter", "system_message": "Format the results clearly.", "max_iterations": 1},
],
lambda cfg: OllamaClient(OllamaModel.QWEN_3_8B),
)
result = wf.run("Research the top Python web frameworks.")
Both Agent and Workflow support streaming via run_streamed(), which yields AgentChunk / WorkflowChunk objects tagged with agent name, iteration, and StreamPhase.
MCP Tool Usage
from aimu.tools import MCPClient
mcp_client = MCPClient({
"mcpServers": {
"mytools": {"command": "python", "args": ["tools.py"]},
}
})
mcp_client.call_tool("mytool", {"input": "hello world!"})
MCP Tool Usage with ModelClient
from aimu.models import OllamaClient as ModelClient
from aimu.tools import MCPClient
mcp_client = MCPClient({
"mcpServers": {
"mytools": {"command": "python", "args": ["tools.py"]},
}
})
model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
model_client.mcp_client = mcp_client
model_client.chat("use my tool please")
Chat Conversation Storage/Management
from aimu.models import OllamaClient as ModelClient
from aimu.memory import ConversationManager
chat_manager = ConversationManager("conversations.json", use_last_conversation=True) # loads the last saved convesation
model_client = new ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
model_client.messages = chat_manager.messages
model_client.chat("What is the capital of France?")
chat_manager.update_conversation(model_client.messages) # store the updated conversation
Semantic Memory Storage
from aimu.memory import MemoryStore
store = MemoryStore(persist_path="./memory_store")
store.store_fact("Paul works at Google")
store.store_fact("Paul is married to Sarah")
store.store_fact("Sarah is the sister of Emma")
store.retrieve_facts("work and employment") # ["Paul works at Google", ...]
store.retrieve_facts("family relationships") # ["Paul is married to Sarah", ...]
Prompt Storage/Management
from aimu.prompts import PromptCatalog, Prompt
prompt_catalog = PromptCatalog("prompts.db")
prompt = Prompt("You are a helpful assistant", model_id="llama3.1:8b", version=1)
prompt_catalog.store_prompt(prompt)
License
This project is licensed under the Apache 2.0 license.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file aimu-0.2.0.tar.gz.
File metadata
- Download URL: aimu-0.2.0.tar.gz
- Upload date:
- Size: 53.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e8bf9610c3a66af6e1910a1676ec8a8e5570c3683590d0dbcaf51616d6edfc8
|
|
| MD5 |
d3324e8ea1450c0c4522dfb5c07eb4e2
|
|
| BLAKE2b-256 |
c786d0c136afd591d3896f79625ac955bce88546fdd495db14634aced2e464e9
|
File details
Details for the file aimu-0.2.0-py3-none-any.whl.
File metadata
- Download URL: aimu-0.2.0-py3-none-any.whl
- Upload date:
- Size: 48.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2c1fea6322000b8857ad39f0656b112770914728cb868fc6f91fb0e1f1ed74fc
|
|
| MD5 |
e1dd5c2a0053e0a42ef95fa444099420
|
|
| BLAKE2b-256 |
c9f3ab11fe625c88c64212761f377c023ad14db3cb20645233787207ac3a6f65
|