Skip to main content

A Python package containing tools for working with various language models and AI services.

Project description

PyPI GitHub License Python Version from PEP 621 TOML uv Ruff

AIMU - AI Model Utilities

A Python package containing easy to use tools for working with various language models and AI services. AIMU is specifically designed for running models locally, using Ollama, Hugging Face Transformers, or any OpenAI-compatible local serving framework. It can also be used with cloud models (OpenAI, Anthropic, Google, etc.) via aisuite support.

Features

  • Model Clients: Support for multiple AI model providers including:

  • Thinking Models: First-class support for extended reasoning models (e.g. DeepSeek-R1, Qwen3, GPT-OSS). Thinking is enabled automatically for supported models, with access to the reasoning traces.

  • Agentic Workflows: Agent and Workflow classes for autonomous, tool-driven task execution. Agents loop over tool calls until the task is complete; workflows chain agents sequentially. Both are configurable from plain dicts with minimal code.

  • MCP Tools: Model Context Protocol (MCP) client for enhancing AI capabilities. Provides a simple(r) interface for FastMCP 2.0.

  • Chat Conversation Storage/Management: Chat conversation history management using TinyDB.

  • Semantic Memory Storage: Persistent fact memory using ChromaDB. Facts are stored as natural-language subject-predicate-object strings (e.g. "Paul works at Google") and retrieved by semantic topic (e.g. "employment", "family life").

  • Prompt Storage/Management: Prompt catalog for storing and versioning prompts using SQLAlchemy.

Components

In addition to the AIMU package in the 'aimu' directory, the AIMU code repository includes:

  • Jupyter notebooks demonstrating key AIMU features.

  • Example chat clients in the web/ directory, built with Streamlit and Gradio, using AIMU Model Client, MCP tools support, and chat conversation management.

  • A full suite of Pytest tests.

Installation

AIMU can be installed with Ollama support, Hugging Face Transformers support, and/or aisuite (cloud models) support.

For all features, run:

pip install aimu[all]

Alternatively, for Ollama-only support:

pip install aimu[ollama]

For Hugging Face Tranformers model support:

pip install aimu[hf]

For aisuite models (e.g. OpenAI):

pip install aimu[aisuite]

For OpenAI-compatible local servers (LM Studio, Ollama, HuggingFace Transformers Serve, vLLM, etc.):

pip install aimu[openai_compat]

For local GGUF models via llama-cpp-python (no external service required):

pip install aimu[llamacpp]

For accessing potentially gated models via Hugging Face, you'll need to get and store (locally) a Hugging Face Hub access token. Once you have a token, you can install it locally with:

hf auth login

Development

Once you've cloned the repository, run the following command to install all model dependencies:

pip install -e '.[all]'

Additionally, run the following command to install development (testing, linting) and notebook dependencies:

pip install -e '.[dev,notebooks]'

Alternatively, if you have uv installed, you can get all model and development dependencies with:

uv sync --all-extras

Using Pytest, tests can be run for a specific model client and/or model, using optional arguments:

pytest tests\test_models.py --client=ollama --model=GPT_OSS_20B

Usage

Text Generation

from aimu.models import OllamaClient as ModelClient ## or HuggingFaceClient, or OpenAiCompatClient

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
response = model_client.generate("What is the capital of France?", {"temperature": 0.7})

Chat

from aimu.models import OllamaClient as ModelClient

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
response = model_client.chat("What is the capital of France?")

print(model_client.messages)

Thinking Models

Models with extended reasoning capabilities (e.g. DeepSeek-R1, Qwen3, GPT-OSS) are identified by the THINKING_MODELS list on each client. Thinking is enabled automatically when one of these models is selected.

After generation, the model's reasoning trace is available in last_thinking:

from aimu.models import OllamaClient as ModelClient

model_client = ModelClient(ModelClient.MODELS.DEEPSEEK_R1_8B)
response = model_client.generate("What is the capital of France?")

print(model_client.last_thinking)  # reasoning trace
print(response)                    # final answer

During streamed generation via generate_streamed(), thinking tokens are yielded first followed by the response tokens as a single flat stream. For phase-separated streaming (thinking, tool calls, response), use chat_streamed() instead.

Streamed Chat

chat_streamed() yields StreamChunk objects. Each chunk carries its own type:

chunk.phase chunk.content type Description
StreamPhase.THINKING str Reasoning token (thinking models only)
StreamPhase.TOOL_CALLING dict {"name": str, "response": str} Tool call and its result
StreamPhase.GENERATING str Final response token
from aimu.models import OllamaClient as ModelClient, StreamPhase

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
last_phase = None

for chunk in model_client.chat_streamed("What is the capital of France?"):
    if last_phase != chunk.phase:
        print(f"--- {chunk.phase} ---")
        last_phase = chunk.phase

    print(chunk.content, end="", flush=True)

OpenAI-Compatible Local Servers

Use LMStudioOpenAIClient, OllamaOpenAIClient, HFOpenAIClient, or VLLMOpenAIClient to connect to any local server that speaks the OpenAI REST API. Each client uses service-appropriate default URLs and model IDs:

from aimu.models import LMStudioOpenAIClient, LMStudioOpenAIModel

# Connects to http://localhost:1234/v1 by default
client = LMStudioOpenAIClient(LMStudioOpenAIModel.QWEN_3_8B)
response = client.chat("What is the capital of France?")
from aimu.models import OllamaOpenAIClient, OllamaOpenAIModel

# Connects to Ollama's OpenAI-compat endpoint at http://localhost:11434/v1
client = OllamaOpenAIClient(OllamaOpenAIModel.QWEN_3_8B)
response = client.chat("What is the capital of France?")

For a custom server or model not in the enum, use OpenAICompatClient directly:

from aimu.models import OpenAICompatClient
from aimu.models.openai_compat import OllamaOpenAIModel

client = OpenAICompatClient(OllamaOpenAIModel.QWEN_3_8B, base_url="http://myserver:8080/v1")

All OpenAI-compatible clients support the full ModelClient API. Streaming, tool calling, thinking models, and MCP tools work identically to the other clients.

Local GGUF Models (llama-cpp-python)

LlamaCppClient runs GGUF models directly in-process. Ollama, LM Studio, or another service are not required. Pass the path to any GGUF file and a LlamaCppModel enum value that describes the model's capabilities:

from aimu.models.llamacpp import LlamaCppClient, LlamaCppModel

client = LlamaCppClient(LlamaCppModel.QWEN_3_4B, model_path="/path/to/qwen3-4b.Q4_K_M.gguf")
response = client.chat("What is the capital of France?")

GPU offloading is enabled by default (n_gpu_layers=-1). To run on CPU only, pass n_gpu_layers=0. The context window defaults to 4096 tokens; increase with n_ctx:

client = LlamaCppClient(
    LlamaCppModel.QWEN_3_4B,
    model_path="/path/to/model.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to GPU
)

All standard ModelClient features work: Streaming, tool calling, thinking models, and MCP tools.

Chat UI (Streamlit)

A full-featured chat UI with model/client selection, streaming, thinking model support, MCP tool calls, and conversation persistence.

streamlit run web/streamlit_chatbot.py

Chat UI (Gradio)

A full-featured chat UI equivalent to the Streamlit example above.

python web/gradio_chatbot.py

Agentic Workflows

An Agent wraps a ModelClient and runs a tool-calling loop until the model produces a response without invoking any tools.

from aimu.models.ollama import OllamaClient, OllamaModel
from aimu.tools import MCPClient
from aimu.agents import Agent

client = OllamaClient(OllamaModel.QWEN_3_8B)
client.mcp_client = MCPClient({"mcpServers": {"mytools": {"command": "python", "args": ["tools.py"]}}})

agent = Agent(client, name="assistant", max_iterations=10)
result = agent.run("Find all log files modified today and summarise the errors.")

Agents are configurable from a plain dict, making them easy to embed in larger systems:

agent = Agent.from_config(
    {"name": "researcher", "system_message": "Use tools to answer.", "max_iterations": 8},
    client,
)

A Workflow chains agents sequentially. The output of each step becomes the input to the next:

from aimu.agents import Workflow

wf = Workflow.from_config(
    [
        {"name": "planner",   "system_message": "Break the task into steps.", "max_iterations": 3},
        {"name": "executor",  "system_message": "Execute each step using tools.", "max_iterations": 10},
        {"name": "formatter", "system_message": "Format the results clearly.", "max_iterations": 1},
    ],
    lambda cfg: OllamaClient(OllamaModel.QWEN_3_8B),
)
result = wf.run("Research the top Python web frameworks.")

Both Agent and Workflow support streaming via run_streamed(), which yields AgentChunk / WorkflowChunk objects tagged with agent name, iteration, and StreamPhase.

MCP Tool Usage

from aimu.tools import MCPClient

mcp_client = MCPClient({
    "mcpServers": {
        "mytools": {"command": "python", "args": ["tools.py"]},
    }
})

mcp_client.call_tool("mytool", {"input": "hello world!"})

MCP Tool Usage with ModelClient

from aimu.models import OllamaClient as ModelClient
from aimu.tools import MCPClient

mcp_client = MCPClient({
    "mcpServers": {
        "mytools": {"command": "python", "args": ["tools.py"]},
    }
})

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
model_client.mcp_client = mcp_client

model_client.chat("use my tool please")

Chat Conversation Storage/Management

from aimu.models import OllamaClient as ModelClient
from aimu.memory import ConversationManager

chat_manager = ConversationManager("conversations.json", use_last_conversation=True) # loads the last saved convesation

model_client = new ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
model_client.messages = chat_manager.messages

model_client.chat("What is the capital of France?")

chat_manager.update_conversation(model_client.messages) # store the updated conversation

Semantic Memory Storage

from aimu.memory import MemoryStore

store = MemoryStore(persist_path="./memory_store")

store.store_fact("Paul works at Google")
store.store_fact("Paul is married to Sarah")
store.store_fact("Sarah is the sister of Emma")

store.retrieve_facts("work and employment")   # ["Paul works at Google", ...]
store.retrieve_facts("family relationships")  # ["Paul is married to Sarah", ...]

Prompt Storage/Management

from aimu.prompts import PromptCatalog, Prompt

prompt_catalog = PromptCatalog("prompts.db")

prompt = Prompt("You are a helpful assistant", model_id="llama3.1:8b", version=1)
prompt_catalog.store_prompt(prompt)

License

This project is licensed under the Apache 2.0 license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aimu-0.2.0.tar.gz (53.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

aimu-0.2.0-py3-none-any.whl (48.6 kB view details)

Uploaded Python 3

File details

Details for the file aimu-0.2.0.tar.gz.

File metadata

  • Download URL: aimu-0.2.0.tar.gz
  • Upload date:
  • Size: 53.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for aimu-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3e8bf9610c3a66af6e1910a1676ec8a8e5570c3683590d0dbcaf51616d6edfc8
MD5 d3324e8ea1450c0c4522dfb5c07eb4e2
BLAKE2b-256 c786d0c136afd591d3896f79625ac955bce88546fdd495db14634aced2e464e9

See more details on using hashes here.

File details

Details for the file aimu-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: aimu-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 48.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for aimu-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c1fea6322000b8857ad39f0656b112770914728cb868fc6f91fb0e1f1ed74fc
MD5 e1dd5c2a0053e0a42ef95fa444099420
BLAKE2b-256 c9f3ab11fe625c88c64212761f377c023ad14db3cb20645233787207ac3a6f65

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page