A Python package containing tools for working with various language models and AI services.

AIMU - AI Model Utilities

A Python package containing easy-to-use tools for working with various language models and AI services. AIMU is designed primarily for running models locally, using Ollama, Hugging Face Transformers, llama-cpp-python, or any OpenAI-compatible local serving framework. It can also be used with cloud models (OpenAI, Anthropic, Google, etc.) via aisuite support.

Features

Model Clients: Support for multiple AI model providers including:
- Ollama (local models, native API)
- Hugging Face Transformers (local models)
- llama-cpp-python (local GGUF models, in-process, no external service required)
- aisuite supported models (cloud and local models), including OpenAI (others coming)
- OpenAI-compatible local serving frameworks via the openai SDK:
  - LM Studio (LMStudioOpenAIClient)
  - Ollama OpenAI-compat endpoint (OllamaOpenAIClient)
  - HuggingFace Transformers Serve (HFOpenAIClient)
  - vLLM (VLLMOpenAIClient)
  - Any OpenAI-compatible server (OpenAICompatClient)
Thinking Models: First-class support for extended reasoning models (e.g. DeepSeek-R1, Qwen3, GPT-OSS). Thinking is enabled automatically for supported models, with access to the reasoning traces.
Agentic Workflows: Agent and Workflow classes for autonomous, tool-driven task execution. Agents loop over tool calls until the task is complete; workflows chain agents sequentially. Both are configurable from plain dicts with minimal code.
MCP Tools: Model Context Protocol (MCP) client for enhancing AI capabilities. Provides a simple(r) interface for FastMCP 2.0.
Chat Conversation Storage/Management: Chat conversation history management using TinyDB.
Semantic Memory Storage: Persistent fact memory using ChromaDB. Facts are stored as natural-language subject-predicate-object strings (e.g. "Paul works at Google") and retrieved by semantic topic (e.g. "employment", "family life").
Prompt Storage/Management: Prompt catalog for storing and versioning prompts using SQLAlchemy.

Components

In addition to the AIMU package in the 'aimu' directory, the AIMU code repository includes:

Jupyter notebooks demonstrating key AIMU features.
Example chat clients in the web/ directory, built with Streamlit and Gradio, using AIMU Model Client, MCP tools support, and chat conversation management.
A full suite of Pytest tests.

Installation

AIMU can be installed with support for any combination of backends: Ollama, Hugging Face Transformers, aisuite (cloud models), OpenAI-compatible local servers, and llama-cpp-python.

For all features, run:

pip install aimu[all]

Alternatively, for Ollama-only support:

pip install aimu[ollama]

For Hugging Face Transformers model support:

pip install aimu[hf]

For aisuite models (e.g. OpenAI):

pip install aimu[aisuite]

For OpenAI-compatible local servers (LM Studio, Ollama, HuggingFace Transformers Serve, vLLM, etc.):

pip install aimu[openai_compat]

For local GGUF models via llama-cpp-python (no external service required):

pip install aimu[llamacpp]

To access gated models on Hugging Face, you need a Hugging Face Hub access token stored locally. Once you have a token, you can store it with:

hf auth login

Development

Once you've cloned the repository, run the following command to install all model dependencies:

pip install -e '.[all]'

Additionally, run the following command to install development (testing, linting) and notebook dependencies:

pip install -e '.[dev,notebooks]'

Alternatively, if you have uv installed, you can get all model and development dependencies with:

uv sync --all-extras

Using Pytest, tests can be run for a specific model client and/or model, using optional arguments:

pytest tests/test_models.py --client=ollama --model=GPT_OSS_20B
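
For readers unfamiliar with custom Pytest flags, the following is a minimal sketch of how options like --client and --model could be registered in a conftest.py. It is not taken from the AIMU repository, whose own conftest.py may differ; the argument names client_name and model_name are hypothetical and chosen only for illustration.

```python
# conftest.py (hypothetical sketch, not AIMU's actual test configuration)


def pytest_addoption(parser):
    """Register the optional --client and --model command-line flags."""
    parser.addoption("--client", action="store", default=None,
                     help="Only run tests for this model client")
    parser.addoption("--model", action="store", default=None,
                     help="Only run tests for this model")


def pytest_generate_tests(metafunc):
    """Pass the flag values to tests that request them as arguments."""
    # "client_name" and "model_name" are assumed argument names
    for arg_name, option in (("client_name", "--client"), ("model_name", "--model")):
        if arg_name in metafunc.fixturenames:
            metafunc.parametrize(arg_name, [metafunc.config.getoption(option)])
```

With this in place, a test function declared as test_something(client_name) would receive the string passed to --client, or None when the flag is omitted.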

Usage

Text Generation

from aimu.models import OllamaClient as ModelClient  # or HuggingFaceClient, or OpenAICompatClient

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
response = model_client.generate("What is the capital of France?", {"temperature": 0.7})

Chat

from aimu.models import OllamaClient as ModelClient

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
response = model_client.chat("What is the capital of France?")

print(model_client.messages)

Thinking Models

Models with extended reasoning capabilities (e.g. DeepSeek-R1, Qwen3, GPT-OSS) are identified by the THINKING_MODELS list on each client. Thinking is enabled automatically when one of these models is selected.

After generation, the model's reasoning trace is available in last_thinking:

from aimu.models import OllamaClient as ModelClient

model_client = ModelClient(ModelClient.MODELS.DEEPSEEK_R1_8B)
response = model_client.generate("What is the capital of France?")

print(model_client.last_thinking)  # reasoning trace
print(response)                    # final answer

During streamed generation via generate_streamed(), thinking tokens are yielded first followed by the response tokens as a single flat stream. For phase-separated streaming (thinking, tool calls, response), use chat_streamed() instead.

Streamed Chat

chat_streamed() yields StreamChunk objects. Each chunk carries its own type:

`chunk.phase`	`chunk.content` type	Description
`StreamPhase.THINKING`	`str`	Reasoning token (thinking models only)
`StreamPhase.TOOL_CALLING`	`dict` `{"name": str, "response": str}`	Tool call and its result
`StreamPhase.GENERATING`	`str`	Final response token

from aimu.models import OllamaClient as ModelClient, StreamPhase

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
last_phase = None

for chunk in model_client.chat_streamed("What is the capital of France?"):
    if last_phase != chunk.phase:
        print(f"--- {chunk.phase} ---")
        last_phase = chunk.phase

    print(chunk.content, end="", flush=True)
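
Because each chunk is tagged with a phase, a caller can route content to different places instead of printing everything inline. The sketch below groups a stream into reasoning text, tool calls, and response text. To stay runnable without a model server, it defines stand-in StreamPhase and StreamChunk types that mirror only the phase and content attributes described in the table above; the enum values and the helper name collect_by_phase are assumptions, not part of AIMU.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Union


class StreamPhase(Enum):
    """Stand-in for aimu's StreamPhase; the member values are assumptions."""
    THINKING = "thinking"
    TOOL_CALLING = "tool_calling"
    GENERATING = "generating"


@dataclass
class StreamChunk:
    """Stand-in for aimu's StreamChunk, with the two attributes from the table."""
    phase: StreamPhase
    content: Union[str, dict]


def collect_by_phase(chunks: Iterable[StreamChunk]) -> dict:
    """Group a chunk stream into thinking text, tool calls, and response text."""
    thinking, tool_calls, response = [], [], []
    for chunk in chunks:
        if chunk.phase is StreamPhase.THINKING:
            thinking.append(chunk.content)
        elif chunk.phase is StreamPhase.TOOL_CALLING:
            tool_calls.append(chunk.content)  # dict with "name" and "response"
        else:
            response.append(chunk.content)
    return {
        "thinking": "".join(thinking),
        "tool_calls": tool_calls,
        "response": "".join(response),
    }


# Hand-written chunks standing in for the output of chat_streamed()
demo_chunks = [
    StreamChunk(StreamPhase.THINKING, "The user wants "),
    StreamChunk(StreamPhase.THINKING, "a capital city."),
    StreamChunk(StreamPhase.TOOL_CALLING, {"name": "lookup", "response": "Paris"}),
    StreamChunk(StreamPhase.GENERATING, "The capital "),
    StreamChunk(StreamPhase.GENERATING, "is Paris."),
]

result = collect_by_phase(demo_chunks)
print(result["response"])  # The capital is Paris.
```

With the real client, the same helper should work on model_client.chat_streamed(...) directly, provided the chunk objects expose phase and content as documented.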

OpenAI-Compatible Local Servers

Use LMStudioOpenAIClient, OllamaOpenAIClient, HFOpenAIClient, or VLLMOpenAIClient to connect to any local server that speaks the OpenAI REST API. Each client uses service-appropriate default URLs and model IDs:

from aimu.models import LMStudioOpenAIClient, LMStudioOpenAIModel

# Connects to http://localhost:1234/v1 by default
client = LMStudioOpenAIClient(LMStudioOpenAIModel.QWEN_3_8B)
response = client.chat("What is the capital of France?")

from aimu.models import OllamaOpenAIClient, OllamaOpenAIModel

# Connects to Ollama's OpenAI-compat endpoint at http://localhost:11434/v1
client = OllamaOpenAIClient(OllamaOpenAIModel.QWEN_3_8B)
response = client.chat("What is the capital of France?")

For a custom server or model not in the enum, use OpenAICompatClient directly:

from aimu.models import OpenAICompatClient
from aimu.models.openai_compat import OllamaOpenAIModel

client = OpenAICompatClient(OllamaOpenAIModel.QWEN_3_8B, base_url="http://myserver:8080/v1")

All OpenAI-compatible clients support the full ModelClient API. Streaming, tool calling, thinking models, and MCP tools work identically to the other clients.

Local GGUF Models (llama-cpp-python)

LlamaCppClient runs GGUF models directly in-process. No Ollama, LM Studio, or other external service is required. Pass the path to any GGUF file and a LlamaCppModel enum value that describes the model's capabilities:

from aimu.models.llamacpp import LlamaCppClient, LlamaCppModel

client = LlamaCppClient(LlamaCppModel.QWEN_3_4B, model_path="/path/to/qwen3-4b.Q4_K_M.gguf")
response = client.chat("What is the capital of France?")

GPU offloading is enabled by default (n_gpu_layers=-1). To run on CPU only, pass n_gpu_layers=0. The context window defaults to 4096 tokens; increase with n_ctx:

client = LlamaCppClient(
    LlamaCppModel.QWEN_3_4B,
    model_path="/path/to/model.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to GPU
)

All standard ModelClient features work: streaming, tool calling, thinking models, and MCP tools.

Chat UI (Streamlit)

A full-featured chat UI with model/client selection, streaming, thinking model support, MCP tool calls, and conversation persistence.

streamlit run web/streamlit_chatbot.py

Chat UI (Gradio)

A full-featured chat UI equivalent to the Streamlit example above.

python web/gradio_chatbot.py

Agentic Workflows

An Agent wraps a ModelClient and runs a tool-calling loop until the model produces a response without invoking any tools.

from aimu.models.ollama import OllamaClient, OllamaModel
from aimu.tools import MCPClient
from aimu.agents import Agent

client = OllamaClient(OllamaModel.QWEN_3_8B)
client.mcp_client = MCPClient({"mcpServers": {"mytools": {"command": "python", "args": ["tools.py"]}}})

agent = Agent(client, name="assistant", max_iterations=10)
result = agent.run("Find all log files modified today and summarise the errors.")
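
The loop behaviour described above (keep going while the model invokes tools, stop at the first turn that does not, and cap the number of turns) can be illustrated without a model. This is a conceptual sketch only, not AIMU's implementation: the step callable, its (response, used_tools) return shape, and the "Continue." follow-up prompt are all assumptions made for the example.

```python
from typing import Callable, Tuple


def run_tool_loop(step: Callable[[str], Tuple[str, bool]], task: str,
                  max_iterations: int = 10) -> str:
    """Call `step` until a turn uses no tools or the iteration cap is reached.

    `step` is a hypothetical callable returning (response_text, used_tools).
    """
    prompt = task
    response = ""
    for _ in range(max_iterations):
        response, used_tools = step(prompt)
        if not used_tools:
            break  # a turn without tool calls is treated as the final answer
        prompt = "Continue."  # assumed follow-up prompt for the next turn
    return response


def make_stub_step(tool_turns: int) -> Callable[[str], Tuple[str, bool]]:
    """Build a fake model turn that 'calls a tool' for the first `tool_turns` turns."""
    remaining = {"n": tool_turns}

    def step(prompt: str) -> Tuple[str, bool]:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return ("(called a tool)", True)
        return ("Done: 3 errors found.", False)

    return step


answer = run_tool_loop(make_stub_step(2), "Summarise the errors.")
print(answer)  # Done: 3 errors found.
```

The iteration cap matters: if the stub never stops calling tools, the loop returns after max_iterations turns with whatever the last turn produced, rather than running forever.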

Agents are configurable from a plain dict, making them easy to embed in larger systems:

agent = Agent.from_config(
    {"name": "researcher", "system_message": "Use tools to answer.", "max_iterations": 8},
    client,
)

A Workflow chains agents sequentially. The output of each step becomes the input to the next:

from aimu.agents import Workflow

wf = Workflow.from_config(
    [
        {"name": "planner",   "system_message": "Break the task into steps.", "max_iterations": 3},
        {"name": "executor",  "system_message": "Execute each step using tools.", "max_iterations": 10},
        {"name": "formatter", "system_message": "Format the results clearly.", "max_iterations": 1},
    ],
    lambda cfg: OllamaClient(OllamaModel.QWEN_3_8B),
)
result = wf.run("Research the top Python web frameworks.")
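
Sequential chaining of this kind amounts to feeding each step's output into the next step. The sketch below shows the idea with plain functions standing in for agents; run_chain and the three step functions are hypothetical names for illustration and say nothing about how Workflow is implemented internally.

```python
from typing import Callable, Sequence


def run_chain(steps: Sequence[Callable[[str], str]], task: str) -> str:
    """Feed `task` to the first step, then each output to the following step."""
    current = task
    for step in steps:
        current = step(current)
    return current


# Plain functions standing in for the planner, executor, and formatter agents
def planner(text: str) -> str:
    return f"plan({text})"


def executor(text: str) -> str:
    return f"execute({text})"


def formatter(text: str) -> str:
    return f"format({text})"


final = run_chain([planner, executor, formatter], "task")
print(final)  # format(execute(plan(task)))
```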

Both Agent and Workflow support streaming via run_streamed(), which yields AgentChunk / WorkflowChunk objects tagged with agent name, iteration, and StreamPhase.

MCP Tool Usage

from aimu.tools import MCPClient

mcp_client = MCPClient({
    "mcpServers": {
        "mytools": {"command": "python", "args": ["tools.py"]},
    }
})

mcp_client.call_tool("mytool", {"input": "hello world!"})

MCP Tool Usage with ModelClient

from aimu.models import OllamaClient as ModelClient
from aimu.tools import MCPClient

mcp_client = MCPClient({
    "mcpServers": {
        "mytools": {"command": "python", "args": ["tools.py"]},
    }
})

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
model_client.mcp_client = mcp_client

model_client.chat("use my tool please")

Chat Conversation Storage/Management

from aimu.models import OllamaClient as ModelClient
from aimu.memory import ConversationManager

chat_manager = ConversationManager("conversations.json", use_last_conversation=True)  # loads the last saved conversation

model_client = ModelClient(ModelClient.MODELS.QWEN_3_5_9B)
model_client.messages = chat_manager.messages

model_client.chat("What is the capital of France?")

chat_manager.update_conversation(model_client.messages) # store the updated conversation

Semantic Memory Storage

from aimu.memory import MemoryStore

store = MemoryStore(persist_path="./memory_store")

store.store_fact("Paul works at Google")
store.store_fact("Paul is married to Sarah")
store.store_fact("Sarah is the sister of Emma")

store.retrieve_facts("work and employment")   # ["Paul works at Google", ...]
store.retrieve_facts("family relationships")  # ["Paul is married to Sarah", ...]

Prompt Storage/Management

from aimu.prompts import PromptCatalog, Prompt

prompt_catalog = PromptCatalog("prompts.db")

prompt = Prompt("You are a helpful assistant", model_id="llama3.1:8b", version=1)
prompt_catalog.store_prompt(prompt)

License

This project is licensed under the Apache 2.0 license.
