aimu · PyPI

AI Modeling Utilities: A Python package containing support for working with numerous AI models and services.

These details have not been verified by PyPI

Project description

Simple, composable AI for Python, local or in the cloud.

GitHub License Python Version from PEP 621 TOML

Docs · Tutorials · How-to · Reference · Notebooks

AIMU is a Python library for AI-powered applications, with language models as the primary building block. It gives you a single provider-agnostic interface across text, images, audio, and speech; autonomous agents and code-controlled workflows; and small composable utilities for tools, memory, prompt tuning, evaluations, and benchmarking. All of these features in plain Python that is apparent and easy to use.

Whether you need vision input, autonomous tool use, image generation, audio generation, or text-to-speech, the call is one line:

aimu.chat("What's in this photo?", model="...", images=["photo.jpg"])

aimu.agent("...", tools=builtin.web).run("Search the web and summarize today's AI news")

aimu.generate_image("a watercolor fox in a snowy forest", model="...")
aimu.generate_audio("a lo-fi hip-hop beat with soft piano", model="...")
aimu.generate_speech("Hello, world!", model="...")

Composition happens by passing objects to constructors. Conversation state is a list[dict] you can print and edit. Provider-specific details adapt at request time and never leak into your code.

Key features

Language models

One client interface for Ollama, HuggingFace, llama-cpp, the Claude API, OpenAI, Gemini, and any OpenAI-compatible local server (LM Studio, vLLM, SGLang, llama-server, HF Transformers Serve). Swap with a string change: "provider:model_id".
Reasoning, tool calling, and vision input work identically across every provider. Reasoning models surface their tokens as StreamingContentType.THINKING chunks via the same API.
Typed streaming: StreamChunk(phase, content, agent, iteration) flows through client.chat(), Agent.run(), and every workflow. Filter with include=["generating"].

Image and audio generation

Consistent APIs for text-to-image (aimu.image_client() / aimu.generate_image()) and text-to-audio (aimu.audio_client() / aimu.generate_audio()), mirroring the text client interface.
For images: HuggingFace diffusers locally (SD 1.5 / SDXL / SD 3.5 / FLUX 1 dev & schnell / FLUX 2 Klein 4B & 9B) and Google Nano Banana via the cloud API. Pass reference_image= to any generate() call for image-to-image workflows.
For audio (music and sound): HuggingFace with MusicGen small/medium/large (32 kHz), AudioLDM2 (16 kHz), and Stable Audio Open (44.1 kHz stereo).
Drop image and audio generation into any chat agent via the built-in generate_image and generate_audio tools.

Speech

aimu.speech_client() / aimu.generate_speech() for text-to-speech. HuggingFace locally (SpeechT5, MMS-TTS, BARK); OpenAI (tts-1, tts-1-hd) in the cloud.
Drop TTS into any agent via the built-in generate_speech tool; bind a specific voice with make_speech_tool(client, voice=...).
Speech-to-text (transcription) is planned as a parallel aimu.transcription_client() surface.

Agents and workflows

Agent runs an autonomous tool-using loop until the model stops calling tools.
OrchestrationAgent interface/pattern for coordinating sub-agent work, and three pre-built agents (CodeReviewAgent, ContentCreationAgent, and ResearchReportAgent).
Four code-controlled workflow patterns: Chain.from_client(...), Router.from_client(...), Parallel.from_client(...), EvaluatorOptimizer(...). Compose freely. Workflows accept agents as steps; agents accept workflows as tools via as_model_client().
agent.as_model_client() makes any agent a drop-in BaseModelClient, so agentic and non-agentic clients are interchangeable.

Tools

@tool on any plain Python function. Type hints + docstring become the spec.
MCPClient for cross-process FastMCP tools. Combine with @tool on the same agent.
Built-in tool groups ready to pass to tools=: builtin.web, builtin.fs, builtin.compute, builtin.misc, builtin.image, builtin.audio, builtin.speech. builtin.make_tools(client, image_client=None, audio_client=None, speech_client=None) assembles the full tool list with auto image/vision/audio/speech wiring.
Filesystem-discovered SKILL.md files auto-inject into a SkillAgent (same format Claude Code uses).

Memory and persistence

SemanticMemoryStore (ChromaDB vector search), DocumentStore (path-keyed, drop-in compatible with the Claude memory tool API), ConversationManager (TinyDB chat history). All implement the same MemoryStore interface.

Prompts and evaluation

Hill-climbing PromptTuner for automatic prompt optimisation against labelled data. Four concrete tuners: classification, multi-class, extraction, judged-generation.
Benchmark runs one prompt across multiple clients (plain or agentic, mixed providers) and returns a comparison DataFrame. DeepEval metrics plug in as Scorers.

Async (optional)

aimu.aio mirrors the entire public surface — same class names, one import switches paradigms. The sync ladder is unchanged; async is strictly opt-in.
aio.Parallel and concurrent_tool_calls=True use asyncio.TaskGroup for structured concurrency: sibling cancellation on first failure, ExceptionGroup aggregation.
Same @tool-decorated functions work on both surfaces. async def tools are auto-detected and awaited; sync (CPU-bound) tools are routed through asyncio.to_thread so the event loop stays free.
Native async providers: Anthropic, OpenAI, Gemini, Ollama, every OpenAI-compatible endpoint. In-process providers (HuggingFace, LlamaCpp) wrap an existing sync client so model weights load only once.

Examples

import aimu

# One-shot
text = aimu.chat("Hello", model="anthropic:claude-sonnet-4-6")

# Multi-turn
client = aimu.client("ollama:qwen3.5:9b", system="You are concise.")
client.chat("Hi there")
client.chat("What did I just say?")     # history preserved

Default model. Omit model= and AIMU resolves one for you: it reads AIMU_LANGUAGE_MODEL ("provider:model_id"), else auto-selects an already-available local model (a running Ollama server, a cached HuggingFace model, or a local OpenAI-compatible server). A cloud provider is never auto-selected and weights are never downloaded implicitly.

reply = aimu.chat("Hello")                # uses AIMU_LANGUAGE_MODEL or a discovered local model
client = aimu.client(system="Be brief.")  # same resolution

Image, audio, and speech read AIMU_IMAGE_MODEL / AIMU_AUDIO_MODEL / AIMU_SPEECH_MODEL respectively.

Streaming with phase filtering. Drop unwanted phases (thinking, tool calls) with include=:

for chunk in client.chat("Tell me a story", stream=True, include=["generating"]):
    print(chunk.content, end="", flush=True)

Tools for models and agents. @tool works on any plain function:

from aimu.tools import tool

@tool
def letter_counter(word: str, letter: str) -> int:
    """Count occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

agent = aimu.agent("ollama:qwen3.5:9b", tools=[letter_counter])
print(agent.run("How many r's in strawberry?"))

Code-controlled workflows. AIMU supports several workflow patterns: chaining, routing, parallelization, and evaluation loops. Chain.from_client(), for example, executes a series of LLM calls using a shared client and a list of per-step instructions:

from aimu.agents import Chain

chain = Chain.from_client(client, [
    "Break the task into clear steps.",
    "Execute each step using available tools.",
    "Polish the result into a single paragraph.",
])
result = chain.run("Research the top Python web frameworks.")

Vision input. Uniform across every vision-capable provider — on stateful chat() or stateless one-shot generate():

client = aimu.client("openai:gpt-4o-mini")
client.chat("What's in this image?", images=["./cat.jpg"])      # multi-turn, keeps history
client.generate("Caption this image.", images=["./cat.jpg"])    # one-shot, no history

Image generation. Same provider:model_id shape, parallel factory. Pass reference_image= for image-to-image:

# One-shot, local HuggingFace diffusers
path = aimu.generate_image(
    "a watercolor of a fox in a snowy forest",
    model="hf:runwayml/stable-diffusion-v1-5",
)

# Reuse loaded weights across calls
client = aimu.image_client("hf:stabilityai/stable-diffusion-xl-base-1.0")
img = client.generate("a cyberpunk city skyline at dusk")

# Image-to-image: steer generation from a reference image
img = client.generate("a cyberpunk version", reference_image="./photo.jpg", strength=0.7)

# FLUX.2 Klein — 4-step distilled, native img2img (no separate strength param)
client = aimu.image_client("hf:black-forest-labs/FLUX.2-klein-4B")
img = client.generate("a cat in a sunlit garden")
img = client.generate("add snow", reference_image="./cat.jpg")  # img2img

Curated models, no arbitrary repos. A provider:model_id string must name a model AIMU ships a spec for (the ids above are all curated). An unknown id raises rather than running with guessed capabilities — for a one-off custom model, build the provider spec and pass the object (aimu.image_client(HuggingFaceImageSpec(...))). This applies to every modality.

Negative prompts are accepted only by models whose spec sets supports_negative_prompt; prose models (FLUX.2 Klein, Nano Banana) raise if passed one — describe what to avoid in the prompt itself instead.

Audio generation. Same provider:model_id shape, parallel factory:

# One-shot — returns (sample_rate, np.ndarray) by default
sr, audio = aimu.generate_audio(
    "a lo-fi hip-hop beat with soft piano",
    model="hf:facebook/musicgen-small",
    duration_s=5.0,
)

# Save directly to WAV
path = aimu.generate_audio("ambient ocean waves", model="hf:facebook/musicgen-small", format="path")

Speech generation (TTS). Same provider:model_id shape:

# Save spoken WAV — OpenAI cloud (requires OPENAI_API_KEY)
path = aimu.generate_speech("Hello, world!", model="openai:tts-1")

# Local HuggingFace TTS — returns (sample_rate, np.ndarray)
sr, audio = aimu.generate_speech("Hello!", model="hf:facebook/mms-tts-eng", format="numpy")

# Reuse a client across calls — weights load once
client = aimu.speech_client("openai:tts-1-hd")
path = client.generate("Good morning.", voice="nova", format="path")

Vision and image generation together. A vision-capable agent with generate_image as a tool can perceive and create in the same run:

from aimu.agents import Agent
from aimu.tools import builtin

agent = Agent(aimu.client("anthropic:claude-sonnet-4-6"), tools=[builtin.generate_image])
agent.run("Describe the scene in this photo, then generate a watercolor painting of it.", images=["photo.jpg"])

Async (opt-in). Same names, one import away:

import asyncio
from aimu import aio

async def main():
    client = aio.client("anthropic:claude-sonnet-4-6")
    agent = aio.Agent(client, tools=[my_async_tool])
    reply = await agent.run("Hello")

    # asyncio.TaskGroup-backed Parallel — true coroutine concurrency
    parallel = aio.Parallel.from_client(client, worker_prompts=[...], aggregator_prompt="...")
    result = await parallel.run("topic")

asyncio.run(main())

Install

pip install aimu[all]

Or pick the providers you need: aimu[ollama], aimu[anthropic], aimu[openai_compat] (also enables OpenAI TTS speech), aimu[hf] (text + HuggingFace diffusers image + HuggingFace audio + HuggingFace TTS speech), aimu[google] (Nano Banana image generation), aimu[llamacpp]. See installation in the docs for the full list of extras.

Documentation


📘 Tutorials	Hand-held walkthroughs. Install to first agent in 15 mins
🛠️ How-to guides	Task-oriented recipes (switch providers, write a tool, stream output, benchmark models, ...)
📚 Reference	Auto-generated API docs, capability matrices, environment variables, CLI
💡 Explanation	The why: architecture, design principles, agents vs workflows

Notebooks

The notebooks/ directory ships interactive demos for every subsystem:

Notebook	Description
01 - Model Client	Text generation, chat, streaming, thinking models
02 - Vision	Image input via `images=` on `chat()` and one-shot `generate()`
03 - Tools	`@tool` decorator, built-in tool groups, MCPClient
04 - Prompt Management	Versioned prompt storage
05 - Prompt Tuning	Classification, multi-class, extraction, judged tuners
06 - Conversations	Persistent chat history
07 - Memory	Semantic fact storage and retrieval
08 - Agents	`Agent` and `agent.as_model_client()`
09 - Agent Skills	Filesystem-discovered skill injection
10 - Workflows	Chain, Router, Parallel, EvaluatorOptimizer, PlanExecuteEvaluator
11 - Prebuilt Agents	Orchestrator + worker tools pattern
12 - Evaluations	DeepEval integration
13 - Benchmarking	Multi-model comparison harness
14 - Async	`aimu.aio` surface end-to-end: chat, streaming, async tools, `asyncio.TaskGroup`-backed `Parallel`, async `MCPClient`, in-process provider wrapping
15 - Image Generation	`aimu.image_client()` / `aimu.generate_image()` with HuggingFace `diffusers` and Google Nano Banana, plus the built-in `generate_image` agent tool
16 - Audio Generation	`aimu.audio_client()` / `aimu.generate_audio()` with MusicGen, AudioLDM2, and Stable Audio Open, plus streaming and the built-in `generate_audio` agent tool
17 - Speech	TTS with HuggingFace (SpeechT5, MMS-TTS, BARK) and OpenAI (tts-1/tts-1-hd); `generate_speech` agent tool; Streamlit live narration; STT placeholder

Web apps

The web/ directory ships two Streamlit chat applications that demonstrate AIMU in action:

App	Description
streamlit_chatbot_basic.py	~70-line showcase — provider/model selector, streaming chat, built-in tools. Start here.
streamlit_chatbot.py	Full-featured — image/audio/speech generation, agentic mode, thinking display, generation sliders, live TTS narration. Extensible foundation.

streamlit run web/streamlit_chatbot.py         # full-featured Streamlit demo (agents, tools, images, audio, speech narration, etc.)
streamlit run web/streamlit_chatbot_basic.py   # basic Streamlit demo app
python web/gradio_chatbot_basic.py             # basic Gradio demo app

Design principles

AIMU is small and stays small. Six principles shape the API: plain Python, plain data (OpenAI message dicts only), composability through uniform interfaces, progressive disclosure, direct paths for common tasks, and apparent failures. The reasoning behind each, and the patterns each one excludes, lives on the design principles page.

Contributing

See the contributing guide for dev setup, testing, lint, and PR conventions.

License

Apache 2.0.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.6.0

Jun 4, 2026

This version

0.5.1

Jun 1, 2026

0.5.0

Jun 1, 2026

0.4.0

May 26, 2026

0.3.2

May 5, 2026

0.3.1

May 4, 2026

0.3.0

Apr 27, 2026

0.2.0

Apr 10, 2026

0.1.6

Sep 25, 2025

0.1.5

Aug 8, 2025

0.1.4

Jul 21, 2025

0.1.3

Jun 21, 2025

0.1.2

Jun 20, 2025

0.1.1

Jun 20, 2025

0.1.0

Jun 20, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aimu-0.5.1.tar.gz (256.4 kB view details)

Uploaded Jun 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

aimu-0.5.1-py3-none-any.whl (227.3 kB view details)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file aimu-0.5.1.tar.gz.

File metadata

Download URL: aimu-0.5.1.tar.gz
Upload date: Jun 1, 2026
Size: 256.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for aimu-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`cb6df1cbe8724a1866d2bafd8c05c1cb77ca521cf26d425e67f018db216c0cad`
MD5	`9769d83ecca19d0cb7a383c37b980943`
BLAKE2b-256	`c9c81f2f91347554c498e303d5697cfacd19c82be3f424dcd2a82238755372dd`

See more details on using hashes here.

File details

Details for the file aimu-0.5.1-py3-none-any.whl.

File metadata

Download URL: aimu-0.5.1-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 227.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for aimu-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2aea977d140b89cd20c2fabac8def5f641037790f022e7da40c599c950436f87`
MD5	`ebda708ef4b84789fb766d83d0e79f1d`
BLAKE2b-256	`1a729112fcac5ce81558796fbc97d8029871bb9813ea840f50860b945a4c39db`

See more details on using hashes here.

aimu 0.5.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Key features

Language models

Image and audio generation

Speech

Agents and workflows

Tools

Memory and persistence

Prompts and evaluation

Async (optional)

Examples

Install

Documentation

Notebooks

Web apps

Design principles

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes