Unified AI agent framework: one interface for 9 LLM providers (Qwen, Kimi, GLM, DeepSeek, MiniMax, Doubao, ChatGPT, Gemini, Claude) with tools, MCP, sub-agents, skills, RAG knowledge base, scheduler and multi-user serving
🦌 milu
Production-ready multi-user AI agents — with Chinese LLMs as first-class citizens.
Multi-user agent pool · One interface for 9 LLM providers (Chinese-first) · Built-in tools & MCP · Sub-agents · Skills · RAG · Scheduler
English | 简体中文
Why milu?
Most agent frameworks stop at single-user demos and treat Chinese LLM providers as an afterthought. milu starts where they stop:
- 🏭 From demo to production in one library
  `AgentPool` gives you per-user agent isolation, LRU/TTL eviction, global concurrency limits and shared MCP processes. The question most frameworks leave as "an exercise for the reader" ("my demo works; how do I serve 100 concurrent users without sessions bleeding into each other?") is answered here, backed by 1100+ tests. Combined with `KeyedLLMProvider`, which maps each tenant to its own API key, the same pool scales from a side project to multi-tenant SaaS.
- 🇨🇳 Chinese LLMs as first-class citizens
  Qwen, DeepSeek, Kimi, GLM, MiniMax and Doubao are supported natively alongside OpenAI, Gemini and Claude. No `base_url` juggling: provider quirks (thinking mode, built-in web search, parameter differences) are pre-adapted, and a China-reachable search backend works out of the box.
- 🔋 Batteries actually included
  20+ built-in tools (files, shell, Python, web fetch/search, Office/PDF reading, vision input), the MCP protocol (stdio/HTTP/SSE), sub-agents, skills, session persistence, automatic context compaction, long-term memory, a RAG knowledge base, scheduled tasks, and a built-in multi-user web service.
- 🛡️ A real safety model
  Four operation modes (talk/manual/auto/superwork), an AI safety judge for unsafe tool calls (in the style of Claude Code), human confirmation flows, and delegation that never bypasses approval.
- 🪶 Thin by design
  Built directly on the `openai` SDK as the unified HTTP client. Events stream out as plain dataclasses. No chains, no graphs, no DSL to learn.
Two ways to use it
milu is both a ready-to-run agent and a framework to build on — start instantly, embed when you need to:
- 🚀 Run it
  `milu` for chat, `milu serve` for a multi-user service: full capabilities, zero code. Both the CLI and the web UI ship in English and 中文.
- 🧩 Build on it
  `from milu import Agent` to embed agents in your own backend, then scale to multi-user / multi-tenant with `AgentPool`. You own your data and your stack.
Install
> [!TIP]
> One `pip install milu` gets everything: CLI, web service, RAG knowledge base and MCP are all included. You only need at least one provider API key to start.
With pip, if you already have Python 3.10+:

```bash
pip install milu  # everything included: CLI, web service, RAG, MCP
```
New to Python? Beginner step-by-step
- Download Python 3.10+ from python.org/downloads. On Windows, tick "Add Python to PATH" during setup.
- Open a terminal (Windows: PowerShell · macOS: Terminal) and check that `python --version` prints 3.10 or higher.
- Run `pip install milu`.
- Run `milu` to start chatting.
No Python yet? The easiest route is uv, which installs Python and milu for you:

```bash
# 1. install uv (one line, needs no Python)
curl -LsSf https://astral.sh/uv/install.sh | sh              # macOS / Linux
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"   # Windows

# 2. install milu (uv fetches a Python automatically if missing)
uv tool install milu
```
Docker, with no Python on the host at all:

```bash
cp .env.example .env   # fill in at least one provider API key
docker compose up -d
```
Quick start
> [!NOTE]
> The first run launches an interactive setup wizard: pick a provider, paste an API key, and you're chatting. Zero config to first conversation.
CLI, zero config to first conversation:

```bash
milu  # first run guides you through provider + API key setup
```
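Code, a full-featured agent in a few lines:

```python
import asyncio
from milu import Agent, ModelRegistry

async def main():
    agent = Agent(ModelRegistry.create("deepseek", model="deepseek-v4-flash"))
    async for event in agent.run("What time is it? Use a tool to check."):
        print(event)  # typed event dataclasses: text deltas, tool calls, ...

asyncio.run(main())
```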
Agent(llm) is the complete package by default: built-in system prompt, 20+ tools, skills, three sub-agents, session persistence and context compaction — pass explicit arguments only to override.
Multi-user web service, one command:

```bash
milu serve  # multi-user chat + full-featured demo UI at http://127.0.0.1:8000
```
How it compares
| Capability | milu | LangChain | CrewAI | smolagents | Qwen-Agent |
|---|---|---|---|---|---|
| Chinese providers native (6) | ✅ | community pkgs | via LiteLLM | via LiteLLM | Qwen family |
| Multi-user pool, in-library | ✅ AgentPool | platform (paid) | platform | — | — |
| MCP protocol | ✅ 3 transports | ✅ | ✅ | ✅ | ✅ |
| Built-in tools (files/docs/vision/search) | ✅ 20+ | install per-integration | partial | minimal | partial |
| Tool-safety modes + AI judge | ✅ | — | — | sandbox only | — |
| RAG knowledge base, in-library | ✅ | assemble yourself | partial | — | ✅ |
| Scheduled tasks (multi-user) | ✅ | — | — | — | — |
| CLI + web service out of the box | ✅ | — | partial | — | demo UI |
✅ = built-in; "—" = not built-in (often available via an external platform or a few lines of your own code). Reflects each library as of June 2026 — these move fast, so corrections are welcome via issue/PR.
When milu is the right fit: you're building on Chinese LLMs, you need a production multi-user / multi-tenant service (not just a single-user demo), and you want batteries included, whether you run it as-is, embed it as a library, or use it as a flexible core for building your own AI applications.
When to choose something else: for the largest integration ecosystem, LangChain; for pure multi-agent orchestration, CrewAI or AutoGen; for a tiny, barebones core with almost nothing built in, smolagents.
What you can build
- Personal AI assistant
  `milu` drops you into a chat in one command; long-term memory remembers your preferences, scheduled tasks handle reminders and daily digests, and built-in tools (web search, files, docs, vision) are ready to use. Everything runs locally, so your data stays yours.
- Enterprise knowledge assistant
  Load manuals, FAQs and policies into the RAG knowledge base. Relevant passages are retrieved automatically each turn, and answers state whether they come from internal docs or the web instead of guessing. Sessions and memory are isolated per user.
- Customer-support / ticket bot
  Handle high-volume repetitive queries and ticket triage: `AgentPool` serves many concurrent users, and safety modes gate which actions run.
- Vertical / industry assistant
  Combine sub-agents, document and vision reading, and MCP to plug into your own systems and databases, bringing domain knowledge and real data into the conversation.
- An "AI coworker" for your team
  Pull tasks from chat, nudge progress on a schedule, and auto-generate recap summaries (scheduled tasks + multi-user + tools).
- Private / on-prem deployment
  `docker compose up -d` runs the service entirely in your environment with Chinese (or any) LLMs; sessions, memory and knowledge stay on your own infrastructure.
- Multi-tenant SaaS / a base for AI app vendors
  `KeyedLLMProvider` maps tenants to their own API keys, and the pool enforces per-user instance and concurrency isolation, so you can scale from a side project to a multi-tenant product.
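Per the production notes, `KeyedLLMProvider` caches one LLM client per distinct API key with LRU eviction. The generic technique is small enough to sketch with an `OrderedDict`. `KeyedClientCache`, `make_client` and the `maxsize` default are hypothetical names for illustration, and the "client" here is a placeholder string rather than a real LLM client:

```python
from collections import OrderedDict
from typing import Any, Callable


class KeyedClientCache:
    """Hypothetical per-API-key client cache with LRU eviction."""

    def __init__(self, make_client: Callable[[str], Any], maxsize: int = 128) -> None:
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self._make_client = make_client
        self._maxsize = maxsize
        self._clients: OrderedDict[str, Any] = OrderedDict()  # oldest use first

    def get(self, api_key: str) -> Any:
        if api_key not in self._clients:
            self._clients[api_key] = self._make_client(api_key)  # one client per distinct key
            if len(self._clients) > self._maxsize:
                self._clients.popitem(last=False)  # drop the least recently used key
        self._clients.move_to_end(api_key)  # mark as most recently used
        return self._clients[api_key]

    def __contains__(self, api_key: str) -> bool:
        return api_key in self._clients


cache = KeyedClientCache(lambda key: f"client-for-{key}", maxsize=2)
cache.get("sk-tenant-a")
cache.get("sk-tenant-b")
cache.get("sk-tenant-a")  # tenant A becomes most recently used
cache.get("sk-tenant-c")  # over capacity: tenant B is evicted
print("sk-tenant-b" in cache)
```

Reusing one client per key keeps connection pools warm for active tenants, while the size bound stops a long tail of rarely seen keys from accumulating clients forever.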
Examples
1 · Call any LLM directly (streaming)
```python
import asyncio
from milu import ModelRegistry, Message, MessageRole

async def main():
    llm = ModelRegistry.create("qwen", model="qwen3.6-plus")
    async for chunk in llm.chat([Message(role=MessageRole.USER, content="Hello!")]):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())
```
Swap "qwen" for "deepseek", "kimi", "glm", "minimax", "doubao", "openai", "gemini" or "anthropic" — same interface, API keys read from {PROVIDER}_API_KEY environment variables.
2 · Agent with tools and events
import asyncio
from milu import Agent, ModelRegistry, AgentDone, TextDelta
async def main():
agent = Agent(ModelRegistry.create("deepseek", model="deepseek-v4-flash"))
async for evt in agent.run("Summarize the contents of ./report.pdf"):
if isinstance(evt, TextDelta):
print(evt.text, end="", flush=True)
elif isinstance(evt, AgentDone):
print(f"\n[done in {evt.turn_count} turns]")
asyncio.run(main())
The agent streams typed events — text deltas, reasoning, tool calls, confirmations, sub-agent progress — consume what you need, ignore the rest.
3 · Custom tools
```python
from milu import Agent, ModelRegistry, tool

llm = ModelRegistry.create("deepseek", model="deepseek-v4-flash")

@tool(name="add", description="Add two numbers", is_safe=True)
async def add(a: int, b: int) -> int:
    """:param a: first number\n:param b: second number"""
    return a + b

agent = Agent(llm, tools=[add])  # an explicit list replaces the built-in tools
```
is_safe=False routes the call through the active safety mode: auto-judged by AI, confirmed by a human, or blocked — depending on the mode.
4 · Safety modes
```python
# llm: any provider instance from ModelRegistry.create(...)
agent = Agent(llm, mode="manual")  # unsafe tools wait for human approval
agent.set_mode("talk")             # read-only: unsafe tools blocked
```
| Mode | Behavior |
|---|---|
| `talk` | read-only: every unsafe tool call is blocked |
| `manual` | safe tools run; unsafe tools emit a confirmation event and wait |
| `auto` (default) | autonomous; unsafe calls are screened by an AI safety judge (allow / confirm / deny) |
| `superwork` | full permissions, no checks |
> [!WARNING]
> `superwork` skips every safety check (including the AI judge). Use it only for fully trusted tasks.
Sub-agents inherit the parent's mode and confirmation callback — delegation is never a bypass.
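The mode table can be read as a small decision function for each tool call. A minimal sketch under stated assumptions: `decide`, the `"run"` / `"confirm"` / `"block"` verdicts and the `judge` callback are hypothetical names, and the AI judge (an LLM call in milu) is modeled here as a plain function of the tool name that returns `"allow"`, `"confirm"` or `"deny"`:

```python
from typing import Callable, Literal

Verdict = Literal["run", "confirm", "block"]
Judge = Callable[[str], Literal["allow", "confirm", "deny"]]


def decide(mode: str, tool: str, is_safe: bool, judge: Judge) -> Verdict:
    """Hypothetical gate for one tool call under a given safety mode."""
    if mode == "superwork":
        return "run"  # full permissions, no checks at all
    if is_safe:
        return "run"  # safe tools run in every other mode
    if mode == "talk":
        return "block"  # read-only: unsafe calls never run
    if mode == "manual":
        return "confirm"  # wait for a human
    if mode == "auto":
        verdict = judge(tool)  # AI judge screens the unsafe call
        return {"allow": "run", "confirm": "confirm", "deny": "block"}[verdict]
    raise ValueError(f"unknown mode {mode!r}")
```

If a sub-agent passes its parent's `mode` into the same gate, delegating work cannot loosen the policy.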
5 · Long-term memory & RAG knowledge base
```python
agent = Agent(llm, memory="user-42", knowledge="user-42")
```

- Memory: a small set of durable facts, rendered into the system prompt every turn; it survives across sessions and processes.
- Knowledge: chunked and embedded documents (pdf/docx/xlsx/pptx/md/txt) with cosine retrieval, source-catalog routing in the prompt, optional per-turn auto-retrieval, and `kb_search` / `kb_ingest` / `kb_manage` tools. Storage is isolated per user.
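Brute-force cosine retrieval (the roadmap lists ANN backends as a future alternative to it) scores every stored chunk against the query embedding and keeps the best k. A pure-Python sketch of that technique; the toy 3-dimensional vectors and chunk ids stand in for real embeddings, and `top_k` is a hypothetical helper:

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vectors match nothing


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every chunk (brute force) and return the k most similar."""
    scored = ((chunk_id, cosine(query, vec)) for chunk_id, vec in chunks.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


chunks = {
    "faq.md#1": [0.9, 0.1, 0.0],
    "policy.pdf#3": [0.1, 0.9, 0.1],
    "manual.docx#7": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], chunks))
```

The cost grows linearly with the number of chunks, which is fine for per-user knowledge bases and is exactly what an ANN index would trade away for speed at larger scale.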
6 · Multi-user concurrency (AgentPool)
```python
import asyncio
from milu import AgentPool, ModelRegistry

async def main():
    llm = ModelRegistry.create("qwen", model="qwen3.6-plus")  # coroutine-safe, shareable
    pool = AgentPool.from_llm(llm)
    await pool.start()
    try:
        async with pool.acquire("user-1", "session-A") as h:
            async for evt in h.agent.run("Hello!"):
                ...  # handle events as in example 2
    finally:
        await pool.stop()

asyncio.run(main())
```
Four hard invariants: ≤1 agent per (user, session) · bounded instance count · bounded concurrent runs · idle agents evicted. Sessions, memory and knowledge are derived per-user automatically.
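To make the four invariants concrete, here is a deliberately tiny, single-process sketch that enforces them with `asyncio` primitives. `TinyPool` and everything in it are hypothetical and far simpler than `AgentPool` (no MCP sharing, no persistence); the factory stands in for building an agent:

```python
import asyncio
import time
from collections import OrderedDict
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class _Entry:
    agent: Any
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    last_used: float = field(default_factory=time.monotonic)
    refs: int = 0  # tasks currently using or waiting for this agent


class TinyPool:
    """Hypothetical sketch of the four invariants; not milu's AgentPool."""

    def __init__(self, factory: Callable[[str, str], Any], max_instances: int = 100,
                 max_running: int = 10, idle_ttl: float = 600.0) -> None:
        self._factory = factory
        self._entries: OrderedDict[tuple[str, str], _Entry] = OrderedDict()  # LRU order
        self._max_instances = max_instances
        self._run_slots = asyncio.Semaphore(max_running)
        self._idle_ttl = idle_ttl

    def _evict(self) -> None:
        now = time.monotonic()
        for key, entry in list(self._entries.items()):  # least recently used first
            if entry.refs:
                continue  # never evict an agent that is in use or awaited
            idle = now - entry.last_used > self._idle_ttl
            if idle or len(self._entries) > self._max_instances:
                del self._entries[key]

    @asynccontextmanager
    async def acquire(self, user: str, session: str):
        key = (user, session)
        entry = self._entries.get(key)
        if entry is None:  # no await between lookup and insert: one agent per key
            entry = self._entries[key] = _Entry(self._factory(user, session))
        self._entries.move_to_end(key)
        entry.refs += 1
        self._evict()
        try:
            async with entry.lock:  # serialize runs within one session
                async with self._run_slots:  # cap concurrent runs across the pool
                    yield entry.agent
        finally:
            entry.refs -= 1
            entry.last_used = time.monotonic()

    def __len__(self) -> int:
        return len(self._entries)


async def demo() -> None:
    pool = TinyPool(lambda user, session: f"agent:{user}/{session}", max_instances=1)
    async with pool.acquire("user-1", "A") as agent:
        print(agent)
    async with pool.acquire("user-2", "B") as agent:
        print(agent)  # user-1's idle agent was evicted to respect max_instances=1
    print(len(pool))


asyncio.run(demo())
```

Eviction runs only inside `acquire` here, so a long-running service would also sweep idle entries on a timer. Because busy agents are never evicted, the sketch lets the instance count briefly exceed `max_instances` when every agent is in use, rather than blocking.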
7 · MCP servers
In `config/mcp_servers.json`:

```json
{
  "mcpServers": {
    "playwright": { "command": "npx", "args": ["@playwright/mcp@latest"] },
    "my-http": { "type": "streamable_http", "url": "http://localhost:3000/mcp" }
  }
}
```
stdio / streamable HTTP / SSE transports, parallel connection with error isolation, and a dormant-pool design: MCP tool schemas don't bloat the context — the agent discovers and activates them on demand. For high-concurrency deployments, one shared set of MCP processes can serve the entire pool.
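A toy model of the dormant-pool idea: only activated tools contribute full schemas to each LLM request, while dormant ones can be discovered by name. `ToolRegistry`, `search`, `activate` and the tool names are hypothetical and chosen for illustration; milu's actual discovery mechanism may differ:

```python
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Hypothetical active/dormant split for MCP tool schemas."""
    dormant: dict[str, dict] = field(default_factory=dict)  # name -> JSON schema
    active: dict[str, dict] = field(default_factory=dict)

    def search(self, keyword: str) -> list[str]:
        # Discovery costs only tool names, not full schemas.
        return sorted(name for name in self.dormant if keyword in name)

    def activate(self, name: str) -> None:
        self.active[name] = self.dormant.pop(name)

    def schemas_for_prompt(self) -> list[dict]:
        # Only active tools are sent to the LLM with each request.
        return list(self.active.values())


reg = ToolRegistry(dormant={
    "browser_navigate": {"name": "browser_navigate", "parameters": {"url": "string"}},
    "browser_click": {"name": "browser_click", "parameters": {"selector": "string"}},
    "db_query": {"name": "db_query", "parameters": {"sql": "string"}},
})
print(len(reg.schemas_for_prompt()))  # nothing loaded into the context yet
print(reg.search("browser"))
reg.activate("browser_navigate")
print([s["name"] for s in reg.schemas_for_prompt()])
```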
8 · Scheduled tasks
```text
milu chat
> Remind me every weekday at 9am to summarize yesterday's AI news   # the agent creates the task
```
Cron-style scheduling per user, executed inside milu chat / milu serve (or a standalone milu scheduler start daemon) with a single-instance lock and automatic takeover. Results are delivered to an outbox file, server push, or desktop notification.
CLI
```text
milu                  interactive chat (first run launches the setup wizard)
milu setup            provider / API key / search backend wizard
milu chat -p glm      chat with a specific provider
milu run "..." -q     one-shot execution, pipe-friendly
milu serve            multi-user web service + demo UI
milu providers        list the 9 providers and their key status
milu config ...       layered config (CLI > user > project > defaults)
milu sessions list    browse saved sessions
milu schedule ...     manage scheduled tasks
milu --lang en ...    switch UI language for one run (zh / en)
```
Language (中文 / English). Both the CLI and the web UI are fully bilingual. Pick the interface language in any of these ways:
```bash
milu --lang en providers   # one-off override (also accepts --lang zh)
MILU_LANG=en milu chat     # via env var (bash); PowerShell: $env:MILU_LANG="en"; milu chat
milu config set lang en    # persist to ~/.milu/config.json
milu setup                 # the wizard asks for language as its first step
```
Priority: --lang > MILU_LANG > config.json lang > default zh. In the web UI, use the EN / 中文 toggle in the top bar.
Architecture
```text
AgentPool (multi-user, optional)
└─ Agent.run() loop ── system prompt rebuild → auto-compaction
   ├─ LLM layer          9 providers, one AsyncOpenAI-based interface
   ├─ Tool layer         built-ins · custom @tool · MCP (active/dormant pools)
   ├─ Safety layer       modes · AI judge · confirmation flow
   ├─ Sub-agents         researcher / reader / coder (isolated context)
   ├─ Prompts & skills   layered markdown prompts · on-demand skill loading
   └─ Session            JSONL persistence · compaction snapshots
```
Python 3.10+ · fully async · every provider speaks through one openai.AsyncOpenAI client, so LLM instances are coroutine-safe and shareable across users.
Production notes
- Scaling out: route each user to a fixed instance (by `user_id`, or e.g. nginx `ip_hash`); per-session serialization is handled by in-process entry locks, so no distributed locks are needed. Sessions persist to disk and recover after eviction or restart.
- Memory budget: MCP subprocesses are the dominant cost (15–50 MB per agent). Enable shared MCP (`AgentPoolConfig(shared_mcp=True)`) to keep one set of MCP processes for the whole pool.
- Multi-tenant keys: `KeyedLLMProvider` caches one LLM client per distinct API key with LRU eviction; see `examples/multi_tenant_keys.py`.
- Docker: see docs/Docker部署.md for health checks, data volumes, SSE reverse-proxy settings and scheduler single-instance behavior.
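Routing by user only works if every front end maps a given user to the same backend. nginx's `ip_hash` does this by client IP; if you route on the user id yourself (for example in a gateway), use a stable hash rather than Python's built-in `hash()`, which is randomized per process for strings. A minimal sketch; `BACKENDS` and `pick_backend` are hypothetical:

```python
import hashlib

BACKENDS = ["milu-a:8000", "milu-b:8000", "milu-c:8000"]  # hypothetical worker addresses


def pick_backend(user_id: str, backends: list[str] = BACKENDS) -> str:
    """Map a user to one backend, identically on every gateway and after every restart."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Plain modulo hashing remaps most users whenever the backend list changes; consistent hashing limits that churn at the cost of a little more code.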
Roadmap
- Observability: OpenTelemetry tracing hooks
- Sandboxed code execution backends for `python_repl` / `shell_command`
- Pluggable ANN backends for the knowledge store (sqlite-vec) beyond brute-force cosine search
- English documentation set (architecture & guides — currently Chinese)
- Prebuilt images on a container registry
- One-click installers / standalone binaries (no Python required)
Contributing
Issues and PRs welcome. Run the test suite with:
```bash
pip install -e ".[dev]"
python -m pytest tests/ --ignore=tests/test_real_api.py --ignore=tests/test_real_new_providers.py -q
```
License
MIT. Five built-in skills are ported from anthropics/skills (Apache-2.0) and obra/superpowers (MIT) — see THIRD_PARTY_NOTICES.
milu (麋鹿) — named after Père David's deer, the legendary Chinese animal that "resembles four creatures yet is none of them" — one body, the strengths of many.
Download files
File details
Details for the file milu-0.1.0.tar.gz.
File metadata
- Download URL: milu-0.1.0.tar.gz
- Upload date:
- Size: 543.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0e0330b8c3f3ef80cef45fcd177286993bc21eaac4426f3311f04efc82e71fd9` |
| MD5 | `268b4dc45d7968b12ad92cadab617a6e` |
| BLAKE2b-256 | `51835106bbbd5fce6c71b3ccdef94ad9ec567f10b1b3ade28e0ed56c88a18708` |
File details
Details for the file milu-0.1.0-py3-none-any.whl.
File metadata
- Download URL: milu-0.1.0-py3-none-any.whl
- Upload date:
- Size: 418.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `671c608bf8a5cd8a1e85e307d10e9681dc0adce7b07f352a626f21925b2e1701` |
| MD5 | `848390c238dfcf3971f2d2f677ff1678` |
| BLAKE2b-256 | `8853dfc0dda5beb3e9e52df047fd3df2e88b392dbf18c5f58d7cf95f936c3f30` |