Unified AI agent framework: one interface for 9 LLM providers (Qwen, Kimi, GLM, DeepSeek, MiniMax, Doubao, ChatGPT, Gemini, Claude) with tools, MCP, sub-agents, skills, RAG knowledge base, scheduler and multi-user serving

🦌 milu

Production-ready multi-user AI agents — with Chinese LLMs as first-class citizens.

Multi-user agent pool · One interface for 9 LLM providers (Chinese-first) · Built-in tools & MCP · Sub-agents · Skills · RAG · Scheduler

Why milu?

Most agent frameworks stop at single-user demos, and treat Chinese LLM providers as an afterthought. milu starts where they stop:

🏭 From demo to production in one library
AgentPool gives you per-user agent isolation, LRU/TTL eviction, global concurrency limits and shared MCP processes. The question every framework leaves as "an exercise for the reader" — "my demo works, how do I serve 100 concurrent users without sessions bleeding into each other?" — is answered here, backed by 1100+ tests. The same pool maps tenants to their own API keys (KeyedLLMProvider), so it scales from a side project to multi-tenant SaaS.
🇨🇳 Chinese LLMs as first-class citizens
Qwen, DeepSeek, Kimi, GLM, MiniMax, Doubao natively supported alongside OpenAI, Gemini and Claude. No base_url juggling, provider quirks (thinking mode, built-in web search, parameter differences) pre-adapted, plus a China-reachable search backend out of the box.
🔋 Batteries actually included
20+ built-in tools (files, shell, Python, web fetch/search, Office/PDF reading, vision input), MCP protocol (stdio/HTTP/SSE), sub-agents, skills, session persistence, automatic context compaction, long-term memory, RAG knowledge base, scheduled tasks, and a built-in multi-user web service.
🛡️ A real safety model
Four operation modes (talk / manual / auto / superwork), an AI safety judge for unsafe tool calls (Claude-Code-style), human confirmation flows, and delegation that never bypasses approval.
🪶 Thin by design
Built directly on the openai SDK as the unified HTTP client. Events stream out as plain dataclasses. No chains, no graphs, no DSL to learn.

Two ways to use it

milu is both a ready-to-run agent and a framework to build on — start instantly, embed when you need to:

🚀 Run it
milu for chat, milu serve for a multi-user service — full capabilities, zero code. Both the CLI and the web UI ship in English and 中文.
🧩 Build on it
from milu import Agent to embed agents in your own backend, then scale to multi-user / multi-tenant with AgentPool — you own your data and stack.

Install

Tip: one pip install milu gets everything: CLI, web service, RAG knowledge base and MCP are all included. All you need to start is at least one provider API key.

With pip — if you already have Python 3.10+:

pip install milu              # everything included: CLI, web service, RAG, MCP

New to Python? Beginner step-by-step

1. Download Python 3.10+ from python.org/downloads. On Windows, tick "Add Python to PATH" during setup.
2. Open a terminal (Windows: PowerShell · macOS: Terminal) and run python --version; it should print 3.10 or higher.
3. Run pip install milu.
4. Run milu to start chatting.

No existing Python? The easiest route is uv, which installs both Python and milu for you:

# 1. install uv (one line, needs no Python)
curl -LsSf https://astral.sh/uv/install.sh | sh            # macOS / Linux
powershell -c "irm https://astral.sh/uv/install.ps1 | iex" # Windows

# 2. install milu (uv fetches a Python automatically if missing)
uv tool install milu

Docker — no Python on the host at all:

cp .env.example .env          # fill in at least one provider API key
docker compose up -d

Quick start

Note: the first run launches an interactive setup wizard: pick a provider, paste an API key, and you're chatting.

CLI — zero config to first conversation:

milu                # first run guides you through provider + API key setup

Code — a full-featured agent in 3 lines:

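import asyncio
from milu import Agent, ModelRegistry

async def main():
    agent = Agent(ModelRegistry.create("deepseek", model="deepseek-v4-flash"))
    async for event in agent.run("What time is it? Use a tool to check."):
        ...  # handle each streamed event here

asyncio.run(main())  # async-for needs a running event loop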

Agent(llm) is the complete package by default: built-in system prompt, 20+ tools, skills, three sub-agents, session persistence and context compaction — pass explicit arguments only to override.

Multi-user web service — one command:

milu serve          # multi-user chat + full-featured demo UI at http://127.0.0.1:8000

How it compares

Capability	milu	LangChain	CrewAI	smolagents	Qwen-Agent
Chinese providers native (6)	✅	community pkgs	via LiteLLM	via LiteLLM	Qwen family
Multi-user pool, in-library	✅ AgentPool	platform (paid)	platform	—	—
MCP protocol	✅ 3 transports	✅	✅	✅	✅
Built-in tools (files/docs/vision/search)	✅ 20+	install per-integration	partial	minimal	partial
Tool-safety modes + AI judge	✅	—	—	sandbox only	—
RAG knowledge base, in-library	✅	assemble yourself	partial	—	✅
Scheduled tasks (multi-user)	✅	—	—	—	—
CLI + web service out of the box	✅	—	partial	—	demo UI

✅ = built-in; "—" = not built-in (often available via an external platform or a few lines of your own code). Reflects each library as of June 2026 — these move fast, so corrections are welcome via issue/PR.

When milu is the right fit: you're building on Chinese LLMs, you need a production multi-user / multi-tenant service (not just a single-user demo), and you want batteries included, whether you run it as-is, embed it as a library, or use it as a flexible core for your own product.

When to choose something else: for the largest integration ecosystem, LangChain; for pure multi-agent orchestration, CrewAI or AutoGen; for a tiny, barebones core with almost nothing built in, smolagents.

What you can build

Personal AI assistant
milu drops you into a chat in one command; long-term memory remembers your preferences, scheduled tasks handle reminders and daily digests, and built-in tools (web search, files, docs, vision) are ready to use — all running locally, your data stays yours.
Enterprise knowledge assistant
Load manuals / FAQs / policies into the RAG knowledge base. Relevant chunks are retrieved automatically each turn, and answers are source-aware (they separate "internal docs vs web"), which cuts down on hallucinated guesses. Sessions and memory are isolated per user.
Customer-support / ticket bot
High-volume repetitive queries and ticket triage: AgentPool handles many concurrent users, and safety modes gate which actions run.
Vertical / industry assistant
Sub-agents + document & vision reading + MCP to plug into your own systems and databases, bringing domain knowledge and real data in.
An "AI coworker" for your team
Pull tasks from chat, nudge progress on a schedule, auto-generate recap summaries (scheduled tasks + multi-user + tools).
Private / on-prem deployment
docker compose up -d; runs entirely in your environment with Chinese (or any) LLMs, data never leaves.
Multi-tenant SaaS / a base for AI app vendors
KeyedLLMProvider maps tenants to their own API keys; the pool enforces per-user instance and concurrency isolation — scale from a side project to a multi-tenant product.
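
The per-tenant key mapping mentioned above can be pictured as an LRU cache keyed by API key. The sketch below is not milu's KeyedLLMProvider: `KeyedClientCache`, `make_client` and `max_clients` are hypothetical names, and the factory returns a plain dict where a real one would build an LLM client.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class KeyedClientCache(Generic[T]):
    """Hypothetical LRU cache: one client object per distinct API key."""

    def __init__(self, make_client: Callable[[str], T], max_clients: int = 2) -> None:
        self._make_client = make_client
        self._max_clients = max_clients
        self._clients: "OrderedDict[str, T]" = OrderedDict()

    def get(self, api_key: str) -> T:
        if api_key in self._clients:
            # Cache hit: mark this key as most recently used.
            self._clients.move_to_end(api_key)
            return self._clients[api_key]
        client = self._make_client(api_key)
        self._clients[api_key] = client
        if len(self._clients) > self._max_clients:
            # Evict the least recently used key (front of the ordered dict).
            self._clients.popitem(last=False)
        return client

    def __len__(self) -> int:
        return len(self._clients)


# Stand-in factory; a real one would build an LLM client for the key.
cache = KeyedClientCache(lambda key: {"api_key": key}, max_clients=2)
a = cache.get("key-tenant-a")
cache.get("key-tenant-b")
assert cache.get("key-tenant-a") is a   # reused, not rebuilt
cache.get("key-tenant-c")               # evicts tenant-b, the least recently used
```

Tenants that share an API key share one client, and a bounded cache keeps memory flat however many tenants pass through.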

Examples

1 · Call any LLM directly (streaming)

import asyncio
from milu import ModelRegistry, Message, MessageRole

async def main():
    llm = ModelRegistry.create("qwen", model="qwen3.6-plus")
    async for chunk in llm.chat([Message(role=MessageRole.USER, content="Hello!")]):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Swap "qwen" for "deepseek", "kimi", "glm", "minimax", "doubao", "openai", "gemini" or "anthropic" — same interface, API keys read from {PROVIDER}_API_KEY environment variables.
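
To see which providers are ready before creating a model, you can check the environment yourself. This is a minimal sketch that assumes the variable name is simply the upper-cased provider name followed by `_API_KEY`; `key_env_var` and `configured_providers` are hypothetical helpers, not part of milu.

```python
import os

# Provider names as listed in the text; the upper-casing rule is an assumption.
PROVIDERS = ["qwen", "deepseek", "kimi", "glm", "minimax", "doubao",
             "openai", "gemini", "anthropic"]


def key_env_var(provider: str) -> str:
    """Return the assumed environment variable name for a provider."""
    return f"{provider.upper()}_API_KEY"


def configured_providers(env=os.environ) -> list[str]:
    """List providers whose key variable is set to a non-empty value."""
    return [p for p in PROVIDERS if env.get(key_env_var(p))]


print(configured_providers())
```

The milu providers command reports key status for you; the sketch only illustrates the naming convention.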

2 · Agent with tools and events

import asyncio
from milu import Agent, ModelRegistry, AgentDone, TextDelta

async def main():
    agent = Agent(ModelRegistry.create("deepseek", model="deepseek-v4-flash"))
    async for evt in agent.run("Summarize the contents of ./report.pdf"):
        if isinstance(evt, TextDelta):
            print(evt.text, end="", flush=True)
        elif isinstance(evt, AgentDone):
            print(f"\n[done in {evt.turn_count} turns]")

asyncio.run(main())

The agent streams typed events — text deltas, reasoning, tool calls, confirmations, sub-agent progress — consume what you need, ignore the rest.

3 · Custom tools

from milu import Agent, ModelRegistry, tool

@tool(name="add", description="Add two numbers", is_safe=True)
async def add(a: int, b: int) -> int:
    """:param a: first number\n:param b: second number"""
    return a + b

llm = ModelRegistry.create("deepseek", model="deepseek-v4-flash")  # any provider works
agent = Agent(llm, tools=[add])        # explicit list replaces built-ins

is_safe=False routes the call through the active safety mode: auto-judged by AI, confirmed by a human, or blocked — depending on the mode.

4 · Safety modes

agent = Agent(llm, mode="manual")   # unsafe tools wait for human approval
agent.set_mode("talk")              # read-only: unsafe tools blocked

Mode	Behavior
`talk`	read-only — every unsafe tool call is blocked
`manual`	safe tools run; unsafe tools emit a confirmation event and wait
`auto` (default)	autonomous; unsafe calls are screened by an AI safety judge (allow / confirm / deny)
`superwork`	full permissions, no checks

Warning: superwork skips every safety check (including the AI judge). Use it only for fully trusted tasks.

Sub-agents inherit the parent's mode and confirmation callback — delegation is never a bypass.
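
The mode table can be read as a small decision function. The sketch below is an illustration, not milu's implementation: `Action` and `gate` are hypothetical names, and it assumes that safe tools run in every mode, including talk.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"          # execute immediately
    CONFIRM = "confirm"  # emit a confirmation event and wait for a human
    JUDGE = "judge"      # hand to the AI safety judge (allow / confirm / deny)
    BLOCK = "block"      # refuse the call


def gate(mode: str, is_safe: bool) -> Action:
    """Map (mode, is_safe) to an action, following the mode table."""
    if mode not in {"talk", "manual", "auto", "superwork"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if is_safe or mode == "superwork":
        # Assumption: safe tools always run; superwork skips every check.
        return Action.RUN
    unsafe_actions = {
        "talk": Action.BLOCK,
        "manual": Action.CONFIRM,
        "auto": Action.JUDGE,
    }
    return unsafe_actions[mode]


for mode in ("talk", "manual", "auto", "superwork"):
    print(mode, gate(mode, is_safe=False).value)
```

Because sub-agents inherit the parent's mode, the same function would apply unchanged to a delegated call.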

5 · Long-term memory & RAG knowledge base

agent = Agent(llm, memory="user-42", knowledge="user-42")

Memory: a small set of durable facts, rendered into the system prompt every turn, that survives across sessions and processes.
Knowledge: chunked + embedded documents (pdf/docx/xlsx/pptx/md/txt) with cosine retrieval, source-catalog routing in the prompt, optional per-turn auto-retrieval, and kb_search / kb_ingest / kb_manage tools. Per-user isolated storage.
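
Cosine retrieval over embedded chunks can be sketched in a few lines. This is a generic brute-force version with toy 3-dimensional vectors, not milu's knowledge store; `cosine`, `top_k` and the chunk ids are hypothetical, and real embeddings would come from an embedding model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Toy "embeddings" keyed by chunk id.
chunks = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-faq": [0.1, 0.9, 0.0],
    "api-manual": [0.0, 0.1, 0.9],
}
hits = top_k([0.8, 0.3, 0.0], chunks, k=2)
print(hits)  # ['refund-policy', 'shipping-faq']
```

Scanning every chunk is linear in the size of the store, which is fine for small per-user collections and is the reason an ANN backend becomes attractive as they grow.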

6 · Multi-user concurrency (AgentPool)

import asyncio
from milu import AgentPool, ModelRegistry

async def main():
    llm = ModelRegistry.create("qwen", model="qwen3.6-plus")   # coroutine-safe, shareable
    pool = AgentPool.from_llm(llm)
    await pool.start()
    try:
        async with pool.acquire("user-1", "session-A") as h:
            async for evt in h.agent.run("Hello!"):
                ...
    finally:
        await pool.stop()   # stop the pool even if the run fails

asyncio.run(main())

Four hard invariants: ≤1 agent per (user, session) · bounded instance count · bounded concurrent runs · idle agents evicted. Sessions, memory and knowledge are derived per-user automatically.
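
One way to picture how the four invariants fit together is a toy pool built from standard asyncio primitives. This is a simplified sketch, not AgentPool's code: `MiniPool` and its parameters are hypothetical, the instance cap is enforced only against idle agents, and per-session locks are never cleaned up.

```python
import asyncio
import time
from collections import OrderedDict
from contextlib import asynccontextmanager


class MiniPool:
    """Toy pool illustrating the four invariants; not milu's AgentPool."""

    def __init__(self, make_agent, max_agents=2, max_concurrent=1, idle_ttl=60.0):
        self._make_agent = make_agent
        self._max_agents = max_agents
        self._idle_ttl = idle_ttl
        self._slots = asyncio.Semaphore(max_concurrent)  # bounded concurrent runs
        self._agents = OrderedDict()                     # key -> (agent, last_used)
        self._locks = {}                                 # key -> per-session lock

    def _evict(self):
        now = time.monotonic()
        # Drop agents idle longer than the TTL, skipping sessions in use.
        for key, (_, last_used) in list(self._agents.items()):
            if now - last_used > self._idle_ttl and not self._locks[key].locked():
                del self._agents[key]
        # Enforce the instance cap by evicting least recently used idle agents.
        for key in list(self._agents):
            if len(self._agents) <= self._max_agents:
                break
            if not self._locks[key].locked():
                del self._agents[key]

    @asynccontextmanager
    async def acquire(self, user_id, session_id):
        key = (user_id, session_id)
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock, self._slots:  # one run per session, N runs globally
            if key not in self._agents:
                self._agents[key] = (self._make_agent(key), time.monotonic())
            self._agents.move_to_end(key)  # mark as most recently used
            self._evict()
            try:
                yield self._agents[key][0]
            finally:
                # Refresh the idle timestamp when the run ends.
                self._agents[key] = (self._agents[key][0], time.monotonic())


async def demo():
    pool = MiniPool(make_agent=lambda key: f"agent for {key}")
    async with pool.acquire("user-1", "session-A") as agent:
        return agent


print(asyncio.run(demo()))
```

The per-key lock gives at most one live agent per (user, session), the semaphore bounds concurrent runs, and the ordered dict provides both the LRU cap and the idle sweep.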

7 · MCP servers

// config/mcp_servers.json
{
  "mcpServers": {
    "playwright": { "command": "npx", "args": ["@playwright/mcp@latest"] },
    "my-http":    { "type": "streamable_http", "url": "http://localhost:3000/mcp" }
  }
}

stdio / streamable HTTP / SSE transports, parallel connection with error isolation, and a dormant-pool design: MCP tool schemas don't bloat the context — the agent discovers and activates them on demand. For high-concurrency deployments, one shared set of MCP processes can serve the entire pool.
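
The dormant-pool idea can be illustrated with a registry that keeps full schemas out of the prompt until a tool is activated. This is a rough sketch of the concept only: `ToolPool`, its methods and the example tool names are hypothetical and do not reflect milu's internal API.

```python
class ToolPool:
    """Toy active/dormant registry; names and methods are hypothetical."""

    def __init__(self, schemas: dict[str, dict]) -> None:
        self._dormant = dict(schemas)        # full schemas, kept out of the prompt
        self._active: dict[str, dict] = {}   # schemas sent to the model

    def catalog(self) -> list[str]:
        """Cheap listing the model can browse: dormant names only, no schemas."""
        return sorted(self._dormant)

    def activate(self, name: str) -> dict:
        """Move a tool into the active set so its schema is sent to the model."""
        if name in self._active:
            return self._active[name]
        schema = self._dormant.pop(name)     # raises KeyError for unknown tools
        self._active[name] = schema
        return schema

    def prompt_schemas(self) -> list[dict]:
        """Schemas that actually occupy context on the next turn."""
        return list(self._active.values())


pool = ToolPool({
    "browser_click": {"name": "browser_click", "parameters": {"selector": "string"}},
    "browser_navigate": {"name": "browser_navigate", "parameters": {"url": "string"}},
})
before = len(pool.prompt_schemas())
pool.activate("browser_navigate")
after = len(pool.prompt_schemas())
print(before, after, pool.catalog())
```

The context cost then grows with the number of tools actually used in a session rather than with the number of tools every connected MCP server exposes.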

8 · Scheduled tasks

milu chat
> Remind me every weekday at 9am to summarize yesterday's AI news   # agent creates the task

Cron-style scheduling per user, executed inside milu chat / milu serve (or a standalone milu scheduler start daemon) with a single-instance lock and automatic takeover. Results are delivered to an outbox file, server push, or desktop notification.

CLI

milu                 interactive chat (first run launches setup wizard)
milu setup           provider / API key / search backend wizard
milu chat -p glm     chat with a specific provider
milu run "..." -q    one-shot execution, pipe-friendly
milu serve           multi-user web service + demo UI
milu providers       list 9 providers and key status
milu config ...      layered config (CLI > user > project > defaults)
milu sessions list   browse saved sessions
milu schedule ...    manage scheduled tasks
milu --lang en ...   switch UI language for one run (zh / en)

Language (中文 / English). Both the CLI and the web UI are fully bilingual. Pick the interface language in any of these ways:

milu --lang en providers        # one-off override (also accepts --lang zh)
$env:MILU_LANG="en"; milu chat   # per-session via env var (PowerShell; bash: MILU_LANG=en milu chat)
milu config set lang en          # persist to ~/.milu/config.json
milu setup                       # the wizard asks for language as its first step

Priority: --lang > MILU_LANG > config.json lang > default zh. In the web UI, use the EN / 中文 toggle in the top bar.
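
The priority chain amounts to "first source that is set wins". A minimal sketch of that lookup, assuming the config file has already been parsed into a mapping; `resolve_lang` is a hypothetical helper, not a milu function.

```python
import os
from typing import Mapping, Optional


def resolve_lang(cli_lang: Optional[str] = None,
                 env: Mapping[str, str] = os.environ,
                 config: Optional[Mapping[str, str]] = None) -> str:
    """Return the first language that is set, checking sources in priority order."""
    candidates = [
        cli_lang,                    # --lang flag
        env.get("MILU_LANG"),        # environment variable
        (config or {}).get("lang"),  # "lang" entry from config.json
    ]
    for value in candidates:
        if value:
            return value
    return "zh"                      # documented default


print(resolve_lang(cli_lang=None, env={"MILU_LANG": "en"}, config={"lang": "zh"}))  # en
```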

Architecture

Text version

AgentPool (multi-user, optional)
  └─ Agent.run() loop ── system prompt rebuild → auto-compaction
       ├─ LLM layer        9 providers, one AsyncOpenAI-based interface
       ├─ Tool layer       built-ins · custom @tool · MCP (active/dormant pools)
       ├─ Safety layer     modes · AI judge · confirmation flow
       ├─ Sub-agents       researcher / reader / coder (isolated context)
       ├─ Prompts & skills layered markdown prompts · on-demand skill loading
       └─ Session          JSONL persistence · compaction snapshots

Python 3.10+ · fully async · every provider speaks through one openai.AsyncOpenAI client, so LLM instances are coroutine-safe and shareable across users.
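
JSONL persistence, as named in the Session layer above, means one JSON object per line, appended as the conversation grows. The sketch below shows the general pattern only: the record fields, function names and file name are assumptions, not milu's actual session format.

```python
import json
import tempfile
from pathlib import Path


def append_message(path: Path, role: str, content: str) -> None:
    """Append one message as a single JSON line."""
    record = {"role": role, "content": content}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_messages(path: Path) -> list[dict]:
    """Read the session back, skipping blank lines."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "session-A.jsonl"   # hypothetical file name
    append_message(session, "user", "你好")
    append_message(session, "assistant", "Hello!")
    restored = load_messages(session)

print(restored)
```

Append-only lines make writes cheap, and a crash mid-write can damage at most the final line, which is why the format suits session logs.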

Production notes

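Scaling out: route each user to a fixed instance, keyed by user_id (e.g. an nginx hash on a user-id header or cookie; ip_hash only approximates this, since it keys on the client IP). Per-session serialization is handled by in-process entry locks, so no distributed locks are needed. Sessions persist to disk and recover after eviction or restart.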
Memory budget: MCP subprocesses are the dominant cost (15–50 MB per agent), so 100 resident agents need roughly 1.5–5 GB for MCP alone. Enable shared MCP (AgentPoolConfig(shared_mcp=True)) to keep one set of MCP processes for the whole pool.
Multi-tenant keys: KeyedLLMProvider caches one LLM client per distinct API key with LRU eviction — see examples/multi_tenant_keys.py.
Docker: see docs/Docker部署.md — health checks, data volumes, SSE reverse-proxy settings, scheduler single-instance behavior.
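
Routing by user_id, as suggested under "Scaling out", means every request from one user lands on the same instance, so in-process locks are enough. A minimal sketch of that mapping, assuming a fixed backend list; `pick_backend` and the addresses are placeholders, and in practice the load balancer does this step.

```python
import hashlib


def pick_backend(user_id: str, backends: list[str]) -> str:
    """Map a user id to one backend deterministically (sticky routing)."""
    # hashlib is used because Python's built-in hash() is salted per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]  # placeholder addresses
print(pick_backend("user-1", backends))
```

Plain modulo hashing remaps most users when the backend list changes; consistent hashing reduces that churn if instances come and go often.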

Roadmap

Observability: OpenTelemetry tracing hooks
Sandboxed code execution backends for python_repl / shell_command
Pluggable ANN backends for the knowledge store (sqlite-vec) beyond brute-force cosine
English documentation set (architecture & guides — currently Chinese)
Prebuilt images on a container registry
One-click installers / standalone binaries (no Python required)

Contributing

Issues and PRs welcome. Run the test suite with:

pip install -e ".[dev]"
python -m pytest tests/ --ignore=tests/test_real_api.py --ignore=tests/test_real_new_providers.py -q

License

MIT. Five built-in skills are ported from anthropics/skills (Apache-2.0) and obra/superpowers (MIT) — see THIRD_PARTY_NOTICES.

milu (麋鹿) — named after Père David's deer, the legendary Chinese animal that "resembles four creatures yet is none of them" — one body, the strengths of many.

Release history

0.1.1 · Jun 9, 2026
0.1.0 · Jun 9, 2026 (this version)
0.0.1 · Jun 4, 2026

Download files

Source Distribution

milu-0.1.0.tar.gz (543.8 kB)

Uploaded Jun 9, 2026 Source

Built Distribution

milu-0.1.0-py3-none-any.whl (418.2 kB)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file milu-0.1.0.tar.gz.

File metadata

Download URL: milu-0.1.0.tar.gz
Upload date: Jun 9, 2026
Size: 543.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for milu-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0e0330b8c3f3ef80cef45fcd177286993bc21eaac4426f3311f04efc82e71fd9`
MD5	`268b4dc45d7968b12ad92cadab617a6e`
BLAKE2b-256	`51835106bbbd5fce6c71b3ccdef94ad9ec567f10b1b3ade28e0ed56c88a18708`

File details

Details for the file milu-0.1.0-py3-none-any.whl.

File metadata

Download URL: milu-0.1.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 418.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for milu-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`671c608bf8a5cd8a1e85e307d10e9681dc0adce7b07f352a626f21925b2e1701`
MD5	`848390c238dfcf3971f2d2f677ff1678`
BLAKE2b-256	`8853dfc0dda5beb3e9e52df047fd3df2e88b392dbf18c5f58d7cf95f936c3f30`
