Production-grade Agent framework with sub-second latency via tiered execution: three-level cache, dual-model routing, and scenario toolchains.
SwiftAgentX
A production Agent framework built around Scenarios — pre-compiled execution paths that skip the ReAct loop entirely on known intents.
The core idea: Scenarios
Other frameworks treat every request as an open-ended reasoning problem. SwiftAgentX disagrees. In production, 80% of traffic is predictable: "check my order status", "what's your return policy", "book a slot at 3pm". For these, a ReAct loop is overkill — three to five LLM calls, several seconds of latency, a token bill that nobody can explain.
A Scenario is a pre-compiled execution path:
agent.register_scenario("order_status", ScenarioConfig(
    name="Order Status",
    triggers=["order", "where is my", "shipment"],
    tool_chain=[
        ToolChainStep(tool="order_db", query_template="$order_id"),
        ToolChainStep(tool="courier_api", condition="status=in_transit"),
    ],
    cache_ttl=120,
    output_type="direct",  # no second LLM call to "format" the answer
))
When the LIGHT model classifies a request as a weather / order_status /
balance_check scenario, SwiftAgentX executes the chain directly —
no ReAct loop, no second LLM call. One classification step (LIGHT model,
~200 ms), one tool chain, done.
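To make the idea concrete, here is a minimal, framework-free sketch of what running a pre-compiled chain could look like. It is not SwiftAgentX's implementation: `Step`, `run_chain`, the stub tools, and the way `query_template` and `condition` are interpreted here are all assumptions made for illustration.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for one tool-chain step."""
    tool: str
    query_template: str = ""
    condition: Optional[str] = None  # "key=value", checked against the state


def run_chain(steps: List[Step],
              tools: Dict[str, Callable[[str, dict], dict]],
              params: Dict[str, str]) -> dict:
    """Run steps in order; each tool's result is merged into a shared state."""
    state: dict = dict(params)
    for step in steps:
        if step.condition:
            key, _, expected = step.condition.partition("=")
            if str(state.get(key)) != expected:
                continue  # condition not met: skip this step
        query = Template(step.query_template).safe_substitute(state)
        state.update(tools[step.tool](query, state))
    return state


# Stub tools standing in for a real order database and courier API.
def order_db(query: str, state: dict) -> dict:
    return {"status": "in_transit", "courier": "ACME"}


def courier_api(query: str, state: dict) -> dict:
    return {"eta": "tomorrow"}


chain = [
    Step(tool="order_db", query_template="$order_id"),
    Step(tool="courier_api", condition="status=in_transit"),
]
result = run_chain(chain, {"order_db": order_db, "courier_api": courier_api},
                   {"order_id": "A123"})
print(result["status"], result["eta"])
```

No model is called anywhere in this loop, which is the point: once the intent is known, the remaining work is plain function calls.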
This is the framework's biggest design bet, and the place it pulls ahead of LangChain / AutoGen / CrewAI by a margin that actually matters in production.
Tiered execution
Scenarios sit in the middle of a four-tier execution model. All numbers
below are measured against DashScope Qwen — 30 iterations per scenario,
LIGHT=qwen-flash, HEAVY=qwen-turbo. Reproducible from this repo with
one command (see benchmarks/).
| Request type | Path | P50 latency | P95 | LLM calls |
|---|---|---|---|---|
| KB exact match / cache hit | Pipeline short-circuit | 0 ms | 0 ms | 0 |
| Known intent (Scenario) | Pre-compiled tool chain | 517 ms | 802 ms | 1 (LIGHT only) |
| Open conversation | Direct LLM | 1.4 s | 2.4 s | 2 (LIGHT + HEAVY) |
| Multi-step reasoning | Full ReAct loop | 3.1 s | 4.0 s | 3 |
A LIGHT model picks the path. A HEAVY model only runs when the request genuinely needs open-ended reasoning. The two cheap tiers (cache + scenario) together cover the predictable bulk of production traffic at 0–1 LLM calls per request — that's the headline. Reproduce the numbers with:
git clone https://github.com/Caxson/swiftagentx.git
cd swiftagentx
pip install -e ".[dev,openai,benchmark]"
export DASHSCOPE_API_KEY=sk-...
python benchmarks/real_runner.py --iterations 30
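A quick way to see what the table means for a whole workload is to weight each tier by its share of traffic. The sketch below uses the P50 and LLM-call figures from the table; the traffic mix is an assumption chosen purely for illustration, not measured data, and a weighted mean of P50 values is only a rough estimate, not a true percentile.

```python
# P50 latency (ms) and LLM calls per request, copied from the table above.
TIERS = {
    "cache":    {"p50_ms": 0,    "llm_calls": 0},
    "scenario": {"p50_ms": 517,  "llm_calls": 1},
    "direct":   {"p50_ms": 1400, "llm_calls": 2},
    "react":    {"p50_ms": 3100, "llm_calls": 3},
}

# ASSUMPTION: an illustrative traffic mix in percent, not measured data.
MIX_PCT = {"cache": 30, "scenario": 50, "direct": 15, "react": 5}


def blended(metric: str) -> float:
    """Traffic-weighted average of a per-tier metric."""
    assert sum(MIX_PCT.values()) == 100
    return sum(MIX_PCT[t] * TIERS[t][metric] for t in TIERS) / 100


print(blended("llm_calls"))  # 0.95
print(blended("p50_ms"))     # 623.5
```

Swap in your own mix to estimate how much of your traffic has to be predictable before the cheap tiers dominate.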
What goes inside a Scenario
A Scenario is not just a static tool list. Steps in a chain can be:
- A native Python Tool
- (v0.3+) An MCP tool: any Model Context Protocol server's exposed tools, no Python wrapper required
- (v0.3+) A hook: a conditional handler that branches into an LLM call, a sub-agent dispatch, or external shell logic when the chain hits a particular state
This is how Scenarios stay fast and extensible: the routing decision is cheap, but each step can reach into the full agent toolkit when needed.
vs. LangChain / AutoGen / CrewAI
| | SwiftAgentX | LangChain | AutoGen | CrewAI |
|---|---|---|---|---|
| Pre-compiled Scenario shortcut | ✅ core differentiator | ❌ no equivalent | ❌ no equivalent | ❌ no equivalent |
| FAQ / cache-hit returns with 0 LLM calls | ✅ | 1-3 LLM calls | 2+ LLM calls | 2+ LLM calls |
| Built-in three-level cache (KB / tool / session) | ✅ | partial | ❌ | ❌ |
| Dual-model routing (LIGHT/HEAVY) baked in | ✅ | DIY | DIY | DIY |
| Pipeline stage short-circuit (KB / security / feature flags) | ✅ | DIY | ❌ | ❌ |
| Streaming with fine-grained event types | ✅ 12 types | ✅ | partial | ✅ |
| Framework-agnostic core (no HTTP in core/) | ✅ | n/a | n/a | n/a |
| Test suite size | 195 tests, < 0.5 s | huge | huge | medium |
LangChain is broader. SwiftAgentX is sharper for the predictable-traffic production patterns where latency and per-request LLM cost actually move the needle.
Who is this for
- You ship an Agent product where most requests are predictable (customer service, order ops, FAQ, internal copilots, AI outbound) and only a small tail needs real open-ended reasoning.
- You care about P95 latency and per-request LLM cost as first-class metrics, not afterthoughts.
- You want a framework you can read in one afternoon (4k lines of source) and modify without fear.
- You're comfortable wiring tools, KBs, and scenarios in Python instead of YAML/DSL.
If you want a kitchen-sink toolkit with every integration imaginable, use LangChain. If you want a small, fast, opinionated core where Scenarios are the unit of design, read on.
Features
- Scenarios: Pre-compiled execution paths that skip the ReAct loop on known intents. The framework's headline abstraction. Each step in a scenario chain can be a Python tool, an MCP tool, or a conditional hook.
- Tiered execution: Pipeline short-circuit → Scenario → ReAct → Direct, picked per request by a LIGHT classifier.
- Dual-model routing: ModelTier.LIGHT for intent classification, ModelTier.HEAVY for reasoning. ~30× cost spread on real providers.
- Three-level cache: KB exact match (global), tool result (per-user), session variables. Independent TTLs, periodic cleanup.
- Pipeline stages: Insert KB short-circuit, security checks, feature flags, or any custom logic before the cache/route step. Stages can CONTINUE, SHORT_CIRCUIT, or ABORT.
- Knowledge base ABC: Built-in TF-IDF MemoryKnowledgeBase for local dev; bring your own (Weaviate, Elasticsearch, pgvector) via a 3-method ABC.
- SSE streaming: 12 event types (THINKING, ACTION, OBSERVATION, ANSWER, etc.) with heartbeats.
- Admin API: Status, tools, cache, config, KB endpoints as Flask blueprint and FastAPI router. Framework-agnostic core.
- Middleware pipeline: Tracing, retries, input validation, error sanitization. Hook into any stage.
- No HTTP in core: httpx is optional. You can run SwiftAgentX in a Lambda, a Celery worker, or a notebook.
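The CONTINUE / SHORT_CIRCUIT / ABORT contract for pipeline stages can be sketched without the framework. Everything named below (`StageAction`, `run_stages`, the two example stages, the `FAQ` dict) is hypothetical and only mirrors the behaviour described in the list above; the real stage interface may differ.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class StageAction(Enum):
    CONTINUE = "continue"
    SHORT_CIRCUIT = "short_circuit"
    ABORT = "abort"


# A stage takes the user input and returns (action, optional answer).
Stage = Callable[[str], Tuple[StageAction, Optional[str]]]


def run_stages(stages: List[Stage], user_input: str) -> Tuple[StageAction, Optional[str]]:
    """Run stages in order and stop at the first one that does not CONTINUE."""
    for stage in stages:
        action, answer = stage(user_input)
        if action is not StageAction.CONTINUE:
            return action, answer
    return StageAction.CONTINUE, None  # fall through to cache check and routing


FAQ = {"what's your return policy": "7-day no-questions-asked returns"}


def kb_stage(text: str):
    """Answer straight from the FAQ on an exact (normalized) match."""
    hit = FAQ.get(text.strip().lower().rstrip("?"))
    return (StageAction.SHORT_CIRCUIT, hit) if hit else (StageAction.CONTINUE, None)


def security_stage(text: str):
    """Reject obviously hostile input before anything else runs."""
    blocked = "drop table" in text.lower()
    return (StageAction.ABORT, "Request rejected") if blocked else (StageAction.CONTINUE, None)


print(run_stages([security_stage, kb_stage], "What's your return policy?"))
```

Ordering matters in this sketch: putting the security stage first means a blocked request never reaches the knowledge base.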
What's next (v0.3 roadmap)
The v0.2.0 release hardens what's already here. v0.3+ goes after the 2026-era patterns from frameworks like Claude Code:
- MCP server support — Scenarios and ReAct can use tools from any MCP server. One-line registration.
- 4-layer Memory — Current question / last-4-turns verbatim / reference window / incremental rolling summary. Topic-change detection triggers re-summarization.
- Hook system — Lifecycle hooks (pre/post tool, pre/post classify) and semantic hooks (topic change, scenario step conditional).
- Sub-agent dispatch — From inside ReAct or a Scenario step, spawn a focused sub-agent with isolated context. Parallel dispatch supported.
- Skill-in-ReAct — Markdown-defined workflows the ReAct loop can pull in on demand (different from Scenarios, which are pre-compiled and fast).
- Worktree-style workspace — File sandbox per session for agents that generate documents.
- Cache-friendly prompt order — Anthropic / OpenAI prompt cache optimization wired into the framework.
- Lazy tool loading — When a registry grows past a threshold, LIGHT model picks the relevant category before HEAVY sees schemas.
Installation
pip install swiftagentx
With optional dependencies:
pip install swiftagentx[openai] # httpx for async OpenAI-compatible calls
pip install swiftagentx[flask] # Flask SSE adapter
pip install swiftagentx[fastapi] # FastAPI SSE adapter
pip install swiftagentx[all] # Everything
Quick Start
Minimal Example
import asyncio
from swiftagentx import Agent, DummyModelClient

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    response = await agent.run("Hello!")
    print(response.answer)

asyncio.run(main())
With OpenAI-Compatible API
Requires pip install "swiftagentx[openai]" (brings in httpx plus SOCKS support). Inside mainland China, when calling China-based endpoints (Aliyun DashScope, etc.), also prefix the command with HTTP_PROXY= HTTPS_PROXY= ALL_PROXY= so that httpx does not try to tunnel through your foreign proxy.
import os, asyncio
from swiftagentx import Agent
from swiftagentx.providers.openai_compatible import OpenAICompatibleProvider

async def main():
    agent = Agent(
        # OpenAI:
        # model=OpenAICompatibleProvider(
        #     api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o",
        #     api_base="https://api.openai.com/v1",
        # ),
        # Aliyun DashScope (Qwen, which the benchmarks above use):
        model=OpenAICompatibleProvider(
            api_key=os.environ["DASHSCOPE_API_KEY"],
            model="qwen-flash",
            api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
        ),
        # DeepSeek:
        # model=OpenAICompatibleProvider(
        #     api_key=os.environ["DEEPSEEK_API_KEY"], model="deepseek-chat",
        #     api_base="https://api.deepseek.com/v1",
        # ),
    )
    # `session_id` is optional: a single Agent instance shares one default
    # session across calls, so a simple CLI bot has memory out of the box.
    # Multi-user servers should pass an explicit session_id per user.
    response = await agent.run("Explain quantum computing in one sentence.")
    print(response.answer)

asyncio.run(main())
Works with any OpenAI-compatible endpoint — OpenAI, Azure OpenAI, DeepSeek, DashScope, Together, Fireworks, etc. Pick the snippet that matches your provider and set the matching env var.
Multi-turn conversations
Agent.run(text) accepts session_id= and user_id= keyword arguments.
Every turn that shares the same session_id shares one LayeredMemory
(L1 current / L2 last 4 turns verbatim / L3 reference / L4 rolling summary).
Without a session_id, the agent uses one stable default session id
generated at construction time — so a simple CLI chatbot with a single
Agent instance "just works":
import asyncio

async def main():
    agent = Agent(model=OpenAICompatibleProvider(...))
    while user_input := input("You: "):
        response = await agent.run(user_input)  # default session shared across turns
        print("Bot:", response.answer)

asyncio.run(main())
For a multi-user server, pass an explicit session_id per user instead.
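The per-session behaviour can be pictured with a small stand-alone sketch. It covers only two of the four layers (the verbatim window of the last 4 turns and a rolling summary), and `SessionMemory`, `memory_for`, and the string-based "summary" are hypothetical stand-ins, not the LayeredMemory implementation.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class SessionMemory:
    """Keeps the last `window` turns verbatim and folds older ones into a summary."""

    def __init__(self, window: int = 4):
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=window)
        self.summary: List[str] = []  # stand-in for an LLM-written rolling summary

    def add_turn(self, user: str, bot: str) -> None:
        if len(self.recent) == self.recent.maxlen:
            old_user, _ = self.recent[0]  # oldest turn, about to be evicted
            self.summary.append(f"user asked: {old_user}")
        self.recent.append((user, bot))


sessions: Dict[str, SessionMemory] = {}  # session_id -> memory


def memory_for(session_id: str) -> SessionMemory:
    """Return the memory for a session, creating it on first use."""
    if session_id not in sessions:
        sessions[session_id] = SessionMemory()
    return sessions[session_id]


for i in range(6):
    memory_for("s1").add_turn(f"q{i}", f"a{i}")

print(list(memory_for("s1").recent))  # the four most recent turns, q2..q5
print(memory_for("s1").summary)       # ['user asked: q0', 'user asked: q1']
print(len(memory_for("s2").recent))   # 0, a different session starts empty
```

Keying the store by session_id is what isolates users from each other on a multi-user server.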
Custom Tools
import asyncio
from swiftagentx import Agent, Tool, ToolOutput, DummyModelClient

class WeatherTool(Tool):
    def __init__(self):
        super().__init__(name="weather", description="Get weather for a city")

    async def execute(self, context, **kwargs):
        city = kwargs.get("city", "unknown")
        return ToolOutput(success=True, result=f"Sunny, 25C in {city}")

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    agent.register_tool(WeatherTool())
    response = await agent.run("What's the weather in Beijing?")
    print(response.answer)

asyncio.run(main())
Dual-Model Strategy
Use a fast, cheap model for intent classification and a powerful model for reasoning:
import os
from swiftagentx import Agent, ModelTier
from swiftagentx.providers.openai_compatible import OpenAICompatibleProvider

key = os.environ["OPENAI_API_KEY"]
base = "https://api.openai.com/v1"

light = OpenAICompatibleProvider(api_key=key, model="gpt-3.5-turbo", api_base=base)
heavy = OpenAICompatibleProvider(api_key=key, model="gpt-4", api_base=base)

agent = Agent(
    models={
        ModelTier.LIGHT: light,  # Intent classification (~200ms)
        ModelTier.HEAVY: heavy,  # ReAct reasoning & response generation
    },
)
Scenario Toolchains
Skip the ReAct loop for common request patterns:
from swiftagentx import Agent, ScenarioConfig, ToolChainStep, DummyModelClient

agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
agent.register_tool(WeatherTool())  # WeatherTool from the Custom Tools example
agent.register_scenario("weather", ScenarioConfig(
    name="Weather Query",
    description="Get weather information",
    triggers=["weather", "temperature", "forecast"],
    tool_chain=[
        ToolChainStep(tool="weather", query_template="$city"),
    ],
    cache_ttl=1800,
    output_type="direct",
))
When the light model classifies a request as a "weather" scenario, the framework executes the tool chain directly — no ReAct loop, no extra LLM calls.
SSE Streaming
from swiftagentx import Agent, AgentRequest, SSEStreamAdapter, DummyModelClient

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    request = AgentRequest(user_id="u1", session_id="s1", user_input="Hello")
    adapter = SSEStreamAdapter()
    response = await agent.run_stream(request, adapter)
    # Events are available via adapter.event_generator()
    # In a web context, pipe this to an SSE response
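For readers who have not worked with Server-Sent Events, the sketch below shows the standard text/event-stream framing that an SSE adapter ultimately has to produce. The event name `ANSWER` comes from the feature list; the payload shape and the helper names `format_sse` and `heartbeat` are assumptions, not the adapter's actual API.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in the text/event-stream wire format."""
    payload = json.dumps(data, ensure_ascii=False)
    # Each frame is a set of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {payload}\n\n"


def heartbeat() -> str:
    """A comment line (leading ':') that clients ignore; it keeps the connection open."""
    return ": heartbeat\n\n"


frame = format_sse("ANSWER", {"text": "Hello"})
print(frame, end="")
```

A browser `EventSource` dispatches each frame to the listener registered for its event name, and silently drops comment lines such as the heartbeat.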
Knowledge Base
Attach a knowledge base to your agent. Exact matches are returned instantly, skipping LLM processing entirely:
from swiftagentx import Agent, DummyModelClient, MemoryKnowledgeBase, Document

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    kb = MemoryKnowledgeBase()
    await kb.add_documents([
        Document(doc_id="faq-1", content="Return policy: 7-day no-questions-asked returns"),
        Document(doc_id="faq-2", content="Points can be redeemed in the member store"),
    ])
    agent.set_knowledge_base(kb)  # Auto-registers KnowledgeBaseTool
    response = await agent.run("Return policy: 7-day no-questions-asked returns")
    # → Exact match (score=1.0), returned directly without LLM call
Use KnowledgeBaseStage in the pipeline for pre-processing short-circuit:
from swiftagentx import KnowledgeBaseStage
agent.pipeline.add_stage(KnowledgeBaseStage(kb=kb, threshold=0.95))
Implement the KnowledgeBase ABC to integrate with Weaviate, Elasticsearch, or any vector store. See Knowledge Base Guide.
Admin API
Monitor and manage your agent at runtime:
from swiftagentx.admin import AdminService, create_flask_admin_blueprint
service = AdminService(agent)
# Flask
bp = create_flask_admin_blueprint(service)
app.register_blueprint(bp, url_prefix="/admin")
# FastAPI
from swiftagentx.admin import create_fastapi_admin_router
router = create_fastapi_admin_router(service)
app.include_router(router, prefix="/admin")
Available endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /admin/status | Agent status, tool count, cache stats, uptime |
| GET | /admin/tools | Registered tools with JSON Schema |
| GET | /admin/cache/stats | Cache hit statistics |
| POST | /admin/cache/clear | Clear cache (all or by level) |
| GET | /admin/config | Current config (secrets masked) |
| PUT | /admin/config | Update config at runtime |
| POST | /admin/kb/search | Search knowledge base |
| POST | /admin/kb/documents | Add documents |
| DELETE | /admin/kb/documents/:id | Delete a document |
| GET | /admin/kb/stats | KB document count and provider |
Security: Admin endpoints have no built-in authentication. Add your own middleware in production. See Admin Guide.
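Because authentication is left to you, one simple option is a shared bearer token checked before any admin route runs. The helper below is a framework-neutral sketch using only the standard library; `is_authorized` and the token value are hypothetical, and a real deployment may prefer its existing auth layer.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking the match length through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())


print(is_authorized({"Authorization": "Bearer s3cret"}, "s3cret"))  # True
print(is_authorized({}, "s3cret"))                                  # False
```

In Flask this check could be called from a `before_request` handler on the admin blueprint, and in FastAPI from a dependency attached to the admin router.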
Flask Integration
from flask import Flask
from swiftagentx import Agent, DummyModelClient
from swiftagentx.web.flask_adapter import create_flask_blueprint
app = Flask(__name__)
agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
app.register_blueprint(create_flask_blueprint(agent))
# POST /api/v1/agent/sse — SSE streaming endpoint
# GET /api/v1/agent/health — Health check
FastAPI Integration
from fastapi import FastAPI
from swiftagentx import Agent, DummyModelClient
from swiftagentx.web.fastapi_adapter import create_fastapi_router
app = FastAPI()
agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
app.include_router(create_fastapi_router(agent))
Lifecycle Hooks
Two ways to hook into the request lifecycle.
A. Subclass Agent and override — simplest for project-local logic:
from swiftagentx import Agent

class MyAgent(Agent):
    async def on_request_start(self, context): ...           # request received
    async def on_before_classify(self, context): ...         # before intent classification
    async def on_after_classify(self, context, intent): ...  # after intent classification
    async def on_before_tool_call(self, context, tool_name, params): ...
    async def on_after_tool_call(self, context, tool_name, result): ...
    async def on_before_respond(self, context, answer):      # may rewrite answer
        return answer
    async def on_request_end(self, context, response): ...   # request finished
Each override is optional; the framework calls the base no-op when you don't override.
B. HookRegistry — declarative, no subclassing (v0.3+):
from swiftagentx import HookEvent, HookResult, PythonHook

async def log_tool(ctx):
    print(f"tool {ctx.tool_name}({ctx.tool_args}) → {ctx.tool_result}")
    return HookResult()

agent.hooks.register(PythonHook(
    name="log_tools", events={HookEvent.AFTER_TOOL_CALL}, handler=log_tool,
))
Twelve lifecycle events are dispatched: SESSION_START, REQUEST_START,
BEFORE_CLASSIFY, AFTER_CLASSIFY, BEFORE_SCENARIO_STEP,
AFTER_SCENARIO_STEP, BEFORE_TOOL_CALL, AFTER_TOOL_CALL,
BEFORE_REACT_ITER, AFTER_REACT_ITER, BEFORE_RESPOND, REQUEST_END
— plus semantic events like TOPIC_CHANGE. Handlers can return
HookResult(action="short_circuit", answer=...) to bypass the rest of
the request (useful for security policies / rate limiters / quota checks).
Both styles coexist and fire at the same boundary — subclass methods first, then registered hooks.
Middleware
from swiftagentx import Agent, Middleware, DummyModelClient

class LoggingMiddleware(Middleware):
    async def process(self, context, next_handler):
        print(f"[LOG] Processing: {context.get('user_input', '')}")
        result = await next_handler(context)
        print("[LOG] Done")
        return result

agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
agent.use(LoggingMiddleware())
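The `process(context, next_handler)` signature implies an onion-style chain, where each middleware wraps everything registered after it. The following stand-alone sketch shows one common way to compose such a chain; the local `Middleware` class, `Tag`, and `build_chain` are illustrative stand-ins, not the framework's internals.

```python
import asyncio
from typing import Awaitable, Callable, List

Handler = Callable[[dict], Awaitable[str]]


class Middleware:
    async def process(self, context: dict, next_handler: Handler) -> str:
        return await next_handler(context)


class Tag(Middleware):
    """Records when it runs, before and after the inner handler."""

    def __init__(self, name: str, log: List[str]):
        self.name, self.log = name, log

    async def process(self, context, next_handler):
        self.log.append(f"{self.name}:before")
        result = await next_handler(context)
        self.log.append(f"{self.name}:after")
        return result


def build_chain(middlewares: List[Middleware], core: Handler) -> Handler:
    """Wrap `core` so the first middleware in the list is the outermost."""
    handler = core
    for mw in reversed(middlewares):
        # Bind mw and the current handler as defaults to avoid late binding.
        async def wrapped(ctx, _mw=mw, _next=handler):
            return await _mw.process(ctx, _next)
        handler = wrapped
    return handler


async def core(ctx: dict) -> str:
    return f"echo: {ctx['user_input']}"


log: List[str] = []
pipeline = build_chain([Tag("a", log), Tag("b", log)], core)
answer = asyncio.run(pipeline({"user_input": "hi"}))
print(answer)  # echo: hi
print(log)     # ['a:before', 'b:before', 'b:after', 'a:after']
```

The log order shows why registration order matters: a tracing middleware registered first sees the full duration of everything inside it.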
Configuration
from swiftagentx import Agent, SwiftAgentConfig, DummyModelClient

agent = Agent(
    model=DummyModelClient(api_key="test", model="dummy"),
    config=SwiftAgentConfig(
        name="MyAgent",
        max_iterations=5,        # maximum ReAct iterations
        enable_cache=True,       # enable the three-level cache
        max_input_length=5000,   # maximum input length
        debug=False,             # set True to expose error details; keep False in production
        sse_heartbeat_interval=5.0,
        max_cache_entries_per_level=10000,
    ),
)
Architecture
User Request
    |
    v
[Middleware Chain] ──> TracingMiddleware, custom middleware, ...
    |
    v
[Pipeline Stages]
    ├─ [KnowledgeBaseStage] ─── exact match? ──> SHORT_CIRCUIT (return directly)
    ├─ [Custom Stages] ─── security check, feature flags, ...
    |
    v
[Input Validation] ─── too long? ──> Reject
    |
    v
[Cache Check] ─── hit? ──> Return cached answer (0ms)
    |
    v
[Intent Classification] (Light Model, ~200ms)
    |
    ├─ SCENARIO ──> Scenario Toolchain ──> Direct / LLM-formatted response
    ├─ REACT ────> ReAct Loop (Heavy Model) ──> Thought → Action → Observation → ... → Answer
    └─ DIRECT ───> Direct LLM Response (Heavy Model)
    |
    v
[Lifecycle Hooks] ──> on_before_respond
    |
    v
[SSE Stream / Response]
Three-Level Cache
| Level | Scope | Key | TTL | Use Case |
|---|---|---|---|---|
| L1 - KB | Global | Query hash | Configurable (default 1h) | Knowledge base exact match |
| L2 - Code | Per-user + platform | User + platform + query hash | Configurable (default 5m) | Tool execution results |
| L3 - Dynamic | Per-session | Variable name | No expiry | Session state variables |
| Scenario | Per-scenario | Custom template | Configurable | Toolchain results |
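The scopes, keys, and TTLs in the table can be mimicked with a small TTL cache. This is a sketch under assumptions, not the framework's cache: `TTLCache`, `query_hash`, the key layout, and the injectable `now` parameter (used here so expiry can be shown without sleeping) are all hypothetical.

```python
import hashlib
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """One cache level: key -> (expiry timestamp or None, value)."""

    def __init__(self, default_ttl: Optional[float]):
        self.default_ttl = default_ttl  # None means "no expiry"
        self._data: Dict[str, Tuple[Optional[float], Any]] = {}

    def set(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        expires = None if self.default_ttl is None else now + self.default_ttl
        self._data[key] = (expires, value)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        expires, value = entry
        if expires is not None and now >= expires:
            del self._data[key]  # lazy eviction on read
            return None
        return value


def query_hash(query: str) -> str:
    """Stable short hash of a normalized query string."""
    return hashlib.sha256(query.strip().lower().encode()).hexdigest()[:16]


kb_cache = TTLCache(default_ttl=3600)      # L1: global, keyed by query hash
tool_cache = TTLCache(default_ttl=300)     # L2: keyed by user + platform + query hash
session_vars = TTLCache(default_ttl=None)  # L3: one per session, keyed by variable name

key = f"u1:web:{query_hash('Where is my order?')}"
tool_cache.set(key, {"status": "in_transit"}, now=0.0)
print(tool_cache.get(key, now=299.0))  # still cached
print(tool_cache.get(key, now=300.0))  # None, the 5-minute TTL has elapsed
```

Including the user and platform in the L2 key is what keeps one user's tool results from being served to another.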
Package Structure
swiftagentx/
├── core/ # Agent, memory, model client, cache, prompt, parameter, router, pipeline
├── models/ # Pydantic schemas (AgentRequest, AgentResponse, config)
├── tools/ # Tool base class, registry, executor, termination checker, scenario engine
├── knowledge_base/ # KnowledgeBase ABC, MemoryKB (TF-IDF), KnowledgeBaseTool, KnowledgeBaseStage
├── admin/ # AdminService, Flask Blueprint, FastAPI Router
├── stream/ # SSE adapter and event builder
├── providers/ # LLM providers (OpenAI-compatible, DummyModelClient)
├── storage/ # Storage backend abstraction (memory, extensible)
├── middleware/ # Middleware chain (tracing, custom)
└── web/ # Web framework adapters (Flask, FastAPI)
Documentation
| Document | Description |
|---|---|
| Architecture | System overview, dual-model strategy, cache, pipeline, ReAct loop |
| Tools Guide | Custom tool development |
| Scenarios Guide | Scenario toolchain configuration |
| Knowledge Base Guide | KB integration, MemoryKB, custom backends |
| Streaming Guide | SSE events, Flask/FastAPI integration, frontend examples |
| Admin Guide | Admin API, authentication, endpoints |
| Deployment Guide | Gunicorn, Uvicorn, Docker, Nginx |
Requirements
- Python >= 3.9
- Core dependencies: pydantic >= 2.0, PyYAML >= 6.0
- No HTTP dependency in core: httpx is optional (for OpenAICompatibleProvider)
License
Apache-2.0