Deep LangGraph agent with memory, RAG, skills, OpenRouter, and Langfuse
Ziro
LangGraph conversational agent with persistent memory, RAG, progressive tool loading, MCP support, skills library, configurable guardrails, and automatic context compaction. Backed by OpenRouter (any LLM) and Langfuse (observability). Two runtime modes: SQLite/FAISS for local dev, PostgreSQL/pgvector for production.
Features
- Multi-agent: run as one of several named agents, each with its own persona, model, tools, MCP servers, guardrails, and compaction policy; per-agent on/off flags
- Long-term memory: user- and agent-scoped facts persisted across sessions
- Thread resumption: pick up any prior conversation by thread ID
- Context compaction: older turns are auto-folded into a running summary when a request nears the model's context window; recent turns kept verbatim; YAML-driven, model-aware budgeting
- RAG: semantic search over indexed documents (`.txt`, `.pdf`, directories)
- Skills library: index `SKILL.md` files; agent retrieves relevant skills and loads reference files on demand
- Progressive tool loading: tools are deferred by default; the LLM discovers and activates only what it needs via `search_tools` / `load_tools`
- Subagents: delegate a self-contained subtask to a child subagent in isolated context; subagents are single-file `*.agent.md` definitions (persona + scoped tools/skills) with inherited, never-widened permissions
- Tool permissions (F06): per-agent allow/deny/ask policy over tool/namespace globs; `ask` triggers a human-in-the-loop approval, with allow-once / allow-thread / allow-always memory; `shell:*` ships dangerous-default-deny
- Hooks (F01): declarative lifecycle interception (`session_start`, `pre/post_turn`, `pre/post_tool`, `pre/post_model`); Python-dotted or shell callables with glob matchers; powers permission gating, shell audit, memory reflection
- Human handoff (F07): optional `request_handoff` tool that pauses the turn for a human operator and injects their reply back into the conversation
- Clarifying questions: core `ask_user_question` tool, mirroring Claude Code's AskUserQuestion, pauses the turn to ask 1-4 structured multiple-choice questions and folds the answers back in; shares the same HITL interrupt path as F06/F07
- Task / todo list (F08): `write_todos` / `update_todo` state tools; the running todo list renders in the TUI side pane
- Self-improving memory (F03): per-agent `memory_policy.yaml` governs how facts are extracted and reflected into the long-term store
- Multimodal input (F02): attach images via `/img <path>` or `--attach`; vision-capable models receive image blocks, text-only models degrade gracefully (`attachment_policy.yaml`)
- Filesystem tools: cross-platform core tools `read_file`, `grep`, `glob_files` for local file inspection, plus deferred, ask-gated `write_file` / `edit_file` mutating tools (separate `fs_write` namespace); pure stdlib (identical on Windows + POSIX), path-confined to the project root, on by default (`fs_policy.yaml`)
- Web fetch: optional SSRF-guarded `web_fetch(url)` tool that returns cleaned, length-capped page text
- Shell execution: optional `run_shell` tool that runs real commands on the host or in a Docker sandbox; default-deny + human approval (F06/HITL), denylist backstop, timeout + process-tree kill, output caps, env scrubbing, and full audit
- Bash parity: a cross-platform `bash` interpreter (Git Bash/WSL-probed on Windows) makes pipes/quoting/heredocs behave identically on every OS; optional persistent per-thread bash session keeps `cd` / env vars across calls
- LLM adapter: `LLMAdapter` abstraction (`app/llm/`) decouples the graph from OpenRouter specifically, the first step toward multi-provider support; per-agent `provider:` selection in `meta.yaml`
- Local voice: optional push-to-talk speech I/O (F11) with faster-whisper STT + Piper TTS, fully on-device, no cloud key required (cloud STT/TTS also pluggable)
- MCP support (F18): connect external MCP servers (stdio, SSE, streamable HTTP, WebSocket) with OAuth; a live `McpManager` (re)ingests tools into the shared registry, persistent sessions cut per-call latency, failures are non-fatal
- Interactive TUI (F12): Textual front-end (transcript, todo/active-tools side pane, status footer, approval/question modal); paints in ~1.6s and is usable while the engine builds on a background thread (F26); slash-commands, live themes (carbon/nord/gruvbox), a context-usage meter (F16), and an MCP control panel (F17, Ctrl+O); session state is a single reactive `UIStore`, with SVG snapshot tests + a `textual serve` browser demo for UI peek
- Slash commands (F14): `/help /agent /thread /model /theme /think /clear /voice /img /mcp /stop /save /quit` dispatched in the driver loop before the LLM (no model call, no transcript pollution)
- Chat queue (F13): in-flight buffer captures mid-turn input; optional background worker pool (`chat_once --submit` / `--status`) survives restart
- Guardrails: configurable input/output guards (regex injection, ML classifier, PII, Llama Guard content safety); YAML-driven, no code changes to add rules
- Agent persona: configurable `soul_prompt`, `system_prompt`, and `fallback_messages` via the agent's `agent_config.yaml`
- Resilient replies: empty reasoning-only turns are re-prompted once, then served a non-blank fallback rather than a blank message
- Thinking / reasoning: optional extended reasoning via `OPENROUTER_REASONING_EFFORT`, switchable live with `/think <high|medium|low|off>` (F15)
- Fast startup: embeddings warmup is off the critical path (F24); the TUI shell + graph build are torch-free, torch loads on a background thread (F26)
- Dual backends: dev (SQLite + FAISS, zero infra) / prod (PostgreSQL + pgvector)
- Observability: Langfuse and LangSmith tracing, both optional
Quickstart
# 1. Install dependencies — includes the local guardrail backends
# (Presidio PII + Llama Guard content safety).
uv sync
# 2. (Optional) local voice — faster-whisper STT + Piper TTS (push-to-talk, no cloud key)
uv sync --group voice-local
# Download the local models the installed backends need (spaCy, Llama Guard GGUF,
# and — only if the voice group is installed — the default Piper voice).
# Fetches just what's missing.
python -m app.cli.startup
# 3. Configure environment
cp .env.example .env
# Set OPENROUTER_API_KEY in .env
# 4. (Production only) start PostgreSQL + pgvector
docker compose up -d
# 5. Run
python -m app.main --user alice
Resume a previous session:
python -m app.main --user alice --thread alice_abc12345
Single-shot (JSON output, for programmatic / LLM-driven use):
python -m app.cli.chat_once --user alice --message "Hello"
Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| `OPENROUTER_API_KEY` | Yes | LLM access via OpenRouter |
| `OPENROUTER_MODEL` | No | Model override (default: `google/gemini-2.5-flash-lite`) |
| `OPENROUTER_REASONING_EFFORT` | No | Enable extended reasoning: `low` / `medium` / `high` (leave empty to disable) |
| `DATABASE_URL` | No | Enables production PostgreSQL + pgvector backends |
| `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` | No | Langfuse observability |
| `LANGFUSE_HOST` | No | Langfuse host (default: `https://cloud.langfuse.com`) |
| `LANGCHAIN_TRACING_V2` / `LANGCHAIN_API_KEY` | No | LangSmith tracing |
Commands
uv sync # install dependencies (incl. local guardrail backends)
uv sync --group voice-local # + faster-whisper STT + Piper TTS (optional local voice)
python -m app.cli.startup # download spaCy + Llama Guard + default Piper voice if absent
python -m app.main --user <USER_ID> # start chat session (interactive agent picker)
python -m app.main --user <USER_ID> --agent <AGENT_ID> # start with a specific agent
python -m app.main --user <USER_ID> --voice # push-to-talk voice I/O (needs voice-local + voice_policy enabled)
python -m app.main --user <USER_ID> --thread <THREAD_ID> # resume session
python -m app.cli.chat_once --user <USER_ID> --message <MSG> # single-shot JSON output
python -m app.cli.chat_once --user <USER_ID> --agent <AGENT_ID> --message <MSG> # single-shot, specific agent
python -m app.cli.manage_agents list # list agents + per-agent on/off state
python -m app.cli.manage_agents add <AGENT_ID> --name NAME # scaffold a new agent
python -m app.cli.manage_agents remove <AGENT_ID> # delete an agent
python -m app.cli.manage_agents enable <AGENT_ID> # turn a single agent ON
python -m app.cli.manage_agents disable <AGENT_ID> # turn a single agent OFF
python -m app.cli.manage_agents set-default <AGENT_ID> # set the default agent
python -m app.cli.manage_agents set-model <AGENT_ID> <MODEL> # set an agent's model (or 'none' to clear)
python -m app.cli.manage_agents add-subagent <AGENT_ID> --tools a,b --namespaces rag --skills x # scaffold a single-file subagent
python -m app.cli.manage_agents remove-subagent <AGENT_ID> # delete a subagent definition
python -m app.cli.run_scenarios # replay scenario files in tmp/
python -m app.rag.indexer <path> # index docs (dir, .txt, or .pdf)
python -m skills.loader # index all SKILL.md files
python -m app.tools.indexer # re-index tool descriptions
python -m app.tui.demo # scripted TUI demo (UI peek); textual serve "python -m app.tui.demo" for a browser demo
mypy . # type check
pytest tests/test_tui_snapshots.py --snapshot-update # regenerate TUI SVG snapshot baselines after a UI change
docker compose up -d # start PostgreSQL + pgvector (prod)
docker compose build pdf-sandbox # build F10 sandbox image (ziro-pdf:latest) for researcher_docker
Visualize the graph
python -m app.cli.show_graph # ASCII to stdout (requires grandalf: uv sync --group dev)
python -m app.cli.show_graph --format mermaid # Mermaid markdown to stdout
python -m app.cli.show_graph --format mermaid -o g.md # Mermaid to file
python -m app.cli.show_graph --format png -o g.png # PNG (requires graphviz system package)
Slash commands (in-session)
Typed into the running session (TUI or REPL), dispatched before the LLM — no model call, no transcript pollution (F14):
| Command | Purpose |
|---|---|
| `/help` (`/h`) | List commands |
| `/agent [id]` | Switch agent (rebuilds the session, fresh thread) |
| `/thread [id]` | Switch / resume a thread |
| `/model [id]` | Switch the model (rebuilds the LLM) |
| `/theme [name]` | Switch the UI theme live + persist (carbon / nord / gruvbox) |
| `/think <high\|medium\|low\|off>` | Set reasoning effort live (F15) |
| `/clear` | Start a fresh thread (same agent) |
| `/voice [on\|off]` | Toggle push-to-talk voice I/O (needs `--voice` + `voice_policy`) |
| `/img <path> [text]` | Attach an image to this turn (F02) |
| `/mcp [server]` | Show MCP servers (TUI: open the control panel, Ctrl+O) |
| `/stop` (`/halt`) | Abort the turn currently in flight (Ctrl+S in the TUI) |
| `/save [path]` | Save the transcript to a JSON file |
| `/quit` (`/exit`, `/q`) | Exit the session |
Architecture
User input → [input_guard] → [compaction] → agent node → LLM with tools → DynamicToolNode → back to agent → ... → [output_guard] → final response
input_guard, output_guard (toggled by the agent's guardrails_policy.yaml), and compaction (toggled by compaction_policy.yaml) are conditional nodes. A blocked request returns a refusal message without reaching the LLM. The agent injects long-term user memories, the configured persona, and the running compaction summary into the system prompt before each LLM call. An empty reasoning-only reply is re-prompted once (reprompt), then replaced by a non-blank fallback (reply_fallback).
Multi-Agent System
The agent can run as one of several named agents, each with its own persona, tool policy, MCP servers, and guardrails.
- Per-agent folders: each agent lives in `app/agents/<agent_id>/` and is self-contained, with its own `agent_config.yaml`, `tool_policy.yaml`, `mcp_servers.yaml`, `guardrails_policy.yaml`, `compaction_policy.yaml`, and `meta.yaml`. The shipped agents are `default` (life-coach persona), `researcher` (factual research aide with lighter guardrails; `run_shell` enabled in host mode), and `researcher_docker` (same persona with `run_shell` in Docker-sandbox mode).
- Per-agent model: `meta.yaml`'s `model` field sets the OpenRouter model for that agent. Omit it (or `model: null`) to use the `OPENROUTER_MODEL` env default. A `provider:` field selects the LLM adapter (default `openrouter`; see LLM Adapter).
- Per-agent on/off: `meta.yaml`'s `enabled` flag controls whether an agent is selectable. Toggle with `manage_agents enable|disable <id>`.
- Manifest: `app/agents/registry.yaml` records the default agent. If the default is disabled, the first enabled agent is used.
- Memory isolation: long-term memory is keyed per `(user_id, agent_id)`, so each agent keeps its own facts per user.
Progressive Tool Loading
Tools are split into three tiers:
| Tier | Description |
|---|---|
| Core tools | Always bound to the LLM every turn (defined in tool_policy.yaml → core_tools) |
| Meta-tools | Always bound; provide the discovery surface (search_tools, load_tools, list_tools, unload_tools) |
| Deferred tools | Registered in the ToolRegistry but NOT bound until the LLM calls load_tools([...]) |
Discovery loop: search_tools("<keyword>") → pick names → load_tools([...]) → call the tool. Activated tools persist for the rest of the thread (up to max_active, LRU eviction after that).
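The activation-plus-eviction behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `ActiveTools` class and its method names are hypothetical, and it assumes eviction is ordered by most recent load or call.

```python
from collections import OrderedDict


class ActiveTools:
    """Hypothetical tracker for activated tools with LRU eviction."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        # Insertion order doubles as recency order: oldest first, newest last.
        self._tools: OrderedDict[str, None] = OrderedDict()

    def load(self, names: list[str]) -> list[str]:
        """Activate tools and return the names evicted to respect max_active."""
        evicted: list[str] = []
        for name in names:
            self._tools[name] = None
            self._tools.move_to_end(name)
            while len(self._tools) > self.max_active:
                oldest, _ = self._tools.popitem(last=False)
                evicted.append(oldest)
        return evicted

    def touch(self, name: str) -> None:
        """Mark a tool as recently used, e.g. when the LLM calls it."""
        if name in self._tools:
            self._tools.move_to_end(name)

    def names(self) -> list[str]:
        return list(self._tools)
```

With `max_active=2`, loading `a` and `b`, touching `a`, and then loading `c` evicts `b`, because `b` is the least recently used entry.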
MCP Integration
External MCP servers are declared in the agent's mcp_servers.yaml. On startup, tool metadata is fetched from all enabled servers and registered into the ToolRegistry under their server name as namespace. MCP tools enter the same deferred-discovery surface as local tools — they are NOT bound to the LLM until activated via load_tools.
Supported transports: stdio, sse, streamable_http, websocket, plus OAuth (F18). A live McpManager (app/tools/mcp_manager.py) owns per-server state and a ServerStatus snapshot, holds a persistent session per server (one owner task keeps it open so tool calls reuse it — large latency win), and re-ingests tools into the shared registry on every connect so discovery stays current. connect / disconnect / reconnect are driven live from the F17 panel or /mcp. OAuthConfig picks the flow (auto / authorization_code / device_code; device-code when headless); tokens are cached and refreshed per request. Server failures are non-fatal; the rest of the registry is unaffected.
Subagents
The parent can delegate a subtask to a child subagent via spawn_subagent (or dispatch_subagents for parallel fan-out). Each child is a fresh graph on a namespaced thread with its own persona, empty history, and a restricted toolset; only a compact result is folded back into the parent.
- Single-file definitions: subagents live in `app/agents/.subagents/<id>.agent.md`, which holds YAML frontmatter (`name`, `description`, `enabled`, `model`, allowed `tools:` / `namespaces:`, scoped `skills:`, optional `soul`) plus a Markdown body that becomes the system prompt. The `.subagents/` dir is spawn-only, so subagents never appear in the interactive picker. Shipped: `scout` (research) and `reflector` (reflection coach).
- Permission inheritance: a child's tool rights are the intersection of its declared surface and the parent's current rights; never wider. Namespace tokens (`rag`, `rag:*`) expand to the concrete tools available in the (possibly restricted) registry view.
- Skill scoping: a subagent's `skills:` allowlist restricts its `search_skills` / `load_skill_ref` to only those skills.
- Runaway guards: per-parent `subagent_policy.yaml` sets `enabled`, `allowed_children` / denylist, `max_depth` (default 1), `max_spawns_per_thread`, and parallelism limits; each child runs under a `recursion_limit`.
Scaffold one with manage_agents add-subagent <id> --tools … --namespaces … --skills ….
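The permission-inheritance rule (declared surface intersected with the parent's current rights, with namespace tokens expanded against the restricted registry view) might look like the sketch below. The function name `effective_tools` and the `namespace:tool` qualified-name format are assumptions made for illustration.

```python
def effective_tools(
    declared_tools: list[str],
    declared_namespaces: list[str],
    parent_rights: set[str],
    registry_view: set[str],
) -> set[str]:
    """Return the child's tool rights; never wider than the parent's.

    All names are assumed to be qualified as "namespace:tool".
    """
    # The child only ever sees tools the parent may use.
    visible = registry_view & parent_rights

    granted = {name for name in declared_tools if name in visible}
    for token in declared_namespaces:
        # "rag" and "rag:*" both mean "every visible tool in namespace rag".
        namespace = token.removesuffix(":*")
        granted |= {name for name in visible if name.split(":", 1)[0] == namespace}

    # Final intersection is redundant here but makes the invariant explicit.
    return granted & parent_rights
```

A child that declares `shell:run_shell` gains nothing when the parent lacks that right; the declaration is silently narrowed rather than widened.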
Web Fetch
An optional web_fetch(url) deferred tool (namespace webfetch) fetches an http/https page and returns cleaned, length-capped text. Gated by the agent's webfetch_policy.yaml — when enabled: false the tool is never registered (invisible to discovery). An SSRF guard resolves the host and refuses loopback / private / link-local / reserved targets, re-checking the final URL after redirects.
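As a rough illustration of the SSRF guard, the sketch below resolves a URL's host and rejects the address classes named above using only the standard library. The function names are hypothetical, and a real guard would also need to re-run the check on the final URL after redirects, as the text describes.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def ip_is_blocked(ip: str) -> bool:
    """True for loopback, private, link-local, or reserved addresses."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_reserved
    )


def url_is_allowed(url: str) -> bool:
    """Allow only http/https URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(
            parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP
        )
    except socket.gaierror:
        return False
    # Every resolved address must be public; one private record is enough to refuse.
    return all(not ip_is_blocked(info[4][0]) for info in infos)
```

Checking every resolved address, rather than just the first, avoids a host that publishes one public and one internal record slipping through.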
Filesystem Tools
Three cross-platform core tools (namespace fs) give the agent Claude-Code-style local file inspection: read_file, grep, and glob_files. They are pure Python stdlib (pathlib / re / os / fnmatch) — no new dependency, no shelling out — so behavior is identical on Windows and POSIX.
- `read_file(path, offset?, limit?)`: read a UTF-8 text file as numbered lines; byte-capped (`max_read_bytes`) with an `offset` / `limit` line window (`max_read_lines`); binary files are detected and skipped.
- `grep(pattern, path, glob?, output_mode?, case_insensitive?)`: regex search over file contents under a path; `glob` filters filenames; `output_mode` is `content` / `files` / `count`; prunes ignored dirs (`.git`, `node_modules`, …) and skips binaries; capped at `max_grep_matches`.
- `glob_files(pattern, path?)`: find files by glob (incl. `**`), returned repo-relative and mtime-sorted (newest first), capped at `max_glob_results`.
Every path is resolved against the project root and refused if it escapes (FsConfig.confine, default on). On by default via the agent's fs_policy.yaml (missing file → defaults); added to each shipped agent's core_tools and inherited by subagents through the rights intersection (like web_fetch). No F06 gate — only shell:* is dangerous-default-deny.
Mutating tools (write_file / edit_file) — deferred counterparts registered under a separate fs_write namespace so F06 can ask on writes while reads stay allow; byte-capped at max_write_bytes (512KB). Gated by FsConfig.allow_write (default on) — set false for a read-only agent.
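Path confinement of the kind described in this section can be approximated with `pathlib` alone. The helper below is a sketch under that assumption; the function name `confine` mirrors the config flag but is not the project's actual code.

```python
from pathlib import Path


def confine(path: str, root: Path) -> Path:
    """Resolve `path` against `root` and refuse anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    # An absolute `path` replaces `root` in the join, so it is caught too.
    candidate = (root / path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes project root: {path}")
    return candidate
```

Resolving before comparing matters: a purely textual prefix check would accept `subdir/../../etc/passwd`.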
Shell Execution
An optional run_shell(command, cwd?) deferred tool (namespace shell) runs real shell commands — on the host by default, or inside a Docker sandbox via a config flip. The trust model is Claude-Code-like: a human approves what runs.
- Default off & default-deny: gated by the agent's `shell_policy.yaml` (`enabled: false` → never registered). Even when enabled, `shell:*` is dangerous-default-deny in the permission system (F06); an agent must explicitly `allow`-list `shell:run_shell` in its `permissions.yaml`. Recommended recipe = `allow` + `ask` (human approval on every call via HITL). `chat_once` auto-denies an `ask` interrupt (fail-closed), so single-shot runs never hang.
- `ShellGate` backstop: a non-removable hardened denylist (`rm -rf`, `sudo`, `dd`, `mkfs`, fork bombs, …) plus optional per-agent regex allow/deny and opt-in `confine` cwd containment.
- Two executors: `host` (`create_subprocess_shell`, real shell with pipelines) and `sandbox` (`create_subprocess_exec` with a pluggable `sandbox_launcher` template for Docker/WSL/nsjail). Both sit behind the same gate, audit, and tool body.
- Bash parity: `ShellConfig.interpreter` defaults to `"bash"`, so the command runs as one argv element via `create_subprocess_exec` (no shell-launcher involved) and bash tokenizes pipes/quoting/heredocs identically on every OS. On Windows this probes Git Bash, then falls back to `wsl bash`. The previous approach (pinning `shell_executable` through `create_subprocess_shell`) was unreliable there, since Windows always runs `{executable or COMSPEC} /c {command}` regardless of the executable. `interpreter: "host"` restores the old `cmd.exe` / `/bin/sh` behavior. An optional persistent per-thread `bash -l` session (`session: "persistent"`) keeps `cd` / env vars / venv activation across calls.
- Always-on safety: `timeout_s` with process-tree kill, output capped twice (at the source and at ingestion), and env scrubbing so secrets (`OPENROUTER_API_KEY`, `DATABASE_URL`, `LANGFUSE_*`) never reach the child.
- Loop guards: `pre_tool` hooks (`app/hooks/guards.py`) flag repeated near-identical `run_shell` commands (`shell_loop_guard`) and repeated identical failure signatures across different commands (`verification_loop_guard`, e.g. pytest/mypy/tsc/panic tracebacks recurring while the fix isn't landing).
- Audit: `pre_tool` / `post_tool` hooks log every invocation (intent + outcome, including blocked/timeout) to `data/audit/shell-audit.jsonl` (dev) or a `tool_audit` table (prod). See `docs/audit-log.md`.
Two agents ship the opt-in: `researcher` (host mode; its persona bakes in a map-reduce flow for summarizing large PDFs without overflowing context) and `researcher_docker` (`mode: sandbox`, launcher `docker run --rm --network none -v {workdir}:/work -w /work {image}`, image `ziro-pdf:latest`, built with `docker compose build pdf-sandbox`).
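The env-scrubbing step listed under always-on safety could be sketched as a simple filter over the environment mapping. The pattern tuple copies the variable names from this section; the function name `scrub_env` and the glob-matching approach are assumptions.

```python
from fnmatch import fnmatchcase

# Variable names taken from the section above; LANGFUSE_* is matched as a glob.
SECRET_PATTERNS = ("OPENROUTER_API_KEY", "DATABASE_URL", "LANGFUSE_*")


def scrub_env(
    env: dict[str, str], patterns: tuple[str, ...] = SECRET_PATTERNS
) -> dict[str, str]:
    """Return a copy of `env` without any variable matching a secret pattern."""
    return {
        key: value
        for key, value in env.items()
        if not any(fnmatchcase(key, pattern) for pattern in patterns)
    }
```

The scrubbed mapping would then be passed as the `env=` argument when the child process is created, so the child never inherits the parent's full environment.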
Local Voice
Optional push-to-talk speech I/O (F11): the turn is bracketed by STT (your speech → the user message) and TTS (the post-output-guard reply → audio). Default backends are fully local — faster-whisper for STT, Piper for TTS — so no cloud API key is needed (cloud STT/TTS over httpx are also pluggable per agent). Gated by the agent's voice_policy.yaml (enabled: false by default).
Setup is two commands:
uv sync --group voice-local # faster-whisper STT + Piper TTS (+ sounddevice/PortAudio)
python -m app.cli.startup # downloads the default Piper voice into data/voices/ (gitignored)
Then enable it in the agent's app/agents/<id>/voice_policy.yaml:
enabled: true
tts_backend: local
tts_model: data/voices/en_US-amy-medium.onnx # path to the downloaded .onnx (config .onnx.json resolved alongside)
Run with python -m app.main --user <id> --voice, or toggle /voice on in the TUI (Ctrl+R to start/stop recording). Piper voices come from the Rhasspy catalog (<lang>-<name>-<quality>); download others with python -m piper.download_voices <voice-id> --download-dir data/voices and point tts_model at the new .onnx. The researcher agent ships voice-enabled.
Tool Permissions (F06)
Every tool call passes a per-agent permission gate (app/permissions/) wired as a pre_tool hook. The agent's permissions.yaml declares allow / deny / ask globs over tool/namespace qualified names plus a default_action; dangerous_default_deny (ships ["shell:*"]) forces a default-deny for dangerous namespaces. An ask decision raises a human-in-the-loop interrupt (the TUI ApprovalModal, REPL prompt, or auto-deny in chat_once); approvals can be remembered once, for the thread, or always (remember_default_scope, persisted per user/agent). Tool args are redacted (secrets masked, long values truncated) before display.
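One plausible way to evaluate such a policy is sketched below. The precedence order (deny first, then dangerous-default-deny unless explicitly allowed, then ask, then allow, then `default_action`) is an assumption inferred from the "allow + ask" recipe in the Shell Execution section, not a statement of how `app/permissions/` actually orders its checks. The `Policy` dataclass is likewise hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Policy:
    """Hypothetical in-memory form of an agent's permissions.yaml."""

    allow: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)
    ask: list[str] = field(default_factory=list)
    default_action: str = "allow"
    dangerous_default_deny: list[str] = field(default_factory=lambda: ["shell:*"])


def _matches(name: str, globs: list[str]) -> bool:
    return any(fnmatchcase(name, glob) for glob in globs)


def decide(policy: Policy, tool: str) -> str:
    """Return "allow", "deny", or "ask" for a qualified tool name."""
    if _matches(tool, policy.deny):
        return "deny"
    explicitly_allowed = _matches(tool, policy.allow)
    # Dangerous namespaces are denied unless the agent allow-lists them.
    if _matches(tool, policy.dangerous_default_deny) and not explicitly_allowed:
        return "deny"
    # An ask glob still forces human approval on an allow-listed tool.
    if _matches(tool, policy.ask):
        return "ask"
    if explicitly_allowed:
        return "allow"
    return policy.default_action
```

Under this ordering, an agent with `allow: ["shell:run_shell"]` and `ask: ["shell:*"]` gets an approval prompt on every shell call, while an agent with no shell entries at all is denied outright.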
Hooks (F01)
app/hooks/ is a declarative lifecycle-interception layer. An agent's hooks.yaml lists HookSpec entries — each binds a HookEvent (session_start, pre_turn, post_turn, pre_tool, post_tool, pre_model, post_model), an fnmatch matcher (over tool qualified names for tool events), and a callable (Python dotted path or shell command). Hooks return a HookDecision (allow / deny / modify / interrupt); the runner short-circuits on deny/interrupt and folds modify. Hooks are ordered once at startup, so a no-hook registry has zero per-turn cost. This path powers the F06 permission gate and the F10 shell audit (*run_shell pre/post_tool).
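The runner semantics (short-circuit on deny/interrupt, fold modify) can be illustrated with the simplified sketch below. The dataclass fields and the `run_hooks` signature are assumptions; the real `HookSpec` and `HookDecision` types likely carry more than shown here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass
class HookDecision:
    action: str  # "allow" | "deny" | "modify" | "interrupt"
    payload: Optional[dict] = None
    reason: str = ""


@dataclass
class HookSpec:
    event: str
    matcher: str  # fnmatch pattern over the tool's qualified name
    fn: Callable[[dict], HookDecision]


def run_hooks(
    hooks: list[HookSpec], event: str, tool_name: str, payload: dict
) -> HookDecision:
    """Run matching hooks in order and return the combined decision."""
    for hook in hooks:
        if hook.event != event or not fnmatchcase(tool_name, hook.matcher):
            continue
        decision = hook.fn(payload)
        if decision.action in ("deny", "interrupt"):
            return decision  # short-circuit: later hooks never run
        if decision.action == "modify" and decision.payload is not None:
            payload = decision.payload  # fold: later hooks see the new payload
    return HookDecision("allow", payload)
```

Because modifications are folded in order, a later hook always inspects the payload as rewritten by earlier hooks, which is what lets an audit hook record what will actually run.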
Human Handoff (F07)
An optional request_handoff(reason) core tool (app/handoff/) lets the agent escalate to a human. It pauses the turn, surfaces the reason to an operator, accepts a human-authored reply, and resumes with that reply injected as the next message (assistant or tool, per config). The agent's handoff_policy.yaml (HandoffConfig) sets enabled, the operator prompt, a timeout with an on_timeout fallback, and inject_as.
Clarifying Questions
A core ask_user_question(questions) tool (app/clarify/) mirrors Claude Code's AskUserQuestion: pauses mid-turn to ask 1-4 structured clarifying questions (2-4 options each, optional multi-select, implicit free-text "Other"), then folds the human's answers back into the tool result. It shares the same HITL interrupt module as F06/F07 (app/graph/interrupts.py — InterruptKind.ASK_USER_QUESTION) rather than a bespoke pause mechanism; the TUI renders a QuestionModal, the REPL prints numbered options. Gated by clarify_policy.yaml (ClarifyConfig — enabled, max_questions, max_options; missing file → on by default).
Task / Todo List (F08)
app/tasks/ adds state-based todo tracking via two core tools: write_todos(todos) (full rewrite) and update_todo(id, status|content|result). Todos live in AgentState.todos (merged by a per-id reducer), survive checkpointing, and render live in the TUI side pane. A todo can carry an agent_id to mark subagent delegation.
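A per-id reducer of the kind mentioned above might merge updates like this. The sketch covers only the partial-update path used by `update_todo`; how a full `write_todos` rewrite replaces the list is not shown, and the dict-based todo shape is an assumption.

```python
def merge_todos(current: list[dict], update: list[dict]) -> list[dict]:
    """Merge `update` into `current` by todo id, preserving original order."""
    merged = {todo["id"]: dict(todo) for todo in current}
    for todo in update:
        # Known ids are patched field by field; unknown ids are appended.
        merged[todo["id"]] = {**merged.get(todo["id"], {}), **todo}
    return list(merged.values())
```

Merging by id rather than replacing the whole list lets two state updates in the same step touch different todos without clobbering each other.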
Multimodal Input (F02)
Images attach via the /img <path> directive or the CLI --attach flag (app/io/attachments.py). build_human_message checks supports_multimodal(model): vision-capable models receive base64 data-URI image blocks in a multimodal HumanMessage; text-only models get a graceful notice instead. Bounded by the agent's attachment_policy.yaml (enabled, max_image_bytes, allowed_image_types).
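For illustration, a base64 data-URI image block could be assembled as below. The block shape follows the OpenAI-style `image_url` content format, which is an assumption about what `build_human_message` emits; the default size limit and allowed types are placeholders, not the values in `attachment_policy.yaml`.

```python
import base64
import mimetypes
from pathlib import Path


def image_block(
    path: str,
    max_image_bytes: int = 5_000_000,  # placeholder limit
    allowed_image_types: tuple[str, ...] = ("image/png", "image/jpeg"),
) -> dict:
    """Build one image content block with the file inlined as a data URI."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in allowed_image_types:
        raise ValueError(f"unsupported image type: {mime}")
    data = Path(path).read_bytes()
    if len(data) > max_image_bytes:
        raise ValueError(f"image too large: {len(data)} bytes")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }
```

The type check runs before the file is read, so a disallowed attachment is rejected without loading it into memory.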
Interactive TUI & Themes (F12 / F16 / F17)
app/tui/ is a Textual front-end over the shared TurnRunner: a transcript, a todo/active-tools side pane, a status footer, and an ApprovalModal rendering the shared HITL interrupt. It paints in ~1.6s and is usable immediately while the engine builds on a background thread — any turn typed early is queued and runs in order (F26). Extras:
- Themes: `app/tui/theme.py` maps semantic color slots (not literal hex) to Textual + Rich widgets. Carbon (amber + steel on carbon black) is the default; `nord` and `gruvbox` also ship. Palettes are discovered from three roots (bundled `app/tui/themes/`, project `./themes/`, home `~/.ziro/themes/`); `/theme [name]` swaps live and persists to `data/ui_prefs.json`.
- Context meter (F16): a `ContextMeter` widget reveals at ≥50% of the usable input budget and shades muted → amber → red as the request approaches the compaction trigger, with a trip-line marker at the trigger line.
- MCP control panel (F17): `Ctrl+O` (or `/mcp` in the TUI) opens a modal `McpPanel` with a live table of servers (state, transport, auth, tool count), Connect / Disconnect / Reconnect, tool peek, and an OAuth device-code prompt.
- Reactive state + UI peek: session state (identity/lifecycle/content/chrome) lives in one reactive `UIStore` (`app/tui/store.py`) mutated only through typed messages (`app/tui/messages.py`); widgets bind via `watch()` instead of imperative `update_*()` calls. `python -m app.tui.demo` (or `textual serve "python -m app.tui.demo"` for a browser demo) drives a scripted fake runner for visual review; SVG snapshot regression tests live in `tests/test_tui_snapshots.py` (`pytest --snapshot-update` to refresh baselines). See `docs/tui-peek.md`.
LLM Adapter
app/llm/ wraps LangChain's Runnable behind an LLMAdapter ABC (invoke/ainvoke/stream/astream, bind_tools(), context_window()/supports()), decoupling the graph from OpenRouter specifically. app/llm/factory.py resolves a provider (meta.yaml's provider: field, default openrouter) to an adapter — only OpenrouterAdapter ships today, but unknown providers fail fast rather than silently falling back, and get_llm()/get_agent_llm()/get_summary_llm() in app/core/config.py are now thin wrappers over build_adapter() rather than constructing ChatOpenAI directly. This is step one toward multi-provider support.
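The fail-fast provider lookup can be sketched as a small registry. Everything here is a stand-in: the classes are stubs that share names with the real ones only for readability, and `build_adapter_sketch` is a hypothetical simplification of `build_adapter()`, whose real signature is not shown in this document.

```python
from typing import Callable


class LLMAdapter:
    """Stub base class standing in for the real ABC."""

    def __init__(self, model: str) -> None:
        self.model = model


class OpenrouterAdapter(LLMAdapter):
    """Stub for the only adapter that ships today."""


# Provider name -> adapter factory. Adding a provider means adding one entry.
_PROVIDERS: dict[str, Callable[[str], LLMAdapter]] = {
    "openrouter": OpenrouterAdapter,
}


def build_adapter_sketch(
    provider: str = "openrouter", model: str = "google/gemini-2.5-flash-lite"
) -> LLMAdapter:
    """Resolve a provider name to an adapter, raising on unknown providers."""
    try:
        factory = _PROVIDERS[provider]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS))
        raise ValueError(
            f"unknown LLM provider {provider!r}; known providers: {known}"
        ) from None
    return factory(model)
```

Raising instead of defaulting to OpenRouter means a typo in `meta.yaml`'s `provider:` field surfaces at startup rather than as requests silently routed to the wrong backend.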
Context Compaction
Each agent has a compaction_policy.yaml. When a turn's request exceeds trigger_pct of the model's usable input budget (window minus reserved output and schema headroom), the compaction node folds older messages into a running summary (injected into the system prompt) and drops them from live history via RemoveMessage — shrinking both the in-flight request and the persisted checkpoint. The keep_recent_min most-recent messages are always kept verbatim, and the split is chosen to never orphan a ToolMessage from its parent call.
Budget math is model-aware: app/llm/openrouter_catalog.py (formerly app/core/model_specs.py) reads the OpenRouter /models catalog once per process for each model's context_length, max_completion_tokens, and supported_parameters, so the trigger, the reserved-output carve-out, and the max_tokens sent to the LLM stay consistent. Strategy is hybrid (summarize the dropped span via a clean tool-free LLM) or trim (drop only). Setting enabled: false restores un-compacted behavior exactly. A separate max_tool_message_tokens / max_tool_message_pct cap bounds a single oversized ToolMessage at ingestion. See docs/chat-compression.md.
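The trigger arithmetic and the orphan-safe split can be sketched as follows. This is an illustration under stated assumptions: `trigger_pct` is treated as a fraction between 0 and 1, messages are plain dicts with a `role` key, and tool results are assumed to directly follow the assistant message that requested them. The real `pick_split` in `app/compaction/node.py` may differ.

```python
def usable_budget(
    context_length: int, reserved_output: int, schema_headroom: int
) -> int:
    """Input budget: the window minus reserved output and schema headroom."""
    return max(context_length - reserved_output - schema_headroom, 0)


def should_compact(
    request_tokens: int,
    context_length: int,
    reserved_output: int,
    schema_headroom: int,
    trigger_pct: float,
) -> bool:
    """True once the request exceeds trigger_pct of the usable budget."""
    budget = usable_budget(context_length, reserved_output, schema_headroom)
    return request_tokens > trigger_pct * budget


def pick_split(messages: list[dict], keep_recent_min: int) -> int:
    """Index where the verbatim tail starts; messages[:split] get folded."""
    split = max(len(messages) - keep_recent_min, 0)
    # Never let the tail begin with a tool result: step back until the
    # assistant message that issued the call is kept as well.
    while 0 < split < len(messages) and messages[split]["role"] == "tool":
        split -= 1
    return split
```

For a 128,000-token window with 8,000 reserved for output and 20,000 for schemas, the usable budget is 100,000, so a `trigger_pct` of 0.8 fires once a request passes 80,000 tokens. Stepping the split backwards keeps more than `keep_recent_min` messages when needed, which is consistent with that setting being a minimum.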
Tools available to the LLM
| Tool | Tier | Purpose |
|---|---|---|
| `search_rag(query)` | Core | Semantic search over indexed documents |
| `search_skills(query)` | Core | Semantic search over `SKILL.md` files |
| `load_skill_ref(skill_name, filename)` | Core | Load a reference/script file from a skill directory on demand |
| `save_memory(content)` | Core | Persist a user fact to long-term store |
| `search_tools(query)` | Meta | Find available deferred tools by keyword |
| `load_tools(names)` | Meta | Activate deferred tools for this thread |
| `list_tools(namespace)` | Meta | Browse all tools by namespace |
| `unload_tools(names)` | Meta | Deactivate tools to free context |
| `write_todos(todos)` | Core | Rewrite the turn's todo list (F08); renders in the TUI side pane |
| `update_todo(id, …)` | Core | Update one todo's status/content/result (F08) |
| `request_handoff(reason)` | Core | Pause the turn for a human operator and inject their reply (F07); gated by `handoff_policy.yaml` |
| `ask_user_question(questions)` | Core | Pause the turn to ask 1-4 structured clarifying questions (2-4 options, optional multi-select); gated by `clarify_policy.yaml` |
| `spawn_subagent(agent_id, task)` | Deferred | Delegate a self-contained subtask to a child subagent in isolated context; returns one concise result |
| `dispatch_subagents(tasks)` | Deferred | Parallel fan-out (gated by `enable_parallel`); one result block per child |
| `get_subagent_transcript(run_id)` | Deferred | Pull a past child run's full transcript on demand (capped) |
| `read_file(path, offset?, limit?)` | Core | Read a project text file as numbered lines; byte/line capped, binary-safe, path-confined (`fs`) |
| `grep(pattern, path, glob?, output_mode?, case_insensitive?)` | Core | Regex search over file contents (content/files/count); skips ignored dirs + binaries (`fs`) |
| `glob_files(pattern, path?)` | Core | Find files by glob, mtime-sorted, repo-relative, path-confined (`fs`) |
| `write_file(path, content)` / `edit_file(path, …)` | Deferred | Mutating file writes, byte-capped, path-confined; separate `fs_write` namespace so writes can `ask` while reads stay `allow`; gated by `fs_policy.yaml` `allow_write` |
| `web_fetch(url)` | Deferred | Fetch an http/https page, return cleaned text; SSRF-guarded; gated by `webfetch_policy.yaml` |
| `run_shell(command, cwd?)` | Deferred | Run a shell command on the host or a sandbox container; default-deny + HITL, denylist/timeout/output-cap/env-scrub; gated by `shell_policy.yaml` |
| deferred tools | Deferred | Any tool registered in the registry (local or MCP) |
Runtime Modes
| Mode | Trigger | Store | Checkpointer | Vector |
|---|---|---|---|---|
| Dev | No `DATABASE_URL` | `SqliteStore` (`./data/memories.db`) | `AsyncSqliteSaver` (`./data/checkpoints.db`) | FAISS |
| Prod | `DATABASE_URL` set | `PostgresStore` | `AsyncPostgresSaver` | pgvector |
Key Modules
| Module | Role |
|---|---|
| app/graph/graph.py | Builds LangGraph state machine; wires nodes + routing (including guardrail nodes) |
| app/graph/nodes.py | make_agent_node(), make_dynamic_tool_node(), make_memory_tools() |
| app/graph/state.py | AgentState TypedDict: messages, guardrail_*, active_tools (+ReplaceActiveTools for LRU), running_summary, last_compaction_index; extract_text() for reasoning-block content |
| app/core/config.py | Reads .env; LLM factories get_llm() / get_agent_llm() / get_summary_llm(): thin wrappers over app.llm.factory.build_adapter() (model-aware max_tokens), embeddings, Langfuse handler; load_* for agent/tool/mcp/compaction configs |
| app/core/agent_profiles.py | AgentProfile: resolves per-agent config files + model; select_profile(), list_agent_profiles(), all_agent_profiles(), get_agent_profile(); subagent resolution (SUBAGENTS_DIR, *_subagent_profile(s)) |
| app/core/agent_md.py | Parses single-file *.agent.md subagent definitions (frontmatter + Markdown body) |
| app/llm/openrouter_catalog.py | OpenRouter /models catalog cache (renamed from app/core/model_specs.py): context_length, max_completion_tokens, supported_parameters, provider; supports_parameter() gating, list_model_ids(provider=)/list_providers() |
| app/llm/adapter.py | LLMAdapter ABC wrapping a LangChain Runnable: invoke/ainvoke/stream/astream, bind_tools(), build() classmethod |
| app/llm/factory.py | build_adapter(provider, ...): provider registry (only openrouter shipped; unknown provider errors) |
| app/core/paths.py | PROJECT_ROOT + resolved dirs (AGENTS_DIR, DATA_DIR, SKILLS_DIR) anchored to the repo root |
| app/compaction/ | node.py (make_compaction_node, pick_split), window.py (budget/trigger math), summarizer.py, tokenizer.py, models.py (CompactionConfig / CompactionResult) |
| app/memory/store.py | save_memory / load_memories: user-scoped long-term facts |
| app/memory/checkpointer.py | Thread-level checkpoints (enables --thread resumption) |
| app/rag/retriever.py | search_rag, search_skills, load_skill_ref tools; lazy FAISS/PGVector loading with module-level cache |
| app/rag/indexer.py | Chunking (1000 tokens, 200 overlap), embedding (HuggingFace all-MiniLM-L6-v2), indexing |
| app/tools/registry.py | ToolRegistry: unified source of truth for all tools; semantic + keyword search; ToolDescriptor; expand() (namespace tokens → tools); view() restricted facade |
| app/tools/meta_tools.py | search_tools, load_tools, list_tools, unload_tools: the progressive-loading discovery surface |
| app/tools/bootstrap.py | build_local_registry() / ingest_mcp_tools(): startup wiring of local + MCP tools |
| app/tools/mcp_client.py | Connects to MCP servers from agent mcp_servers.yaml; fetches + allowlist-filters tools |
| app/tools/indexer.py | CLI to re-index tool descriptions into the tools vector-store collection |
| app/guardrails/models.py | Pydantic models: PolicyConfig, PolicyRule, GuardrailDecision, backend configs |
| app/guardrails/backends.py | RegexInjectionRunnable, LocalClassifierRunnable, PresidioRunnable, LlamaGuardRunnable, LLMGuardrailRunnable; make_backend() factory |
| app/guardrails/evaluator.py | GuardrailEvaluator: groups rules by backend, runs backends concurrently |
| app/guardrails/nodes.py | make_input_guard_node() / make_output_guard_node() LangGraph node factories |
| app/guardrails/policy_loader.py | load_policies(): reads an agent's guardrails_policy.yaml |
| app/cli/chat_once.py | Single-shot invocation; outputs one JSON line (for programmatic / LLM-driven use) |
| app/cli/run_scenarios.py | Replay scenario JSON files through the agent; writes side-by-side transcripts |
| app/cli/manage_agents.py | Scaffold, list, enable/disable, and configure agents |
| app/cli/show_graph.py | Visualize the LangGraph state machine (ASCII, Mermaid, PNG) |
| app/subagents/ | models.py (SpawnPolicy etc.), orchestrator.py (SpawnContext, graph cache, rights intersection, skill-scope threading), tool.py (make_subagent_tools) |
| app/permissions/ | F06: models.py (PermissionPolicy/PermissionDecision/PermissionRequest), gate.py/policy.py (evaluate + arg redaction), hook.py (pre_tool wiring), store.py (durable grants) |
| app/hooks/ | F01: models.py (HookEvent/HookSpec/HookContext/HookDecision), registry.py, runner.py, callables.py (Python/shell callables), guards.py (shell_loop_guard/verification_loop_guard: repeated-command / repeated-failure loop detection) |
| app/graph/interrupts.py | Shared HITL interrupt module for F06/F07/clarify: InterruptKind, raise_interrupt(), InterruptRequest/InterruptResponse, render_interactive() |
| app/handoff/ | F07: models.py (HandoffConfig), tools.py (request_handoff); gated by handoff_policy.yaml |
| app/clarify/ | ask_user_question core tool: models.py (ClarifyConfig), tools.py (make_clarify_tools); gated by clarify_policy.yaml |
| app/tasks/ | F08: models.py (Todo/TodoStatus), tools.py (write_todos/update_todo), reducer.py, render.py |
| app/io/attachments.py | F02: AttachmentConfig, parse_attachments, to_image_block, build_human_message (multimodal vs text-only) |
| app/tui/theme.py + themes/ | Theme palettes (Palette, carbon/nord/gruvbox, discover_palettes, live set_active); app/core/ui_prefs.py persists the choice |
| app/tui/mcp_panel.py | F17: McpPanel modal + OAuthPromptModal (device-code prompt) |
| app/tui/store.py | UIStore: single non-visual reactive state holder (identity/lifecycle/content/chrome) mounted on the App |
| app/tui/messages.py | SessionSwitched/BusyChanged/StartupTicked/StateChanged/UsageChanged/ThemeChanged: the only vocabulary that mutates UIStore |
| app/tui/demo.py | ScriptedRunner (torch-free fake TurnRunner) + build_demo_app(): UI peek via python -m app.tui.demo / textual serve |
| app/tools/mcp_manager.py + mcp_models.py | F18: McpManager (connect/disconnect/reconnect, persistent sessions, re-ingest), ServerStatus/ToolSpec/OAuthConfig |
| app/webfetch/ | models.py (WebFetchConfig), tool.py (web_fetch, SSRF guard, HTML→text); gated by webfetch_policy.yaml |
| app/fs/ | models.py (FsConfig: read caps + allow_write/max_write_bytes), tools.py (make_fs_tools: read_file/grep/glob_files; make_fs_write_tools: write_file/edit_file under the separate fs_write namespace; _resolve path confinement); pure stdlib, cross-platform; gated by fs_policy.yaml (on by default) |
| app/tools/shell.py + shell_models.py | F10/F27 shell: ShellGate, host + sandbox executors, make_shell_tool, output caps, process-tree kill; _resolve_bash() (Git Bash/WSL probing) + PersistentBashSession; ShellConfig/GateDecision |
| app/tools/shell_audit.py | F10 audit hooks (log_intent/log_outcome) → data/audit/shell-audit.jsonl / tool_audit table |
| docker/pdf-sandbox/Dockerfile | F10 sandbox image (python:3.12-slim + pypdf + reportlab); built via the pdf-sandbox compose service |
| skills/loader.py | Walks ./skills/ for SKILL.md files; indexes them as the "skills" collection |
| app/agents/<id>/agent_config.yaml | Per-agent system_prompt, soul_prompt, fallback_messages |
| app/agents/<id>/tool_policy.yaml | Per-agent core_tools, search_k, max_active, denylist |
| app/agents/<id>/mcp_servers.yaml | Per-agent MCP server declarations (transport, command/url, allowlist, enabled flag) |
| app/agents/<id>/guardrails_policy.yaml | Per-agent named backends pool + input/output rules |
| app/agents/<id>/compaction_policy.yaml | Per-agent compaction policy (enabled, trigger/target band, retention, summarizer) |
| app/agents/<id>/subagent_policy.yaml | Per-parent spawn policy (enabled, allowed_children/denylist, max_depth, spawn/parallel caps) |
| app/agents/<id>/webfetch_policy.yaml | Per-agent web-fetch policy (enabled master flag, SSRF/scheme/host controls, content cap) |
| app/agents/<id>/fs_policy.yaml | Per-agent filesystem-tools policy (enabled, confine, read byte/line caps, grep/glob result caps, ignore_dirs) |
| app/agents/<id>/shell_policy.yaml | Per-agent shell policy (enabled, mode host/sandbox, denylist/allowlist, timeout, output cap, env passthrough, sandbox launcher/image) |
| app/agents/<id>/permissions.yaml | Per-agent F06 tool-permission policy (allow/deny/ask globs; shell:* default-deny; remember scope) |
| app/agents/<id>/hooks.yaml | Per-agent F01 lifecycle hooks (events, glob matcher, Python/shell callable; e.g. *run_shell audit pre/post_tool) |
| app/agents/<id>/handoff_policy.yaml | Per-agent F07 human-handoff policy (enabled, operator prompt, timeout/fallback, inject_as) |
| app/agents/<id>/attachment_policy.yaml | Per-agent F02 multimodal policy (enabled, max_image_bytes, allowed_image_types) |
| app/agents/<id>/memory_policy.yaml | Per-agent F03 self-improving-memory policy (fact extraction / reflection) |
| app/agents/<id>/voice_policy.yaml | Per-agent F11 voice policy (enabled, STT/TTS backend + model) |
| app/agents/<id>/queue_policy.yaml | Per-agent F13 queue policy (in-flight always on; background worker default off) |
| app/agents/<id>/meta.yaml | Agent name, description, enabled, model |
| app/agents/.subagents/<id>.agent.md | Single-file subagent definition (frontmatter persona/tools/skills + Markdown system prompt) |
| app/agents/registry.yaml | Records the default agent id |
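The app/rag/indexer.py entry above mentions 1000-token chunks with a 200-token overlap. As a rough illustration of what that windowing means, here is a small sketch. It is not the indexer's code: `chunk_tokens` is a hypothetical helper, and it works on an already tokenized list, so the choice of tokenizer is left open.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 1000, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one.

    Hypothetical helper for illustration; defaults mirror the 1000/200 figures in the table.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window start advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks
```

With the defaults, each window starts 800 tokens after the previous one, so the last 200 tokens of one chunk reappear at the start of the next and a sentence that straddles a boundary stays retrievable from at least one chunk.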
Guardrails
Policies are declared in the agent's guardrails_policy.yaml. Each rule references a named backend from the backends: pool. Multiple backends run concurrently; first block in declaration order wins. Rules support refusal_templates lists for randomized refusals.
| Backend type | Description |
|---|---|
| regex_injection | Deterministic OWASP prompt-injection patterns + evasion detection (typoglycemia, Base64/hex, char-spacing); zero dependencies |
| local_classifier | HuggingFace text-classification (prompt injection detection) |
| presidio | Microsoft Presidio PII detection (bundled in core deps; spaCy model via python -m app.cli.startup) |
| llama_guard | Llama Guard 3 1B GGUF content safety; supports block_categories to scope which S-codes block (bundled in core deps; GGUF via python -m app.cli.startup) |
| openai | Any OpenAI-compatible API for LLM-based policy evaluation |
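The rule that backends run concurrently while the first block in declaration order wins can be sketched with asyncio. This is an illustrative sketch only and does not reproduce GuardrailEvaluator: `Decision`, `make_demo_backend`, and `evaluate` are hypothetical names, and the demo backends are simple substring checks standing in for real ones.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class Decision:
    backend: str
    blocked: bool
    reason: str = ""


Backend = Callable[[str], Awaitable[Decision]]


def make_demo_backend(name: str, needle: str, delay: float) -> Backend:
    """Build a toy backend that blocks when `needle` occurs in the text (assumption, not a real backend)."""
    async def run(text: str) -> Decision:
        await asyncio.sleep(delay)  # simulate backend latency
        hit = needle in text.lower()
        return Decision(name, hit, f"matched {needle!r}" if hit else "")
    return run


async def evaluate(text: str, backends: List[Backend]) -> Optional[Decision]:
    """Run all backends concurrently; return the first block in declaration order, else None."""
    # asyncio.gather returns results in the order the awaitables were passed,
    # regardless of which backend finished first.
    decisions = await asyncio.gather(*(backend(text) for backend in backends))
    for decision in decisions:
        if decision.blocked:
            return decision
    return None
```

Because the results are scanned in declaration order, a slow backend listed first still takes precedence over a faster one listed later, which keeps the chosen refusal deterministic.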
Extending
Add documents:
python -m app.rag.indexer ./my-docs/
Add a skill:
mkdir -p skills/my-skill/references
# create skills/my-skill/SKILL.md
# (optional) add reference files to skills/my-skill/references/
python -m skills.loader
Add an agent:
python -m app.cli.manage_agents add my-agent --name "My Agent"
# Edit app/agents/my-agent/agent_config.yaml, tool_policy.yaml, etc.
Add a subagent:
python -m app.cli.manage_agents add-subagent my-scout --tools rag:search_rag --namespaces rag --skills deep-research
# Edit app/agents/.subagents/my-scout.agent.md (frontmatter tools/namespaces/skills + persona body)
Add an MCP server: add an entry to the agent's app/agents/<id>/mcp_servers.yaml, then restart the agent.
Add a guardrail rule: edit the agent's app/agents/<id>/guardrails_policy.yaml — add a backend entry and a rule referencing it. No code changes needed.
Change default model:
OPENROUTER_MODEL=anthropic/claude-sonnet-4-5
Change model for a specific agent:
python -m app.cli.manage_agents set-model <AGENT_ID> anthropic/claude-sonnet-4-5
Enable reasoning:
OPENROUTER_REASONING_EFFORT=medium
Configure agent persona: edit app/agents/<id>/agent_config.yaml (system_prompt, soul_prompt, fallback_messages).