Deep LangGraph agent with memory, RAG, skills, OpenRouter, and Langfuse


Ziro

LangGraph conversational agent with persistent memory, RAG, progressive tool loading, MCP support, skills library, configurable guardrails, and automatic context compaction. Backed by OpenRouter (any LLM) and Langfuse (observability). Two runtime modes: SQLite/FAISS for local dev, PostgreSQL/pgvector for production.

Features

Multi-agent — run as one of several named agents, each with its own persona, model, tools, MCP servers, guardrails, and compaction policy; per-agent on/off flags
Long-term memory — user- and agent-scoped facts persisted across sessions
Thread resumption — pick up any prior conversation by thread ID
Context compaction — older turns are auto-folded into a running summary when a request nears the model's context window; recent turns kept verbatim; YAML-driven, model-aware budgeting
RAG — semantic search over indexed documents (.txt, .pdf, directories)
Skills library — index SKILL.md files; agent retrieves relevant skills and loads reference files on demand
Progressive tool loading — tools are deferred by default; the LLM discovers and activates only what it needs via search_tools / load_tools
Subagents — delegate a self-contained subtask to a child subagent in isolated context; subagents are single-file *.agent.md definitions (persona + scoped tools/skills) with inherited, never-widened permissions
Tool permissions (F06) — per-agent allow/deny/ask policy over tool/namespace globs; ask triggers a human-in-the-loop approval, with allow-once / allow-thread / allow-always memory; shell:* ships dangerous-default-deny
Hooks (F01) — declarative lifecycle interception (session_start, pre/post_turn, pre/post_tool, pre/post_model); Python-dotted or shell callables with glob matchers; powers permission gating, shell audit, memory reflection
Human handoff (F07) — optional request_handoff tool that pauses the turn for a human operator and injects their reply back into the conversation
Clarifying questions — core ask_user_question tool (mirrors Claude Code's AskUserQuestion): pauses the turn to ask 1-4 structured multiple-choice questions and folds the answers back in; shares the same HITL interrupt path as F06/F07
Task / todo list (F08) — write_todos / update_todo state tools; the running todo list renders in the TUI side pane
Self-improving memory (F03) — per-agent memory_policy.yaml governs how facts are extracted and reflected into long-term store
Multimodal input (F02) — attach images via /img <path> or --attach; vision-capable models receive image blocks, text-only models degrade gracefully (attachment_policy.yaml)
Filesystem tools — cross-platform core tools read_file, grep, glob_files for local file inspection, plus deferred, ask-gated write_file/edit_file mutating tools (separate fs_write namespace); pure stdlib (identical on Windows + POSIX), path-confined to the project root, on by default (fs_policy.yaml)
Web fetch — optional SSRF-guarded web_fetch(url) tool that returns cleaned, length-capped page text
Shell execution — optional run_shell tool that runs real commands on the host or in a Docker sandbox; default-deny + human approval (F06/HITL), denylist backstop, timeout + process-tree kill, output caps, env scrubbing, and full audit.
Bash parity — a cross-platform bash interpreter (Git Bash/WSL-probed on Windows) makes pipes/quoting/heredocs behave identically on every OS; optional persistent per-thread bash session keeps cd/env vars across calls
LLM adapter — LLMAdapter abstraction (app/llm/) decouples the graph from OpenRouter specifically, the first step toward multi-provider support; per-agent provider: selection in meta.yaml
Local voice — optional push-to-talk speech I/O (F11): faster-whisper STT + Piper TTS, fully on-device, no cloud key required (cloud STT/TTS also pluggable)
MCP support (F18) — connect external MCP servers (stdio, SSE, streamable HTTP, WebSocket) with OAuth; a live McpManager (re)ingests tools into the shared registry, persistent sessions cut per-call latency, failures non-fatal
Interactive TUI (F12) — Textual front-end (transcript, todo/active-tools side pane, status footer, approval/question modal); paints in ~1.6s and is usable while the engine builds on a background thread (F26); slash-commands, live themes (carbon/nord/gruvbox), a context-usage meter (F16), and an MCP control panel (F17, Ctrl+O); session state is a single reactive UIStore, with SVG snapshot tests + a textual serve browser demo for UI peek
Slash commands (F14) — /help /agent /thread /model /theme /think /clear /voice /img /mcp /stop /save /quit dispatched in the driver loop before the LLM (no model call, no transcript pollution)
Chat queue (F13) — in-flight buffer captures mid-turn input; optional background worker pool (chat_once --submit/--status) survives restart
Guardrails — configurable input/output guards (regex injection, ML classifier, PII, Llama Guard content safety); YAML-driven, no code changes to add rules
Agent persona — configurable soul_prompt, system_prompt, and fallback_messages via the agent's agent_config.yaml
Resilient replies — empty reasoning-only turns are re-prompted once, then served a non-blank fallback rather than a blank message
Thinking / reasoning — optional extended reasoning via OPENROUTER_REASONING_EFFORT, switchable live with /think <high|medium|low|off> (F15)
Fast startup — embeddings warmup is off the critical path (F24); the TUI shell + graph build are torch-free, torch loads on a background thread (F26)
Dual backends — dev (SQLite + FAISS, zero infra) / prod (PostgreSQL + pgvector)
Observability — Langfuse and LangSmith tracing, both optional

Quickstart

# 1. Install dependencies — includes the local guardrail backends
#    (Presidio PII + Llama Guard content safety).
uv sync

# 2. (Optional) local voice — faster-whisper STT + Piper TTS (push-to-talk, no cloud key)
uv sync --group voice-local

# Download the local models the installed backends need (spaCy, Llama Guard GGUF,
# and — only if the voice group is installed — the default Piper voice).
# Fetches just what's missing.
python -m app.cli.startup

# 3. Configure environment
cp .env.example .env
# Set OPENROUTER_API_KEY in .env

# 4. (Production only) start PostgreSQL + pgvector
docker compose up -d

# 5. Run
python -m app.main --user alice

Resume a previous session:

python -m app.main --user alice --thread alice_abc12345

Single-shot (JSON output, for programmatic / LLM-driven use):

python -m app.cli.chat_once --user alice --message "Hello"

Environment Variables

Variable	Required	Purpose
`OPENROUTER_API_KEY`	Yes	LLM access via OpenRouter
`OPENROUTER_MODEL`	No	Model override (default: `google/gemini-2.5-flash-lite`)
`OPENROUTER_REASONING_EFFORT`	No	Enable extended reasoning: `low` / `medium` / `high` (leave empty to disable)
`DATABASE_URL`	No	Enables production PostgreSQL + pgvector backends
`LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY`	No	Langfuse observability
`LANGFUSE_HOST`	No	Langfuse host (default: `https://cloud.langfuse.com`)
`LANGCHAIN_TRACING_V2` / `LANGCHAIN_API_KEY`	No	LangSmith tracing

Commands

uv sync                                                              # install dependencies (incl. local guardrail backends)
uv sync --group voice-local                                          # + faster-whisper STT + Piper TTS (optional local voice)
python -m app.cli.startup                                            # download spaCy + Llama Guard + default Piper voice if absent
python -m app.main --user <USER_ID>                                  # start chat session (interactive agent picker)
python -m app.main --user <USER_ID> --agent <AGENT_ID>              # start with a specific agent
python -m app.main --user <USER_ID> --voice                         # push-to-talk voice I/O (needs voice-local + voice_policy enabled)
python -m app.main --user <USER_ID> --thread <THREAD_ID>            # resume session
python -m app.cli.chat_once --user <USER_ID> --message <MSG>        # single-shot JSON output
python -m app.cli.chat_once --user <USER_ID> --agent <AGENT_ID> --message <MSG>  # single-shot, specific agent
python -m app.cli.manage_agents list                                 # list agents + per-agent on/off state
python -m app.cli.manage_agents add <AGENT_ID> --name NAME          # scaffold a new agent
python -m app.cli.manage_agents remove <AGENT_ID>                   # delete an agent
python -m app.cli.manage_agents enable <AGENT_ID>                   # turn a single agent ON
python -m app.cli.manage_agents disable <AGENT_ID>                  # turn a single agent OFF
python -m app.cli.manage_agents set-default <AGENT_ID>              # set the default agent
python -m app.cli.manage_agents set-model <AGENT_ID> <MODEL>        # set an agent's model (or 'none' to clear)
python -m app.cli.manage_agents add-subagent <AGENT_ID> --tools a,b --namespaces rag --skills x  # scaffold a single-file subagent
python -m app.cli.manage_agents remove-subagent <AGENT_ID>          # delete a subagent definition
python -m app.cli.run_scenarios                                      # replay scenario files in tmp/
python -m app.rag.indexer <path>                                     # index docs (dir, .txt, or .pdf)
python -m skills.loader                                              # index all SKILL.md files
python -m app.tools.indexer                                          # re-index tool descriptions
python -m app.tui.demo                                               # scripted TUI demo (UI peek); textual serve "python -m app.tui.demo" for a browser demo
mypy .                                                               # type check
pytest tests/test_tui_snapshots.py --snapshot-update                 # regenerate TUI SVG snapshot baselines after a UI change
docker compose up -d                                                 # start PostgreSQL + pgvector (prod)
docker compose build pdf-sandbox                                     # build F10 sandbox image (ziro-pdf:latest) for researcher_docker

Visualize the graph

python -m app.cli.show_graph                          # ASCII to stdout (requires grandalf: uv sync --group dev)
python -m app.cli.show_graph --format mermaid         # Mermaid markdown to stdout
python -m app.cli.show_graph --format mermaid -o g.md # Mermaid to file
python -m app.cli.show_graph --format png -o g.png    # PNG (requires graphviz system package)

Slash commands (in-session)

Typed into the running session (TUI or REPL), dispatched before the LLM — no model call, no transcript pollution (F14):

Command	Purpose
`/help` (`/h`)	List commands
`/agent [id]`	Switch agent (rebuilds the session, fresh thread)
`/thread [id]`	Switch / resume a thread
`/model [id]`	Switch the model (rebuilds the LLM)
`/theme [name]`	Switch the UI theme live + persist (carbon / nord / gruvbox)
`/think <high\|medium\|low\|off>`	Set reasoning effort live (F15)
`/clear`	Start a fresh thread (same agent)
`/voice [on\|off]`	Toggle push-to-talk voice I/O (needs `--voice` + `voice_policy`)
`/img <path> [text]`	Attach an image to this turn (F02)
`/mcp [server]`	Show MCP servers (TUI: open the control panel, Ctrl+O)
`/stop` (`/halt`)	Abort the turn currently in flight (Ctrl+S in the TUI)
`/save [path]`	Save the transcript to a JSON file
`/quit` (`/exit`, `/q`)	Exit the session

Architecture

User input → [input_guard] → [compaction] → agent node → LLM with tools → DynamicToolNode → back to agent → ... → [output_guard] → final response

input_guard, output_guard (toggled by the agent's guardrails_policy.yaml), and compaction (toggled by compaction_policy.yaml) are conditional nodes. A blocked request returns a refusal message without reaching the LLM. The agent injects long-term user memories, the configured persona, and the running compaction summary into the system prompt before each LLM call. An empty reasoning-only reply is re-prompted once (reprompt), then replaced by a non-blank fallback (reply_fallback).
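
The re-prompt-then-fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not the project's graph nodes: `reply_with_fallback` and the `call_llm` callable are hypothetical names, and "empty" is assumed to mean a reply that is blank after stripping whitespace.

```python
from typing import Callable


def reply_with_fallback(call_llm: Callable[[], str], fallback: str) -> str:
    """Call the model, re-prompt once on a blank reply, then fall back."""
    for _ in range(2):  # the first attempt plus exactly one re-prompt
        reply = call_llm().strip()
        if reply:
            return reply
    return fallback  # both attempts were blank, so serve the non-blank fallback


# Hypothetical stand-in for an LLM call: blank first, real text second.
replies = iter(["", "Here is the answer."])
answer = reply_with_fallback(lambda: next(replies), fallback="Sorry, please try again.")
```

Capping the loop at two attempts keeps the worst case to one extra model call per turn.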

Multi-Agent System

The agent can run as one of several named agents, each with its own persona, tool policy, MCP servers, and guardrails.

Per-agent folders — each agent lives in app/agents/<agent_id>/ and is self-contained, with its own agent_config.yaml, tool_policy.yaml, mcp_servers.yaml, guardrails_policy.yaml, compaction_policy.yaml, and meta.yaml. The shipped agents are default (life-coach persona), researcher (factual research aide with lighter guardrails; run_shell enabled in host mode), and researcher_docker (same persona with run_shell in Docker-sandbox mode).
Per-agent model — meta.yaml's model field sets the OpenRouter model for that agent. Omit it (or model: null) to use the OPENROUTER_MODEL env default. A provider: field selects the LLM adapter (default openrouter — see LLM Adapter).
Per-agent on/off — meta.yaml's enabled flag controls whether an agent is selectable. Toggle with manage_agents enable|disable <id>.
Manifest — app/agents/registry.yaml records the default agent. If the default is disabled, the first enabled agent is used.
Memory isolation — long-term memory is keyed per (user_id, agent_id), so each agent keeps its own facts per user.
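
The default-agent fallback and the per-(user, agent) memory key can be sketched as follows. The `AgentMeta` dataclass, `resolve_default`, and `memory_namespace` are hypothetical names for illustration, and "first enabled agent" is assumed to mean first in listing order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentMeta:
    agent_id: str
    enabled: bool = True


def resolve_default(agents: list[AgentMeta], default_id: str) -> Optional[str]:
    """Return the default agent if it is enabled, else the first enabled agent."""
    enabled = [a.agent_id for a in agents if a.enabled]
    if default_id in enabled:
        return default_id
    return enabled[0] if enabled else None


def memory_namespace(user_id: str, agent_id: str) -> tuple[str, str]:
    """Key long-term memory per (user_id, agent_id) so agents do not share facts."""
    return (user_id, agent_id)


agents = [
    AgentMeta("default", enabled=False),
    AgentMeta("researcher"),
    AgentMeta("researcher_docker"),
]
chosen = resolve_default(agents, default_id="default")
```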

Progressive Tool Loading

Tools are split into three tiers:

Tier	Description
Core tools	Always bound to the LLM every turn (defined in `tool_policy.yaml` → `core_tools`)
Meta-tools	Always bound; provide the discovery surface (`search_tools`, `load_tools`, `list_tools`, `unload_tools`)
Deferred tools	Registered in the `ToolRegistry` but NOT bound until the LLM calls `load_tools([...])`

Discovery loop: search_tools("<keyword>") → pick names → load_tools([...]) → call the tool. Activated tools persist for the rest of the thread (up to max_active, LRU eviction after that).
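
As a rough sketch of the `max_active` cap with LRU eviction, the active set can be modeled with an ordered dictionary. `ActiveToolSet` is a hypothetical class, not the project's `active_tools` state; it assumes that both loading and calling a tool count as a "use".

```python
from collections import OrderedDict


class ActiveToolSet:
    """Tracks activated tools, evicting the least recently used past max_active."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self._tools: OrderedDict[str, None] = OrderedDict()

    def load(self, names: list[str]) -> list[str]:
        """Activate tools; return the names evicted to stay within max_active."""
        evicted: list[str] = []
        for name in names:
            self._tools[name] = None
            self._tools.move_to_end(name)  # loading counts as a use
            while len(self._tools) > self.max_active:
                oldest, _ = self._tools.popitem(last=False)
                evicted.append(oldest)
        return evicted

    def touch(self, name: str) -> None:
        """Mark a tool as recently used when the LLM calls it."""
        if name in self._tools:
            self._tools.move_to_end(name)

    def names(self) -> list[str]:
        return list(self._tools)


active = ActiveToolSet(max_active=2)
active.load(["web_fetch", "run_shell"])
active.touch("web_fetch")              # web_fetch becomes the most recent
evicted = active.load(["write_file"])  # run_shell is now the least recent
```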

MCP Integration

External MCP servers are declared in the agent's mcp_servers.yaml. On startup, tool metadata is fetched from all enabled servers and registered into the ToolRegistry, with each server's name used as the namespace. MCP tools enter the same deferred-discovery surface as local tools: they are NOT bound to the LLM until activated via load_tools.

Supported transports: stdio, sse, streamable_http, websocket, plus OAuth (F18). A live McpManager (app/tools/mcp_manager.py) owns per-server state and a ServerStatus snapshot, holds a persistent session per server (one owner task keeps it open so tool calls reuse it — large latency win), and re-ingests tools into the shared registry on every connect so discovery stays current. connect / disconnect / reconnect are driven live from the F17 panel or /mcp. OAuthConfig picks the flow (auto / authorization_code / device_code; device-code when headless); tokens are cached and refreshed per request. Server failures are non-fatal; the rest of the registry is unaffected.

Subagents

The parent can delegate a subtask to a child subagent via spawn_subagent (or dispatch_subagents for parallel fan-out). Each child is a fresh graph on a namespaced thread with its own persona, empty history, and a restricted toolset; only a compact result is folded back into the parent.

Single-file definitions — subagents live in app/agents/.subagents/<id>.agent.md: YAML frontmatter (name, description, enabled, model, allowed tools:/namespaces:, scoped skills:, optional soul) plus a Markdown body that becomes the system prompt. The .subagents/ dir is spawn-only — subagents never appear in the interactive picker. Shipped: scout (research) and reflector (reflection coach).
Permission inheritance — a child's tool rights are the intersection of its declared surface and the parent's current rights; never wider. Namespace tokens (rag, rag:*) expand to the concrete tools available in the (possibly restricted) registry view.
Skill scoping — a subagent's skills: allowlist restricts its search_skills / load_skill_ref to only those skills.
Runaway guards — per-parent subagent_policy.yaml sets enabled, allowed_children/denylist, max_depth (default 1), max_spawns_per_thread, and parallelism limits; each child runs under a recursion_limit.

Scaffold one with manage_agents add-subagent <id> --tools … --namespaces … --skills ….
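
The permission-inheritance rule (declared surface intersected with the parent's current rights) can be illustrated with plain sets. This is a sketch under assumptions: `expand` and `child_rights` are hypothetical helpers, the registry is modeled as a namespace-to-tool-names mapping, and tools are written as `namespace:tool`.

```python
def expand(tokens: set[str], registry: dict[str, set[str]]) -> set[str]:
    """Expand namespace tokens ("rag", "rag:*") into qualified tool names."""
    tools: set[str] = set()
    for token in tokens:
        namespace = token[:-2] if token.endswith(":*") else token
        if namespace in registry:
            tools |= {f"{namespace}:{name}" for name in registry[namespace]}
        else:
            tools.add(token)  # already a concrete tool name
    return tools


def child_rights(declared: set[str], parent_rights: set[str],
                 registry: dict[str, set[str]]) -> set[str]:
    """A child holds only tools it declares AND the parent currently holds."""
    return expand(declared, registry) & parent_rights


registry = {"rag": {"search_rag"}, "fs": {"read_file", "grep"}, "shell": {"run_shell"}}
parent = {"rag:search_rag", "fs:read_file"}
child = child_rights({"rag:*", "fs:grep", "shell:run_shell"}, parent, registry)
```

Because the result is an intersection, a child that declares `shell:run_shell` still cannot run it unless the parent holds that right.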

Web Fetch

An optional web_fetch(url) deferred tool (namespace webfetch) fetches an http/https page and returns cleaned, length-capped text. Gated by the agent's webfetch_policy.yaml — when enabled: false the tool is never registered (invisible to discovery). An SSRF guard resolves the host and refuses loopback / private / link-local / reserved targets, re-checking the final URL after redirects.
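
A minimal sketch of an SSRF check of this kind, using only the standard library, might look like the following. `is_blocked_ip` and `url_is_allowed` are hypothetical names, not the project's guard; the sketch also treats multicast and unspecified addresses as blocked, which is an assumption beyond the list above.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(ip: str) -> bool:
    """True for loopback, private, link-local, reserved and similar targets."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def url_is_allowed(url: str) -> bool:
    """Resolve the host and allow the URL only if every address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or 80,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # unresolvable hosts are refused
    return bool(infos) and all(not is_blocked_ip(info[4][0]) for info in infos)
```

To cover redirects, a caller would run `url_is_allowed` again on the final URL (or on every hop) before reading the response body.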

Filesystem Tools

Three cross-platform core tools (namespace fs) give the agent Claude-Code-style local file inspection: read_file, grep, and glob_files. They are pure Python stdlib (pathlib / re / os / fnmatch) — no new dependency, no shelling out — so behavior is identical on Windows and POSIX.

read_file(path, offset?, limit?) — read a UTF-8 text file as numbered lines; byte-capped (max_read_bytes) with an offset/limit line window (max_read_lines); binary files are detected and skipped.
grep(pattern, path, glob?, output_mode?, case_insensitive?) — regex search over file contents under a path; glob filters filenames; output_mode is content / files / count; prunes ignored dirs (.git, node_modules, …) and skips binaries; capped at max_grep_matches.
glob_files(pattern, path?) — find files by glob (incl. **), returned repo-relative and mtime-sorted (newest first), capped at max_glob_results.

Every path is resolved against the project root and refused if it escapes (FsConfig.confine, default on). The tools are on by default via the agent's fs_policy.yaml (missing file → defaults), are added to each shipped agent's core_tools, and are inherited by subagents through the rights intersection (like web_fetch). These read tools have no F06 gate; only shell:* is dangerous-default-deny.
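
Path confinement of this kind is commonly implemented by resolving the candidate path and checking that it still sits under the root. The sketch below assumes Python 3.9+ (for `Path.is_relative_to`); the `confine` function is a hypothetical stand-in, not the project's `FsConfig.confine`.

```python
from pathlib import Path


def confine(project_root: Path, user_path: str) -> Path:
    """Resolve user_path against project_root and reject anything that escapes."""
    root = project_root.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # user_path replaces the root entirely and is caught by the check below.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes project root: {user_path}")
    return candidate
```

Resolving before the containment check matters: a purely textual prefix comparison would accept `app/../../secrets.txt`.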

Mutating tools (write_file / edit_file) — deferred counterparts registered under a separate fs_write namespace so F06 can ask on writes while reads stay allow; byte-capped at max_write_bytes (512KB). Gated by FsConfig.allow_write (default on) — set false for a read-only agent.

Shell Execution

An optional run_shell(command, cwd?) deferred tool (namespace shell) runs real shell commands — on the host by default, or inside a Docker sandbox via a config flip. The trust model is Claude-Code-like: a human approves what runs.

Default off & default-deny — gated by the agent's shell_policy.yaml (enabled: false → never registered). Even when enabled, shell:* is dangerous-default-deny in the permission system (F06); an agent must explicitly allow-list shell:run_shell in its permissions.yaml. Recommended recipe = allow + ask (human approval on every call via HITL). chat_once auto-denies an ask interrupt (fail-closed), so single-shot runs never hang.
ShellGate backstop — a non-removable hardened denylist (rm -rf, sudo, dd, mkfs, fork bombs, …) plus optional per-agent regex allow/deny and opt-in confine cwd containment.
Two executors — host (create_subprocess_shell, real shell with pipelines) and sandbox (create_subprocess_exec with a pluggable sandbox_launcher template — Docker/WSL/nsjail). Both behind the same gate, audit, and tool body.
Bash parity — ShellConfig.interpreter defaults to "bash": the command runs as one argv element via create_subprocess_exec (no shell launcher involved), so bash tokenizes pipes/quoting/heredocs identically on every OS. On Windows this probes Git Bash, then falls back to wsl bash. The previous approach (pinning shell_executable through create_subprocess_shell) was unreliable there, since Windows always runs {executable or COMSPEC} /c {command} regardless of the executable. interpreter: "host" restores the old cmd.exe / /bin/sh behavior. An optional persistent per-thread bash -l session (session: "persistent") keeps cd/env vars/venv activation across calls.
Always-on safety — timeout_s with process-tree kill, output capped twice (at the source and at ingestion), and env scrubbing so secrets (OPENROUTER_API_KEY, DATABASE_URL, LANGFUSE_*) never reach the child.
Loop guards — pre_tool hooks (app/hooks/guards.py) flag repeated near-identical run_shell commands (shell_loop_guard) and repeated identical failure signatures across different commands (verification_loop_guard — pytest/mypy/tsc/panic tracebacks recurring while the fix isn't landing).
Audit — pre_tool / post_tool hooks log every invocation (intent + outcome, including blocked/timeout) to data/audit/shell-audit.jsonl (dev) or a tool_audit table (prod). See docs/audit-log.md.

Two agents ship the opt-in: researcher (host mode; its persona bakes a map-reduce flow for summarizing large PDFs without overflowing context) and researcher_docker (mode: sandbox, launcher docker run --rm --network none -v {workdir}:/work -w /work {image}, image ziro-pdf:latest — build with docker compose build pdf-sandbox).
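
Env scrubbing can be sketched as a filter over the parent environment before the child process is spawned. The pattern list below contains only the three names mentioned above; `SCRUB_PATTERNS` and `scrubbed_env` are hypothetical names, and a real policy would likely cover more variables.

```python
import fnmatch
import os

# Names and globs of variables that must never reach a child process.
SCRUB_PATTERNS = ["OPENROUTER_API_KEY", "DATABASE_URL", "LANGFUSE_*"]


def scrubbed_env(env: dict[str, str],
                 patterns: list[str] = SCRUB_PATTERNS) -> dict[str, str]:
    """Return a copy of env without variables whose names match any pattern."""
    return {
        name: value
        for name, value in env.items()
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    }


# The result would be passed as env= when spawning the subprocess.
child_env = scrubbed_env(dict(os.environ))
```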

Local Voice

Optional push-to-talk speech I/O (F11): the turn is bracketed by STT (your speech → the user message) and TTS (the post-output-guard reply → audio). Default backends are fully local — faster-whisper for STT, Piper for TTS — so no cloud API key is needed (cloud STT/TTS over httpx are also pluggable per agent). Gated by the agent's voice_policy.yaml (enabled: false by default).

Setup is two commands:

uv sync --group voice-local   # faster-whisper STT + Piper TTS (+ sounddevice/PortAudio)
python -m app.cli.startup     # downloads the default Piper voice into data/voices/ (gitignored)

Then enable it in the agent's app/agents/<id>/voice_policy.yaml:

enabled: true
tts_backend: local
tts_model: data/voices/en_US-amy-medium.onnx   # path to the downloaded .onnx (config .onnx.json resolved alongside)

Run with python -m app.main --user <id> --voice, or toggle /voice on in the TUI (Ctrl+R to start/stop recording). Piper voices come from the Rhasspy catalog (<lang>-<name>-<quality>); download others with python -m piper.download_voices <voice-id> --download-dir data/voices and point tts_model at the new .onnx. The researcher agent ships voice-enabled.

Tool Permissions (F06)

Every tool call passes a per-agent permission gate (app/permissions/) wired as a pre_tool hook. The agent's permissions.yaml declares allow / deny / ask globs over tool/namespace qualified names plus a default_action; dangerous_default_deny (ships ["shell:*"]) forces a default-deny for dangerous namespaces. An ask decision raises a human-in-the-loop interrupt (the TUI ApprovalModal, REPL prompt, or auto-deny in chat_once); approvals can be remembered once, for the thread, or always (remember_default_scope, persisted per user/agent). Tool args are redacted (secrets masked, long values truncated) before display.
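
A glob-based decision function along these lines could look like the sketch below. The precedence order (deny, then ask, then allow, then the dangerous default-deny, then `default_action`) is an assumption chosen so that the "allow + ask" recipe yields `ask`; `PermissionPolicy` and `decide` are hypothetical names, not the classes in app/permissions/.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class PermissionPolicy:
    allow: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)
    ask: list[str] = field(default_factory=list)
    default_action: str = "ask"
    dangerous_default_deny: list[str] = field(default_factory=lambda: ["shell:*"])


def _matches(name: str, globs: list[str]) -> bool:
    return any(fnmatchcase(name, g) for g in globs)


def decide(policy: PermissionPolicy, qualified_name: str) -> str:
    """Return "allow", "deny" or "ask" for a namespace:tool name."""
    if _matches(qualified_name, policy.deny):
        return "deny"
    if _matches(qualified_name, policy.ask):
        return "ask"
    if _matches(qualified_name, policy.allow):
        return "allow"
    if _matches(qualified_name, policy.dangerous_default_deny):
        return "deny"  # dangerous namespaces need an explicit rule
    return policy.default_action


policy = PermissionPolicy(
    allow=["fs:*", "shell:run_shell"],
    ask=["shell:run_shell", "fs_write:*"],
    default_action="allow",
)
```

With this policy, `decide(policy, "shell:run_shell")` yields `"ask"`, while a policy with no shell rules at all denies the same call even when `default_action` is `"allow"`.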

Hooks (F01)

app/hooks/ is a declarative lifecycle-interception layer. An agent's hooks.yaml lists HookSpec entries — each binds a HookEvent (session_start, pre_turn, post_turn, pre_tool, post_tool, pre_model, post_model), an fnmatch matcher (over tool qualified names for tool events), and a callable (Python dotted path or shell command). Hooks return a HookDecision (allow / deny / modify / interrupt); the runner short-circuits on deny/interrupt and folds modify. Hooks are ordered once at startup, so a no-hook registry has zero per-turn cost. This path powers the F06 permission gate and the F10 shell audit (*run_shell pre/post_tool).
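
The runner semantics (short-circuit on deny/interrupt, fold modify) can be sketched as below. The dataclasses reuse the names HookDecision and HookSpec from the text, but their fields and the `run_hooks` signature are assumptions for illustration; "fold modify" is taken to mean merging the hook's payload into the tool arguments seen by later hooks.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any, Callable, Optional


@dataclass
class HookDecision:
    action: str = "allow"  # "allow" | "deny" | "modify" | "interrupt"
    payload: Optional[dict[str, Any]] = None  # replacement args for "modify"
    reason: str = ""


@dataclass
class HookSpec:
    event: str
    matcher: str  # fnmatch pattern over the qualified tool name
    fn: Callable[[str, dict[str, Any]], HookDecision]


def run_hooks(hooks: list[HookSpec], event: str, tool_name: str,
              args: dict[str, Any]) -> tuple[HookDecision, dict[str, Any]]:
    """Run matching hooks in order; stop on deny/interrupt, fold modify."""
    current = dict(args)
    for hook in hooks:
        if hook.event != event or not fnmatchcase(tool_name, hook.matcher):
            continue
        decision = hook.fn(tool_name, current)
        if decision.action in ("deny", "interrupt"):
            return decision, current  # short-circuit: later hooks never run
        if decision.action == "modify" and decision.payload is not None:
            current = {**current, **decision.payload}
    return HookDecision("allow"), current


def add_cwd(name: str, args: dict[str, Any]) -> HookDecision:
    return HookDecision("modify", payload={"cwd": "/work"})


def block_rm(name: str, args: dict[str, Any]) -> HookDecision:
    if "rm -rf" in args.get("command", ""):
        return HookDecision("deny", reason="destructive command")
    return HookDecision()


hooks = [
    HookSpec("pre_tool", "shell:*", add_cwd),
    HookSpec("pre_tool", "*run_shell", block_rm),
]
decision, final_args = run_hooks(hooks, "pre_tool", "shell:run_shell", {"command": "ls"})
```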

Human Handoff (F07)

An optional request_handoff(reason) core tool (app/handoff/) lets the agent escalate to a human. It pauses the turn, surfaces the reason to an operator, accepts a human-authored reply, and resumes with that reply injected as the next message (assistant or tool, per config). The agent's handoff_policy.yaml (HandoffConfig) sets enabled, the operator prompt, a timeout with an on_timeout fallback, and inject_as.

Clarifying Questions

A core ask_user_question(questions) tool (app/clarify/) mirrors Claude Code's AskUserQuestion: pauses mid-turn to ask 1-4 structured clarifying questions (2-4 options each, optional multi-select, implicit free-text "Other"), then folds the human's answers back into the tool result. It shares the same HITL interrupt module as F06/F07 (app/graph/interrupts.py — InterruptKind.ASK_USER_QUESTION) rather than a bespoke pause mechanism; the TUI renders a QuestionModal, the REPL prints numbered options. Gated by clarify_policy.yaml (ClarifyConfig — enabled, max_questions, max_options; missing file → on by default).

Task / Todo List (F08)

app/tasks/ adds state-based todo tracking via two core tools: write_todos(todos) (full rewrite) and update_todo(id, status|content|result). Todos live in AgentState.todos (merged by a per-id reducer), survive checkpointing, and render live in the TUI side pane. A todo can carry an agent_id to mark subagent delegation.
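
A per-id reducer of the kind described can be sketched as a merge keyed on `id`. This models only the update path; the field names (`id`, `status`, `content`) follow the tool signatures above, while `merge_todos` itself and the first-seen ordering are assumptions.

```python
from typing import Any

Todo = dict[str, Any]


def merge_todos(existing: list[Todo], updates: list[Todo]) -> list[Todo]:
    """Merge updates into existing todos by id, preserving first-seen order."""
    merged: dict[str, Todo] = {t["id"]: dict(t) for t in existing}
    for update in updates:
        # Fields present in the update win; untouched fields are kept.
        merged[update["id"]] = {**merged.get(update["id"], {}), **update}
    return list(merged.values())


todos = [{"id": "1", "content": "index docs", "status": "pending"}]
todos = merge_todos(todos, [
    {"id": "1", "status": "done"},
    {"id": "2", "content": "write summary", "status": "pending"},
])
```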

Multimodal Input (F02)

Images attach via the /img <path> directive or the CLI --attach flag (app/io/attachments.py). build_human_message checks supports_multimodal(model): vision-capable models receive base64 data-URI image blocks in a multimodal HumanMessage; text-only models get a graceful notice instead. Bounded by the agent's attachment_policy.yaml (enabled, max_image_bytes, allowed_image_types).
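
The shape of such a message can be sketched with the standard library alone. The content-block layout below follows the OpenAI-style `image_url` convention that LangChain chat models accept; `build_content` and the wording of the text-only notice are hypothetical, not the project's `build_human_message`.

```python
import base64
import mimetypes
from pathlib import Path


def build_content(text: str, image_path: str, vision_capable: bool) -> list[dict]:
    """Build content blocks: text plus an image block, or a text-only notice."""
    if not vision_capable:
        notice = f"[image {Path(image_path).name} omitted: model is text-only]"
        return [{"type": "text", "text": f"{text}\n{notice}"}]
    mime = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
    ]
```

The returned list would be passed as the `content` of a human message; size and type limits such as `max_image_bytes` would be enforced before encoding.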

Interactive TUI & Themes (F12 / F16 / F17)

app/tui/ is a Textual front-end over the shared TurnRunner: a transcript, a todo/active-tools side pane, a status footer, and an ApprovalModal rendering the shared HITL interrupt. It paints in ~1.6s and is usable immediately while the engine builds on a background thread — any turn typed early is queued and runs in order (F26). Extras:

Themes — app/tui/theme.py maps semantic color slots (not literal hex) to Textual + Rich widgets. Carbon (amber + steel on carbon black) is the default; nord and gruvbox also ship. Palettes are discovered from three roots (bundled app/tui/themes/, project ./themes/, home ~/.ziro/themes/); /theme [name] swaps live and persists to data/ui_prefs.json.
Context meter (F16) — a ContextMeter widget reveals at ≥50% of the usable input budget and shades muted → amber → red as the request approaches the compaction trigger, with a trip-line marker at the trigger line.
MCP control panel (F17) — Ctrl+O (or /mcp in the TUI) opens a modal McpPanel: a live table of servers (state, transport, auth, tool count) with Connect / Disconnect / Reconnect, tool peek, and an OAuth device-code prompt.
Reactive state + UI peek — session state (identity/lifecycle/content/chrome) lives in one reactive UIStore (app/tui/store.py) mutated only through typed messages (app/tui/messages.py); widgets bind via watch() instead of imperative update_*() calls. python -m app.tui.demo (or textual serve "python -m app.tui.demo" for a browser demo) drives a scripted fake runner for visual review; SVG snapshot regression tests live in tests/test_tui_snapshots.py (pytest --snapshot-update to refresh baselines). See docs/tui-peek.md.

LLM Adapter

app/llm/ wraps LangChain's Runnable behind an LLMAdapter ABC (invoke/ainvoke/stream/astream, bind_tools(), context_window()/supports()), decoupling the graph from OpenRouter specifically. app/llm/factory.py resolves a provider (meta.yaml's provider: field, default openrouter) to an adapter — only OpenrouterAdapter ships today, but unknown providers fail fast rather than silently falling back, and get_llm()/get_agent_llm()/get_summary_llm() in app/core/config.py are now thin wrappers over build_adapter() rather than constructing ChatOpenAI directly. This is step one toward multi-provider support.

Context Compaction

Each agent has a compaction_policy.yaml. When a turn's request exceeds trigger_pct of the model's usable input budget (window minus reserved output and schema headroom), the compaction node folds older messages into a running summary (injected into the system prompt) and drops them from live history via RemoveMessage — shrinking both the in-flight request and the persisted checkpoint. The keep_recent_min most-recent messages are always kept verbatim, and the split is chosen to never orphan a ToolMessage from its parent call.
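
The budget and split rules above reduce to a little arithmetic. The sketch below is illustrative only: the function names are hypothetical (this is not the project's `pick_split`), `trigger_pct` is assumed to be a fraction between 0 and 1, and messages are reduced to a list of role strings.

```python
def usable_input_budget(context_length: int, reserved_output: int,
                        schema_headroom: int) -> int:
    """Tokens left for the request after output and schema carve-outs."""
    return max(context_length - reserved_output - schema_headroom, 0)


def should_compact(request_tokens: int, budget: int, trigger_pct: float) -> bool:
    """True once the request exceeds trigger_pct of the usable input budget."""
    return request_tokens > budget * trigger_pct


def choose_split(roles: list[str], keep_recent_min: int) -> int:
    """Index where kept history starts; never begins on a tool message."""
    split = max(len(roles) - keep_recent_min, 0)
    # Walk back so a tool result stays with the assistant call that produced it.
    while 0 < split < len(roles) and roles[split] == "tool":
        split -= 1
    return split


budget = usable_input_budget(128_000, reserved_output=8_000, schema_headroom=4_000)
roles = ["user", "assistant", "tool", "assistant", "user", "assistant"]
split = choose_split(roles, keep_recent_min=4)  # messages before split get summarized
```

With these numbers the budget is 116,000 tokens, so a `trigger_pct` of 0.8 fires above 92,800 request tokens. The split moves from index 2 back to index 1 so the tool result is not separated from its parent call.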

Budget math is model-aware: app/llm/openrouter_catalog.py (formerly app/core/model_specs.py) reads the OpenRouter /models catalog once per process for each model's context_length, max_completion_tokens, and supported_parameters, so the trigger, the reserved-output carve-out, and the max_tokens sent to the LLM stay consistent. Strategy is hybrid (summarize the dropped span via a clean tool-free LLM) or trim (drop only). Setting enabled: false restores un-compacted behavior exactly. A separate max_tool_message_tokens / max_tool_message_pct cap bounds a single oversized ToolMessage at ingestion. See docs/chat-compression.md.

Tools available to the LLM

Tool	Tier	Purpose
`search_rag(query)`	Core	Semantic search over indexed documents
`search_skills(query)`	Core	Semantic search over SKILL.md files
`load_skill_ref(skill_name, filename)`	Core	Load a reference/script file from a skill directory on demand
`save_memory(content)`	Core	Persist a user fact to long-term store
`search_tools(query)`	Meta	Find available deferred tools by keyword
`load_tools(names)`	Meta	Activate deferred tools for this thread
`list_tools(namespace)`	Meta	Browse all tools by namespace
`unload_tools(names)`	Meta	Deactivate tools to free context
`write_todos(todos)`	Core	Rewrite the turn's todo list (F08); renders in the TUI side pane
`update_todo(id, …)`	Core	Update one todo's status/content/result (F08)
`request_handoff(reason)`	Core	Pause the turn for a human operator and inject their reply (F07); gated by `handoff_policy.yaml`
`ask_user_question(questions)`	Core	Pause the turn to ask 1-4 structured clarifying questions (2-4 options, optional multi-select); gated by `clarify_policy.yaml`
`spawn_subagent(agent_id, task)`	Deferred	Delegate a self-contained subtask to a child subagent in isolated context; returns one concise result
`dispatch_subagents(tasks)`	Deferred	Parallel fan-out (gated by `enable_parallel`); one result block per child
`get_subagent_transcript(run_id)`	Deferred	Pull a past child run's full transcript on demand (capped)
`read_file(path, offset?, limit?)`	Core	Read a project text file as numbered lines; byte/line capped, binary-safe, path-confined (`fs`)
`grep(pattern, path, glob?, output_mode?, case_insensitive?)`	Core	Regex search over file contents (content/files/count); skips ignored dirs + binaries (`fs`)
`glob_files(pattern, path?)`	Core	Find files by glob, mtime-sorted, repo-relative, path-confined (`fs`)
`write_file(path, content)` / `edit_file(path, …)`	Deferred	Mutating file writes, byte-capped, path-confined; separate `fs_write` namespace so writes can `ask` while reads stay `allow`; gated by `fs_policy.yaml` `allow_write`
`web_fetch(url)`	Deferred	Fetch an http/https page, return cleaned text; SSRF-guarded; gated by `webfetch_policy.yaml`
`run_shell(command, cwd?)`	Deferred	Run a shell command on the host or a sandbox container; default-deny + HITL, denylist/timeout/output-cap/env-scrub; gated by `shell_policy.yaml`
(other deferred tools)	Deferred	Any other tool registered in the `ToolRegistry` (local or MCP)

Runtime Modes

Mode	Trigger	Store	Checkpointer	Vector
Dev	No `DATABASE_URL`	`SqliteStore` (`./data/memories.db`)	`AsyncSqliteSaver` (`./data/checkpoints.db`)	FAISS
Prod	`DATABASE_URL` set	`PostgresStore`	`AsyncPostgresSaver`	pgvector

Key Modules

Module	Role
`app/graph/graph.py`	Builds LangGraph state machine; wires nodes + routing (including guardrail nodes)
`app/graph/nodes.py`	`make_agent_node()`, `make_dynamic_tool_node()`, `make_memory_tools()`
`app/graph/state.py`	`AgentState` TypedDict — `messages`, `guardrail_*`, `active_tools` (+`ReplaceActiveTools` for LRU), `running_summary`, `last_compaction_index`; `extract_text()` for reasoning-block content
`app/core/config.py`	Reads `.env`; LLM factories `get_llm()` / `get_agent_llm()` / `get_summary_llm()` — thin wrappers over `app.llm.factory.build_adapter()` (model-aware `max_tokens`), embeddings, Langfuse handler; `load_*` for agent/tool/mcp/compaction configs
`app/core/agent_profiles.py`	`AgentProfile` — resolves per-agent config files + model; `select_profile()`, `list_agent_profiles()`, `all_agent_profiles()`, `get_agent_profile()`; subagent resolution (`SUBAGENTS_DIR`, `*_subagent_profile(s)`)
`app/core/agent_md.py`	Parses single-file `*.agent.md` subagent definitions (frontmatter + Markdown body)
`app/llm/openrouter_catalog.py`	OpenRouter `/models` catalog cache (renamed from `app/core/model_specs.py`): `context_length`, `max_completion_tokens`, `supported_parameters`, `provider`; `supports_parameter()` gating, `list_model_ids(provider=)`/`list_providers()`
`app/llm/adapter.py`	`LLMAdapter` ABC wrapping a LangChain `Runnable` — `invoke`/`ainvoke`/`stream`/`astream`, `bind_tools()`, `build()` classmethod
`app/llm/factory.py`	`build_adapter(provider, ...)` — provider registry (only `openrouter` shipped; unknown provider errors)
`app/core/paths.py`	`PROJECT_ROOT` + resolved dirs (`AGENTS_DIR`, `DATA_DIR`, `SKILLS_DIR`) anchored to the repo root
`app/compaction/`	`node.py` (`make_compaction_node`, `pick_split`), `window.py` (budget/trigger math), `summarizer.py`, `tokenizer.py`, `models.py` (`CompactionConfig` / `CompactionResult`)
`app/memory/store.py`	`save_memory` / `load_memories` — user-scoped long-term facts
`app/memory/checkpointer.py`	Thread-level checkpoints (enables `--thread` resumption)
`app/rag/retriever.py`	`search_rag`, `search_skills`, `load_skill_ref` tools; lazy FAISS/PGVector loading with module-level cache
`app/rag/indexer.py`	Chunking (1000 tokens, 200 overlap), embedding (HuggingFace `all-MiniLM-L6-v2`), indexing
`app/tools/registry.py`	`ToolRegistry` — unified source of truth for all tools; semantic + keyword search; `ToolDescriptor`; `expand()` (namespace tokens → tools); `view()` restricted facade
`app/tools/meta_tools.py`	`search_tools`, `load_tools`, `list_tools`, `unload_tools` — the progressive-loading discovery surface
`app/tools/bootstrap.py`	`build_local_registry()` / `ingest_mcp_tools()` — startup wiring of local + MCP tools
`app/tools/mcp_client.py`	Connects to MCP servers from agent `mcp_servers.yaml`; fetches + allowlist-filters tools
`app/tools/indexer.py`	CLI to re-index tool descriptions into the `tools` vector-store collection
`app/guardrails/models.py`	Pydantic models: `PolicyConfig`, `PolicyRule`, `GuardrailDecision`, backend configs
`app/guardrails/backends.py`	`RegexInjectionRunnable`, `LocalClassifierRunnable`, `PresidioRunnable`, `LlamaGuardRunnable`, `LLMGuardrailRunnable`; `make_backend()` factory
`app/guardrails/evaluator.py`	`GuardrailEvaluator` — groups rules by backend, runs backends concurrently
`app/guardrails/nodes.py`	`make_input_guard_node()` / `make_output_guard_node()` LangGraph node factories
`app/guardrails/policy_loader.py`	`load_policies()` — reads an agent's `guardrails_policy.yaml`
`app/cli/chat_once.py`	Single-shot invocation; outputs one JSON line (for programmatic / LLM-driven use)
`app/cli/run_scenarios.py`	Replay scenario JSON files through the agent; writes side-by-side transcripts
`app/cli/manage_agents.py`	Scaffold, list, enable/disable, and configure agents
`app/cli/show_graph.py`	Visualize the LangGraph state machine (ASCII, Mermaid, PNG)
`app/subagents/`	`models.py` (`SpawnPolicy` etc.), `orchestrator.py` (`SpawnContext`, graph cache, rights intersection, skill-scope threading), `tool.py` (`make_subagent_tools`)
`app/permissions/`	F06: `models.py` (`PermissionPolicy`/`PermissionDecision`/`PermissionRequest`), `gate.py`/`policy.py` (evaluate + arg redaction), `hook.py` (pre_tool wiring), `store.py` (durable grants)
`app/hooks/`	F01: `models.py` (`HookEvent`/`HookSpec`/`HookContext`/`HookDecision`), `registry.py`, `runner.py`, `callables.py` (Python/shell callables), `guards.py` (`shell_loop_guard`/`verification_loop_guard` — repeated-command / repeated-failure loop detection)
`app/graph/interrupts.py`	Shared HITL interrupt module for F06/F07/clarify: `InterruptKind`, `raise_interrupt()`, `InterruptRequest`/`InterruptResponse`, `render_interactive()`
`app/handoff/`	F07: `models.py` (`HandoffConfig`), `tools.py` (`request_handoff`); gated by `handoff_policy.yaml`
`app/clarify/`	`ask_user_question` core tool: `models.py` (`ClarifyConfig`), `tools.py` (`make_clarify_tools`); gated by `clarify_policy.yaml`
`app/tasks/`	F08: `models.py` (`Todo`/`TodoStatus`), `tools.py` (`write_todos`/`update_todo`), `reducer.py`, `render.py`
`app/io/attachments.py`	F02: `AttachmentConfig`, `parse_attachments`, `to_image_block`, `build_human_message` (multimodal vs text-only)
`app/tui/theme.py` + `themes/`	Theme palettes (`Palette`, carbon/nord/gruvbox, `discover_palettes`, live `set_active`); `app/core/ui_prefs.py` persists the choice
`app/tui/mcp_panel.py`	F17: `McpPanel` modal + `OAuthPromptModal` (device-code prompt)
`app/tui/store.py`	`UIStore` — single non-visual reactive state holder (identity/lifecycle/content/chrome) mounted on the App
`app/tui/messages.py`	`SessionSwitched`/`BusyChanged`/`StartupTicked`/`StateChanged`/`UsageChanged`/`ThemeChanged` — the only vocabulary that mutates `UIStore`
`app/tui/demo.py`	`ScriptedRunner` (torch-free fake `TurnRunner`) + `build_demo_app()` — UI peek via `python -m app.tui.demo` / `textual serve`
`app/tools/mcp_manager.py` + `mcp_models.py`	F18: `McpManager` (connect/disconnect/reconnect, persistent sessions, re-ingest), `ServerStatus`/`ToolSpec`/`OAuthConfig`
`app/webfetch/`	`models.py` (`WebFetchConfig`), `tool.py` (`web_fetch`, SSRF guard, HTML→text); gated by `webfetch_policy.yaml`
`app/fs/`	`models.py` (`FsConfig` — read caps + `allow_write`/`max_write_bytes`), `tools.py` (`make_fs_tools` — `read_file`/`grep`/`glob_files`; `make_fs_write_tools` — `write_file`/`edit_file` under the separate `fs_write` namespace; `_resolve` path confinement); pure stdlib, cross-platform; gated by `fs_policy.yaml` (on by default)
`app/tools/shell.py` + `shell_models.py`	F10/F27 shell: `ShellGate`, host + sandbox executors, `make_shell_tool`, output caps, process-tree kill; `_resolve_bash()` (Git Bash/WSL probing) + `PersistentBashSession`; `ShellConfig`/`GateDecision`
`app/tools/shell_audit.py`	F10 audit hooks (`log_intent`/`log_outcome`) → `data/audit/shell-audit.jsonl` / `tool_audit` table
`docker/pdf-sandbox/Dockerfile`	F10 sandbox image (`python:3.12-slim` + pypdf + reportlab); built via the `pdf-sandbox` compose service
`skills/loader.py`	Walks `./skills/` for `SKILL.md` files; indexes them as the "skills" collection
`app/agents/<id>/agent_config.yaml`	Per-agent `system_prompt`, `soul_prompt`, `fallback_messages`
`app/agents/<id>/tool_policy.yaml`	Per-agent `core_tools`, `search_k`, `max_active`, `denylist`
`app/agents/<id>/mcp_servers.yaml`	Per-agent MCP server declarations (transport, command/url, allowlist, enabled flag)
`app/agents/<id>/guardrails_policy.yaml`	Per-agent named backends pool + input/output rules
`app/agents/<id>/compaction_policy.yaml`	Per-agent compaction policy (`enabled`, trigger/target band, retention, summarizer)
`app/agents/<id>/subagent_policy.yaml`	Per-parent spawn policy (`enabled`, `allowed_children`/`denylist`, `max_depth`, spawn/parallel caps)
`app/agents/<id>/webfetch_policy.yaml`	Per-agent web-fetch policy (`enabled` master flag, SSRF/scheme/host controls, content cap)
`app/agents/<id>/fs_policy.yaml`	Per-agent filesystem-tools policy (`enabled`, `confine`, read byte/line caps, grep/glob result caps, `ignore_dirs`)
`app/agents/<id>/shell_policy.yaml`	Per-agent shell policy (`enabled`, `mode` host/sandbox, denylist/allowlist, timeout, output cap, env passthrough, sandbox launcher/image)
`app/agents/<id>/permissions.yaml`	Per-agent F06 tool-permission policy (allow/deny/ask globs; `shell:*` default-deny; remember scope)
`app/agents/<id>/hooks.yaml`	Per-agent F01 lifecycle hooks (events, glob matcher, Python/shell callable; e.g. `*run_shell` audit pre/post_tool)
`app/agents/<id>/handoff_policy.yaml`	Per-agent F07 human-handoff policy (`enabled`, operator prompt, timeout/fallback, `inject_as`)
`app/agents/<id>/attachment_policy.yaml`	Per-agent F02 multimodal policy (`enabled`, `max_image_bytes`, `allowed_image_types`)
`app/agents/<id>/memory_policy.yaml`	Per-agent F03 self-improving-memory policy (fact extraction / reflection)
`app/agents/<id>/voice_policy.yaml`	Per-agent F11 voice policy (`enabled`, STT/TTS backend + model)
`app/agents/<id>/queue_policy.yaml`	Per-agent F13 queue policy (in-flight always on; background worker default off)
`app/agents/<id>/meta.yaml`	Agent `name`, `description`, `enabled`, `model`
`app/agents/.subagents/<id>.agent.md`	Single-file subagent definition (frontmatter persona/tools/skills + Markdown system prompt)
`app/agents/registry.yaml`	Records the default agent id
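
To make the allow/deny/ask globs of `permissions.yaml` more concrete, here is a minimal sketch of glob-based policy evaluation. It is not the code in `app/permissions/`; the function name, the `namespace:tool` identifier shape, the deny-then-ask-then-allow precedence, and the `ask` fallback are all assumptions made for the example.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def evaluate_permission(
    tool_id: str,
    deny: Sequence[str],
    ask: Sequence[str],
    allow: Sequence[str],
    default: str = "ask",
) -> str:
    """Return "deny", "ask", or "allow" for a tool id such as "shell:run_shell".

    Assumed precedence: deny beats ask, ask beats allow; unmatched tools
    fall back to `default`. The real policy may order these differently.
    """
    for decision, patterns in (("deny", deny), ("ask", ask), ("allow", allow)):
        if any(fnmatchcase(tool_id, pattern) for pattern in patterns):
            return decision
    return default


policy = {"deny": ["shell:*"], "ask": ["fs_write:*"], "allow": ["fs:*", "rag:*"]}
for tool in ("shell:run_shell", "fs_write:write_file", "fs:grep"):
    print(tool, "->", evaluate_permission(tool, **policy))
```

Checking deny first means a broad allow glob cannot accidentally re-enable a tool that a narrower deny rule was meant to block.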

Guardrails

Policies are declared in the agent's `guardrails_policy.yaml`. Each rule references a named backend from the `backends:` pool. Backends run concurrently, and when more than one rule blocks, the first blocking rule in declaration order wins. A rule may carry a `refusal_templates` list, from which a refusal message is picked at random.
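
The combination of concurrent execution and declaration-order precedence can be sketched with `asyncio.gather`, which returns results in input order regardless of which coroutine finishes first. This is an illustrative sketch, not `GuardrailEvaluator` itself; the `Backend` callable shape, the `Decision` class, and both example checks are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical backend shape: an async callable that returns True to block.
Backend = Callable[[str], Awaitable[bool]]


@dataclass
class Decision:
    rule: str
    blocked: bool


async def evaluate(text: str, rules: list[tuple[str, Backend]]) -> Optional[Decision]:
    """Run every backend concurrently; report the first block in declaration order."""
    # gather() preserves input order, so zip() below walks the verdicts in
    # the order the rules were declared, not the order they completed.
    verdicts = await asyncio.gather(*(backend(text) for _, backend in rules))
    for (name, _), blocked in zip(rules, verdicts):
        if blocked:
            return Decision(rule=name, blocked=True)
    return None


async def slow_injection_check(text: str) -> bool:
    await asyncio.sleep(0.05)  # finishes last, but is declared first
    return "ignore previous instructions" in text.lower()


async def fast_length_check(text: str) -> bool:
    return len(text) > 10


RULES = [("injection", slow_injection_check), ("length", fast_length_check)]
print(asyncio.run(evaluate("Ignore previous instructions now", RULES)))
```

Both example rules block this input, and the slower `injection` rule is still the one reported because it was declared first.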

Backend type	Description
`regex_injection`	Deterministic OWASP prompt-injection patterns + evasion detection (typoglycemia, Base64/hex, char-spacing); zero dependencies
`local_classifier`	HuggingFace text-classification (prompt injection detection)
`presidio`	Microsoft Presidio PII detection (bundled in core deps; spaCy model via `python -m app.cli.startup`)
`llama_guard`	Llama Guard 3 1B GGUF content safety; supports `block_categories` to scope which S-codes block (bundled in core deps; GGUF via `python -m app.cli.startup`)
`openai`	Any OpenAI-compatible API for LLM-based policy evaluation
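
The evasion handling described for `regex_injection` can be illustrated as "normalize, then match". The sketch below covers only character spacing and Base64; it is not the project's pattern set. The single injection pattern, the thresholds, and every name in it are assumptions for the example.

```python
import base64
import binascii
import re

# Hypothetical single pattern; a real backend would carry many more.
INJECTION = re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE)
# Runs of single characters separated by whitespace, e.g. "i g n o r e".
SPACED = re.compile(r"\b(?:\w\s){3,}\w\b")
# Long runs of Base64 alphabet characters with optional padding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def collapse_spacing(text: str) -> str:
    """Join letters that were spread out with whitespace to dodge matching."""
    return SPACED.sub(lambda m: re.sub(r"\s", "", m.group(0)), text)


def looks_like_injection(text: str) -> bool:
    """Match the pattern against the raw text and its normalized variants."""
    candidates = [text, collapse_spacing(text)]
    for token in B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text; ignore this token
        candidates.append(decoded)
    return any(INJECTION.search(candidate) for candidate in candidates)


encoded = base64.b64encode(b"ignore previous instructions").decode("ascii")
for sample in ("i g n o r e previous instructions", f"decode this: {encoded}", "hello"):
    print(repr(sample), looks_like_injection(sample))
```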

Extending

Add documents:

python -m app.rag.indexer ./my-docs/
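
The Key Modules table describes the indexer as chunking at 1000 tokens with a 200-token overlap. A sliding window of that shape can be sketched as follows. This is not the code in `app/rag/indexer.py`: `chunk_tokens` is a hypothetical helper, and splitting on whitespace stands in for whatever tokenizer the project actually uses.

```python
from typing import Sequence


def chunk_tokens(tokens: Sequence, size: int = 1000, overlap: int = 200) -> list:
    """Split tokens into windows of `size` that share `overlap` tokens.

    Hypothetical helper; the defaults mirror the documented 1000/200 setting.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Whitespace splitting is a stand-in for a real tokenizer (assumption).
words = ("lorem ipsum " * 1250).split()
for chunk in chunk_tokens(words):
    print(len(chunk))
```

The overlap means the last 200 tokens of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable as a whole from at least one chunk.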

Add a skill:

mkdir -p skills/my-skill/references
# create skills/my-skill/SKILL.md
# (optional) add reference files to skills/my-skill/references/
python -m skills.loader
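
For orientation, the directory layout above can be discovered with a short walk over the skills directory. This sketch is not `skills/loader.py`; `discover_skills` is a hypothetical function that only lists what it finds and does no embedding or indexing.

```python
import tempfile
from pathlib import Path


def discover_skills(skills_dir: Path) -> dict[str, list[str]]:
    """Map each skill directory name to the file names under its references/.

    Hypothetical helper: a directory counts as a skill if it holds SKILL.md.
    """
    found: dict[str, list[str]] = {}
    for skill_md in sorted(skills_dir.rglob("SKILL.md")):
        skill_root = skill_md.parent
        refs_dir = skill_root / "references"
        refs = sorted(p.name for p in refs_dir.iterdir() if p.is_file()) if refs_dir.is_dir() else []
        found[skill_root.name] = refs
    return found


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "my-skill"
    (skill / "references").mkdir(parents=True)
    (skill / "SKILL.md").write_text("# My skill\n", encoding="utf-8")
    (skill / "references" / "notes.md").write_text("details\n", encoding="utf-8")
    print(discover_skills(Path(tmp)))
```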

Add an agent:

python -m app.cli.manage_agents add my-agent --name "My Agent"
# Edit app/agents/my-agent/agent_config.yaml, tool_policy.yaml, etc.

Add a subagent:

python -m app.cli.manage_agents add-subagent my-scout --tools rag:search_rag --namespaces rag --skills deep-research
# Edit app/agents/.subagents/my-scout.agent.md (frontmatter tools/namespaces/skills + persona body)
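
Subagent permissions are described as inherited and never widened. One way to picture that rule is as a set intersection between what the parent holds and what the child definition requests. The sketch below is an assumption-laden illustration, not the orchestrator's code: `effective_child_tools` is hypothetical, and tool ids are assumed to follow the `namespace:tool` shape seen in `rag:search_rag`.

```python
def effective_child_tools(
    parent_tools: set[str],
    requested_tools: set[str],
    requested_namespaces: set[str],
) -> set[str]:
    """Return the tools a child may use: requested AND already held by the parent.

    Hypothetical helper. Iterating over parent_tools only guarantees the
    result is a subset of the parent's rights, whatever the child asks for.
    """
    return {
        tool
        for tool in parent_tools
        if tool in requested_tools or tool.split(":", 1)[0] in requested_namespaces
    }


parent = {"rag:search_rag", "fs:grep", "fs:read_file"}
child = effective_child_tools(
    parent,
    requested_tools={"rag:search_rag", "shell:run_shell"},
    requested_namespaces={"fs"},
)
print(sorted(child))
```

In this example the child asks for `shell:run_shell`, but the parent does not hold it, so it is silently dropped from the result.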

Add an MCP server: add an entry to the agent's `app/agents/<id>/mcp_servers.yaml`, then restart the agent.

Add a guardrail rule: edit the agent's app/agents/<id>/guardrails_policy.yaml — add a backend entry and a rule referencing it. No code changes needed.

Change the default model (environment variable, e.g. in `.env`):

OPENROUTER_MODEL=anthropic/claude-sonnet-4-5

Change model for a specific agent:

python -m app.cli.manage_agents set-model <AGENT_ID> anthropic/claude-sonnet-4-5

Enable reasoning (environment variable, e.g. in `.env`):

OPENROUTER_REASONING_EFFORT=medium

Configure agent persona: edit app/agents/<id>/agent_config.yaml (system_prompt, soul_prompt, fallback_messages).

Project details

This version

0.1.1

Jul 3, 2026

Download files

Source Distribution

ziro-0.1.1.tar.gz (1.8 MB view details)

Uploaded Jul 3, 2026 Source

Built Distribution

ziro-0.1.1-py3-none-any.whl (962.2 kB view details)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file ziro-0.1.1.tar.gz.

File metadata

Download URL: ziro-0.1.1.tar.gz
Upload date: Jul 3, 2026
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ziro-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e0348e73dae7f2f7317a36c85c32a79dcb75c7a44f3881c6fafeedaed7defec3`
MD5	`926c4522aa3cf37ff3aa67d4d9d661a3`
BLAKE2b-256	`6452517d9358da88796fa5f77aa1d7a97e3e196cefaaaf384deb25fbd54566e4`

Provenance

The following attestation bundles were made for ziro-0.1.1.tar.gz:

Publisher: release.yml on hRupanjan/ziro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ziro-0.1.1.tar.gz
- Subject digest: e0348e73dae7f2f7317a36c85c32a79dcb75c7a44f3881c6fafeedaed7defec3
- Sigstore transparency entry: 2064025316
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: hRupanjan/ziro@c5e1a12ca24918f3a1361fad2023c758f844bf20
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/hRupanjan
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5e1a12ca24918f3a1361fad2023c758f844bf20
- Trigger Event: release

File details

Details for the file ziro-0.1.1-py3-none-any.whl.

File metadata

Download URL: ziro-0.1.1-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 962.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ziro-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`01e0e93ff7d89eef15e4055c3fe9fbcf88d621cea8a0481876f684f77b9d1a38`
MD5	`659df91f34b181dc5527eb08408269d3`
BLAKE2b-256	`2909ed0b9485b9f7d8695ab3b14f6497cac7288f8b0c2e6153ca5664f2107076`

Provenance

The following attestation bundles were made for ziro-0.1.1-py3-none-any.whl:

Publisher: release.yml on hRupanjan/ziro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ziro-0.1.1-py3-none-any.whl
- Subject digest: 01e0e93ff7d89eef15e4055c3fe9fbcf88d621cea8a0481876f684f77b9d1a38
- Sigstore transparency entry: 2064025340
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: hRupanjan/ziro@c5e1a12ca24918f3a1361fad2023c758f844bf20
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/hRupanjan
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5e1a12ca24918f3a1361fad2023c758f844bf20
- Trigger Event: release
