Skip to main content

Deep LangGraph agent with memory, RAG, skills, OpenRouter, and Langfuse

Project description

Ziro

LangGraph conversational agent with persistent memory, RAG, progressive tool loading, MCP support, skills library, configurable guardrails, and automatic context compaction. Backed by OpenRouter (any LLM) and Langfuse (observability). Two runtime modes: SQLite/FAISS for local dev, PostgreSQL/pgvector for production.

Features

  • Multi-agent — run as one of several named agents, each with its own persona, model, tools, MCP servers, guardrails, and compaction policy; per-agent on/off flags
  • Long-term memory — user- and agent-scoped facts persisted across sessions
  • Thread resumption — pick up any prior conversation by thread ID
  • Context compaction — older turns are auto-folded into a running summary when a request nears the model's context window; recent turns kept verbatim; YAML-driven, model-aware budgeting
  • RAG — semantic search over indexed documents (.txt, .pdf, directories)
  • Skills library — index SKILL.md files; agent retrieves relevant skills and loads reference files on demand
  • Progressive tool loading — tools are deferred by default; the LLM discovers and activates only what it needs via search_tools / load_tools
  • Subagents — delegate a self-contained subtask to a child subagent in isolated context; subagents are single-file *.agent.md definitions (persona + scoped tools/skills) with inherited, never-widened permissions
  • Tool permissions (F06) — per-agent allow/deny/ask policy over tool/namespace globs; ask triggers a human-in-the-loop approval, with allow-once / allow-thread / allow-always memory; shell:* ships dangerous-default-deny
  • Hooks (F01) — declarative lifecycle interception (session_start, pre/post_turn, pre/post_tool, pre/post_model); Python-dotted or shell callables with glob matchers; powers permission gating, shell audit, memory reflection
  • Human handoff (F07) — optional request_handoff tool that pauses the turn for a human operator and injects their reply back into the conversation
  • Clarifying questions — core ask_user_question tool (mirrors Claude Code's AskUserQuestion): pauses the turn to ask 1-4 structured multiple-choice questions and folds the answers back in; shares the same HITL interrupt path as F06/F07
  • Task / todo list (F08)write_todos / update_todo state tools; the running todo list renders in the TUI side pane
  • Self-improving memory (F03) — per-agent memory_policy.yaml governs how facts are extracted and reflected into long-term store
  • Multimodal input (F02) — attach images via /img <path> or --attach; vision-capable models receive image blocks, text-only models degrade gracefully (attachment_policy.yaml)
  • Filesystem tools — cross-platform core tools read_file, grep, glob_files for local file inspection, plus deferred, ask-gated write_file/edit_file mutating tools (separate fs_write namespace); pure stdlib (identical on Windows + POSIX), path-confined to the project root, on by default (fs_policy.yaml)
  • Web fetch — optional SSRF-guarded web_fetch(url) tool that returns cleaned, length-capped page text
  • Shell execution — optional run_shell tool that runs real commands on the host or in a Docker sandbox; default-deny + human approval (F06/HITL), denylist backstop, timeout + process-tree kill, output caps, env scrubbing, and full audit.
  • Bash parity — a cross-platform bash interpreter (Git Bash/WSL-probed on Windows) makes pipes/quoting/heredocs behave identically on every OS; optional persistent per-thread bash session keeps cd/env vars across calls
  • LLM adapterLLMAdapter abstraction (app/llm/) decouples the graph from OpenRouter specifically, the first step toward multi-provider support; per-agent provider: selection in meta.yaml
  • Local voice — optional push-to-talk speech I/O (F11): faster-whisper STT + Piper TTS, fully on-device, no cloud key required (cloud STT/TTS also pluggable)
  • MCP support (F18) — connect external MCP servers (stdio, SSE, streamable HTTP, WebSocket) with OAuth; a live McpManager (re)ingests tools into the shared registry, persistent sessions cut per-call latency, failures non-fatal
  • Interactive TUI (F12) — Textual front-end (transcript, todo/active-tools side pane, status footer, approval/question modal); paints in ~1.6s and is usable while the engine builds on a background thread (F26); slash-commands, live themes (carbon/nord/gruvbox), a context-usage meter (F16), and an MCP control panel (F17, Ctrl+O); session state is a single reactive UIStore, with SVG snapshot tests + a textual serve browser demo for UI peek
  • Slash commands (F14)/help /agent /thread /model /theme /think /clear /voice /img /mcp /stop /save /quit dispatched in the driver loop before the LLM (no model call, no transcript pollution)
  • Chat queue (F13) — in-flight buffer captures mid-turn input; optional background worker pool (chat_once --submit/--status) survives restart
  • Guardrails — configurable input/output guards (regex injection, ML classifier, PII, Llama Guard content safety); YAML-driven, no code changes to add rules
  • Agent persona — configurable soul_prompt, system_prompt, and fallback_messages via the agent's agent_config.yaml
  • Resilient replies — empty reasoning-only turns are re-prompted once, then served a non-blank fallback rather than a blank message
  • Thinking / reasoning — optional extended reasoning via OPENROUTER_REASONING_EFFORT, switchable live with /think <high|medium|low|off> (F15)
  • Fast startup — embeddings warmup is off the critical path (F24); the TUI shell + graph build are torch-free, torch loads on a background thread (F26)
  • Dual backends — dev (SQLite + FAISS, zero infra) / prod (PostgreSQL + pgvector)
  • Observability — Langfuse and LangSmith tracing, both optional

Quickstart

# 1. Install dependencies — includes the local guardrail backends
#    (Presidio PII + Llama Guard content safety).
uv sync

# 2. (Optional) local voice — faster-whisper STT + Piper TTS (push-to-talk, no cloud key)
uv sync --group voice-local

# Download the local models the installed backends need (spaCy, Llama Guard GGUF,
# and — only if the voice group is installed — the default Piper voice).
# Fetches just what's missing.
python -m app.cli.startup

# 3. Configure environment
cp .env.example .env
# Set OPENROUTER_API_KEY in .env

# 4. (Production only) start PostgreSQL + pgvector
docker compose up -d

# 5. Run
python -m app.main --user alice

Resume a previous session:

python -m app.main --user alice --thread alice_abc12345

Single-shot (JSON output, for programmatic / LLM-driven use):

python -m app.cli.chat_once --user alice --message "Hello"

Environment Variables

Variable Required Purpose
OPENROUTER_API_KEY Yes LLM access via OpenRouter
OPENROUTER_MODEL No Model override (default: google/gemini-2.5-flash-lite)
OPENROUTER_REASONING_EFFORT No Enable extended reasoning: low / medium / high (leave empty to disable)
DATABASE_URL No Enables production PostgreSQL + pgvector backends
LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY No Langfuse observability
LANGFUSE_HOST No Langfuse host (default: https://cloud.langfuse.com)
LANGCHAIN_TRACING_V2 / LANGCHAIN_API_KEY No LangSmith tracing

Commands

uv sync                                                              # install dependencies (incl. local guardrail backends)
uv sync --group voice-local                                          # + faster-whisper STT + Piper TTS (optional local voice)
python -m app.cli.startup                                            # download spaCy + Llama Guard + default Piper voice if absent
python -m app.main --user <USER_ID>                                  # start chat session (interactive agent picker)
python -m app.main --user <USER_ID> --agent <AGENT_ID>              # start with a specific agent
python -m app.main --user <USER_ID> --voice                         # push-to-talk voice I/O (needs voice-local + voice_policy enabled)
python -m app.main --user <USER_ID> --thread <THREAD_ID>            # resume session
python -m app.cli.chat_once --user <USER_ID> --message <MSG>        # single-shot JSON output
python -m app.cli.chat_once --user <USER_ID> --agent <AGENT_ID> --message <MSG>  # single-shot, specific agent
python -m app.cli.manage_agents list                                 # list agents + per-agent on/off state
python -m app.cli.manage_agents add <AGENT_ID> --name NAME          # scaffold a new agent
python -m app.cli.manage_agents remove <AGENT_ID>                   # delete an agent
python -m app.cli.manage_agents enable <AGENT_ID>                   # turn a single agent ON
python -m app.cli.manage_agents disable <AGENT_ID>                  # turn a single agent OFF
python -m app.cli.manage_agents set-default <AGENT_ID>              # set the default agent
python -m app.cli.manage_agents set-model <AGENT_ID> <MODEL>        # set an agent's model (or 'none' to clear)
python -m app.cli.manage_agents add-subagent <AGENT_ID> --tools a,b --namespaces rag --skills x  # scaffold a single-file subagent
python -m app.cli.manage_agents remove-subagent <AGENT_ID>          # delete a subagent definition
python -m app.cli.run_scenarios                                      # replay scenario files in tmp/
python -m app.rag.indexer <path>                                     # index docs (dir, .txt, or .pdf)
python -m skills.loader                                              # index all SKILL.md files
python -m app.tools.indexer                                          # re-index tool descriptions
python -m app.tui.demo                                               # scripted TUI demo (UI peek); textual serve "python -m app.tui.demo" for a browser demo
mypy .                                                               # type check
pytest tests/test_tui_snapshots.py --snapshot-update                 # regenerate TUI SVG snapshot baselines after a UI change
docker compose up -d                                                 # start PostgreSQL + pgvector (prod)
docker compose build pdf-sandbox                                     # build F10 sandbox image (ziro-pdf:latest) for researcher_docker

Visualize the graph

python -m app.cli.show_graph                          # ASCII to stdout (requires grandalf: uv sync --group dev)
python -m app.cli.show_graph --format mermaid         # Mermaid markdown to stdout
python -m app.cli.show_graph --format mermaid -o g.md # Mermaid to file
python -m app.cli.show_graph --format png -o g.png    # PNG (requires graphviz system package)

Slash commands (in-session)

Typed into the running session (TUI or REPL), dispatched before the LLM — no model call, no transcript pollution (F14):

Command Purpose
/help (/h) List commands
/agent [id] Switch agent (rebuilds the session, fresh thread)
/thread [id] Switch / resume a thread
/model [id] Switch the model (rebuilds the LLM)
/theme [name] Switch the UI theme live + persist (carbon / nord / gruvbox)
/think <high|medium|low|off> Set reasoning effort live (F15)
/clear Start a fresh thread (same agent)
/voice [on|off] Toggle push-to-talk voice I/O (needs --voice + voice_policy)
/img <path> [text] Attach an image to this turn (F02)
/mcp [server] Show MCP servers (TUI: open the control panel, Ctrl+O)
/stop (/halt) Abort the turn currently in flight (Ctrl+S in the TUI)
/save [path] Save the transcript to a JSON file
/quit (/exit, /q) Exit the session

Architecture

User input → [input_guard] → [compaction] → agent node → LLM with tools → DynamicToolNode → back to agent → ... → [output_guard] → final response

input_guard, output_guard (toggled by the agent's guardrails_policy.yaml), and compaction (toggled by compaction_policy.yaml) are conditional nodes. A blocked request returns a refusal message without reaching the LLM. The agent injects long-term user memories, the configured persona, and the running compaction summary into the system prompt before each LLM call. An empty reasoning-only reply is re-prompted once (reprompt), then replaced by a non-blank fallback (reply_fallback).

Multi-Agent System

The agent can run as one of several named agents, each with its own persona, tool policy, MCP servers, and guardrails.

  • Per-agent folders — each agent lives in app/agents/<agent_id>/ and is self-contained, with its own agent_config.yaml, tool_policy.yaml, mcp_servers.yaml, guardrails_policy.yaml, compaction_policy.yaml, and meta.yaml. The shipped agents are default (life-coach persona), researcher (factual research aide with lighter guardrails; run_shell enabled in host mode), and researcher_docker (same persona with run_shell in Docker-sandbox mode).
  • Per-agent modelmeta.yaml's model field sets the OpenRouter model for that agent. Omit it (or model: null) to use the OPENROUTER_MODEL env default. A provider: field selects the LLM adapter (default openrouter — see LLM Adapter).
  • Per-agent on/offmeta.yaml's enabled flag controls whether an agent is selectable. Toggle with manage_agents enable|disable <id>.
  • Manifestapp/agents/registry.yaml records the default agent. If the default is disabled, the first enabled agent is used.
  • Memory isolation — long-term memory is keyed per (user_id, agent_id), so each agent keeps its own facts per user.

Progressive Tool Loading

Tools are split into three tiers:

Tier Description
Core tools Always bound to the LLM every turn (defined in tool_policy.yamlcore_tools)
Meta-tools Always bound; provide the discovery surface (search_tools, load_tools, list_tools, unload_tools)
Deferred tools Registered in the ToolRegistry but NOT bound until the LLM calls load_tools([...])

Discovery loop: search_tools("<keyword>") → pick names → load_tools([...]) → call the tool. Activated tools persist for the rest of the thread (up to max_active, LRU eviction after that).

MCP Integration

External MCP servers are declared in the agent's mcp_servers.yaml. On startup, tool metadata is fetched from all enabled servers and registered into the ToolRegistry under their server name as namespace. MCP tools enter the same deferred-discovery surface as local tools — they are NOT bound to the LLM until activated via load_tools.

Supported transports: stdio, sse, streamable_http, websocket, plus OAuth (F18). A live McpManager (app/tools/mcp_manager.py) owns per-server state and a ServerStatus snapshot, holds a persistent session per server (one owner task keeps it open so tool calls reuse it — large latency win), and re-ingests tools into the shared registry on every connect so discovery stays current. connect / disconnect / reconnect are driven live from the F17 panel or /mcp. OAuthConfig picks the flow (auto / authorization_code / device_code; device-code when headless); tokens are cached and refreshed per request. Server failures are non-fatal; the rest of the registry is unaffected.

Subagents

The parent can delegate a subtask to a child subagent via spawn_subagent (or dispatch_subagents for parallel fan-out). Each child is a fresh graph on a namespaced thread with its own persona, empty history, and a restricted toolset; only a compact result is folded back into the parent.

  • Single-file definitions — subagents live in app/agents/.subagents/<id>.agent.md: YAML frontmatter (name, description, enabled, model, allowed tools:/namespaces:, scoped skills:, optional soul) plus a Markdown body that becomes the system prompt. The .subagents/ dir is spawn-only — subagents never appear in the interactive picker. Shipped: scout (research) and reflector (reflection coach).
  • Permission inheritance — a child's tool rights are the intersection of its declared surface and the parent's current rights; never wider. Namespace tokens (rag, rag:*) expand to the concrete tools available in the (possibly restricted) registry view.
  • Skill scoping — a subagent's skills: allowlist restricts its search_skills / load_skill_ref to only those skills.
  • Runaway guards — per-parent subagent_policy.yaml sets enabled, allowed_children/denylist, max_depth (default 1), max_spawns_per_thread, and parallelism limits; each child runs under a recursion_limit.

Scaffold one with manage_agents add-subagent <id> --tools … --namespaces … --skills ….

Web Fetch

An optional web_fetch(url) deferred tool (namespace webfetch) fetches an http/https page and returns cleaned, length-capped text. Gated by the agent's webfetch_policy.yaml — when enabled: false the tool is never registered (invisible to discovery). An SSRF guard resolves the host and refuses loopback / private / link-local / reserved targets, re-checking the final URL after redirects.

Filesystem Tools

Three cross-platform core tools (namespace fs) give the agent Claude-Code-style local file inspection: read_file, grep, and glob_files. They are pure Python stdlib (pathlib / re / os / fnmatch) — no new dependency, no shelling out — so behavior is identical on Windows and POSIX.

  • read_file(path, offset?, limit?) — read a UTF-8 text file as numbered lines; byte-capped (max_read_bytes) with an offset/limit line window (max_read_lines); binary files are detected and skipped.
  • grep(pattern, path, glob?, output_mode?, case_insensitive?) — regex search over file contents under a path; glob filters filenames; output_mode is content / files / count; prunes ignored dirs (.git, node_modules, …) and skips binaries; capped at max_grep_matches.
  • glob_files(pattern, path?) — find files by glob (incl. **), returned repo-relative and mtime-sorted (newest first), capped at max_glob_results.

Every path is resolved against the project root and refused if it escapes (FsConfig.confine, default on). On by default via the agent's fs_policy.yaml (missing file → defaults); added to each shipped agent's core_tools and inherited by subagents through the rights intersection (like web_fetch). No F06 gate — only shell:* is dangerous-default-deny.

Mutating tools (write_file / edit_file) — deferred counterparts registered under a separate fs_write namespace so F06 can ask on writes while reads stay allow; byte-capped at max_write_bytes (512KB). Gated by FsConfig.allow_write (default on) — set false for a read-only agent.

Shell Execution

An optional run_shell(command, cwd?) deferred tool (namespace shell) runs real shell commands — on the host by default, or inside a Docker sandbox via a config flip. The trust model is Claude-Code-like: a human approves what runs.

  • Default off & default-deny — gated by the agent's shell_policy.yaml (enabled: false → never registered). Even when enabled, shell:* is dangerous-default-deny in the permission system (F06); an agent must explicitly allow-list shell:run_shell in its permissions.yaml. Recommended recipe = allow + ask (human approval on every call via HITL). chat_once auto-denies an ask interrupt (fail-closed), so single-shot runs never hang.
  • ShellGate backstop — a non-removable hardened denylist (rm -rf, sudo, dd, mkfs, fork bombs, …) plus optional per-agent regex allow/deny and opt-in confine cwd containment.
  • Two executorshost (create_subprocess_shell, real shell with pipelines) and sandbox (create_subprocess_exec with a pluggable sandbox_launcher template — Docker/WSL/nsjail). Both behind the same gate, audit, and tool body.
  • Bash parityShellConfig.interpreter defaults to "bash": the command runs as one argv element via create_subprocess_exec (no shell-launcher involved), so bash tokenizes pipes/quoting/heredocs identically on every OS. On Windows this probes Git Bash then falls back to wsl bash — the previous approach (pinning shell_executable through create_subprocess_shell) was unreliable there, since Windows always runs {executable or COMSPEC} /c {command} regardless of the executable. interpreter: "host" restores the old cmd.exe//bin/sh behavior. An optional persistent per-thread bash -l session (session: "persistent") keeps cd/env vars/venv activation across calls.
  • Always-on safetytimeout_s with process-tree kill, output capped twice (at the source and at ingestion), and env scrubbing so secrets (OPENROUTER_API_KEY, DATABASE_URL, LANGFUSE_*) never reach the child.
  • Loop guardspre_tool hooks (app/hooks/guards.py) flag repeated near-identical run_shell commands (shell_loop_guard) and repeated identical failure signatures across different commands (verification_loop_guard — pytest/mypy/tsc/panic tracebacks recurring while the fix isn't landing).
  • Auditpre_tool / post_tool hooks log every invocation (intent + outcome, including blocked/timeout) to data/audit/shell-audit.jsonl (dev) or a tool_audit table (prod). See docs/audit-log.md.

Two agents ship the opt-in: researcher (host mode; its persona bakes a map-reduce flow for summarizing large PDFs without overflowing context) and researcher_docker (mode: sandbox, launcher docker run --rm --network none -v {workdir}:/work -w /work {image}, image ziro-pdf:latest — build with docker compose build pdf-sandbox).

Local Voice

Optional push-to-talk speech I/O (F11): the turn is bracketed by STT (your speech → the user message) and TTS (the post-output-guard reply → audio). Default backends are fully local — faster-whisper for STT, Piper for TTS — so no cloud API key is needed (cloud STT/TTS over httpx are also pluggable per agent). Gated by the agent's voice_policy.yaml (enabled: false by default).

Setup is two commands:

uv sync --group voice-local   # faster-whisper STT + Piper TTS (+ sounddevice/PortAudio)
python -m app.cli.startup     # downloads the default Piper voice into data/voices/ (gitignored)

Then enable it in the agent's app/agents/<id>/voice_policy.yaml:

enabled: true
tts_backend: local
tts_model: data/voices/en_US-amy-medium.onnx   # path to the downloaded .onnx (config .onnx.json resolved alongside)

Run with python -m app.main --user <id> --voice, or toggle /voice on in the TUI (Ctrl+R to start/stop recording). Piper voices come from the Rhasspy catalog (<lang>-<name>-<quality>); download others with python -m piper.download_voices <voice-id> --download-dir data/voices and point tts_model at the new .onnx. The researcher agent ships voice-enabled.

Tool Permissions (F06)

Every tool call passes a per-agent permission gate (app/permissions/) wired as a pre_tool hook. The agent's permissions.yaml declares allow / deny / ask globs over tool/namespace qualified names plus a default_action; dangerous_default_deny (ships ["shell:*"]) forces a default-deny for dangerous namespaces. An ask decision raises a human-in-the-loop interrupt (the TUI ApprovalModal, REPL prompt, or auto-deny in chat_once); approvals can be remembered once, for the thread, or always (remember_default_scope, persisted per user/agent). Tool args are redacted (secrets masked, long values truncated) before display.

Hooks (F01)

app/hooks/ is a declarative lifecycle-interception layer. An agent's hooks.yaml lists HookSpec entries — each binds a HookEvent (session_start, pre_turn, post_turn, pre_tool, post_tool, pre_model, post_model), an fnmatch matcher (over tool qualified names for tool events), and a callable (Python dotted path or shell command). Hooks return a HookDecision (allow / deny / modify / interrupt); the runner short-circuits on deny/interrupt and folds modify. Hooks are ordered once at startup, so a no-hook registry has zero per-turn cost. This path powers the F06 permission gate and the F10 shell audit (*run_shell pre/post_tool).

Human Handoff (F07)

An optional request_handoff(reason) core tool (app/handoff/) lets the agent escalate to a human. It pauses the turn, surfaces the reason to an operator, accepts a human-authored reply, and resumes with that reply injected as the next message (assistant or tool, per config). The agent's handoff_policy.yaml (HandoffConfig) sets enabled, the operator prompt, a timeout with an on_timeout fallback, and inject_as.

Clarifying Questions

A core ask_user_question(questions) tool (app/clarify/) mirrors Claude Code's AskUserQuestion: pauses mid-turn to ask 1-4 structured clarifying questions (2-4 options each, optional multi-select, implicit free-text "Other"), then folds the human's answers back into the tool result. It shares the same HITL interrupt module as F06/F07 (app/graph/interrupts.pyInterruptKind.ASK_USER_QUESTION) rather than a bespoke pause mechanism; the TUI renders a QuestionModal, the REPL prints numbered options. Gated by clarify_policy.yaml (ClarifyConfigenabled, max_questions, max_options; missing file → on by default).

Task / Todo List (F08)

app/tasks/ adds state-based todo tracking via two core tools: write_todos(todos) (full rewrite) and update_todo(id, status|content|result). Todos live in AgentState.todos (merged by a per-id reducer), survive checkpointing, and render live in the TUI side pane. A todo can carry an agent_id to mark subagent delegation.

Multimodal Input (F02)

Images attach via the /img <path> directive or the CLI --attach flag (app/io/attachments.py). build_human_message checks supports_multimodal(model): vision-capable models receive base64 data-URI image blocks in a multimodal HumanMessage; text-only models get a graceful notice instead. Bounded by the agent's attachment_policy.yaml (enabled, max_image_bytes, allowed_image_types).

Interactive TUI & Themes (F12 / F16 / F17)

app/tui/ is a Textual front-end over the shared TurnRunner: a transcript, a todo/active-tools side pane, a status footer, and an ApprovalModal rendering the shared HITL interrupt. It paints in ~1.6s and is usable immediately while the engine builds on a background thread — any turn typed early is queued and runs in order (F26). Extras:

  • Themesapp/tui/theme.py maps semantic color slots (not literal hex) to Textual + Rich widgets. Carbon (amber + steel on carbon black) is the default; nord and gruvbox also ship. Palettes are discovered from three roots (bundled app/tui/themes/, project ./themes/, home ~/.ziro/themes/); /theme [name] swaps live and persists to data/ui_prefs.json.
  • Context meter (F16) — a ContextMeter widget reveals at ≥50% of the usable input budget and shades muted → amber → red as the request approaches the compaction trigger, with a trip-line marker at the trigger line.
  • MCP control panel (F17)Ctrl+O (or /mcp in the TUI) opens a modal McpPanel: a live table of servers (state, transport, auth, tool count) with Connect / Disconnect / Reconnect, tool peek, and an OAuth device-code prompt.
  • Reactive state + UI peek — session state (identity/lifecycle/content/chrome) lives in one reactive UIStore (app/tui/store.py) mutated only through typed messages (app/tui/messages.py); widgets bind via watch() instead of imperative update_*() calls. python -m app.tui.demo (or textual serve "python -m app.tui.demo" for a browser demo) drives a scripted fake runner for visual review; SVG snapshot regression tests live in tests/test_tui_snapshots.py (pytest --snapshot-update to refresh baselines). See docs/tui-peek.md.

LLM Adapter

app/llm/ wraps LangChain's Runnable behind an LLMAdapter ABC (invoke/ainvoke/stream/astream, bind_tools(), context_window()/supports()), decoupling the graph from OpenRouter specifically. app/llm/factory.py resolves a provider (meta.yaml's provider: field, default openrouter) to an adapter — only OpenrouterAdapter ships today, but unknown providers fail fast rather than silently falling back, and get_llm()/get_agent_llm()/get_summary_llm() in app/core/config.py are now thin wrappers over build_adapter() rather than constructing ChatOpenAI directly. This is step one toward multi-provider support.

Context Compaction

Each agent has a compaction_policy.yaml. When a turn's request exceeds trigger_pct of the model's usable input budget (window minus reserved output and schema headroom), the compaction node folds older messages into a running summary (injected into the system prompt) and drops them from live history via RemoveMessage — shrinking both the in-flight request and the persisted checkpoint. The keep_recent_min most-recent messages are always kept verbatim, and the split is chosen to never orphan a ToolMessage from its parent call.

Budget math is model-aware: app/llm/openrouter_catalog.py (formerly app/core/model_specs.py) reads the OpenRouter /models catalog once per process for each model's context_length, max_completion_tokens, and supported_parameters, so the trigger, the reserved-output carve-out, and the max_tokens sent to the LLM stay consistent. Strategy is hybrid (summarize the dropped span via a clean tool-free LLM) or trim (drop only). Setting enabled: false restores un-compacted behavior exactly. A separate max_tool_message_tokens / max_tool_message_pct cap bounds a single oversized ToolMessage at ingestion. See docs/chat-compression.md.

Tools available to the LLM

Tool Tier Purpose
search_rag(query) Core Semantic search over indexed documents
search_skills(query) Core Semantic search over SKILL.md files
load_skill_ref(skill_name, filename) Core Load a reference/script file from a skill directory on demand
save_memory(content) Core Persist a user fact to long-term store
search_tools(query) Meta Find available deferred tools by keyword
load_tools(names) Meta Activate deferred tools for this thread
list_tools(namespace) Meta Browse all tools by namespace
unload_tools(names) Meta Deactivate tools to free context
write_todos(todos) Core Rewrite the turn's todo list (F08); renders in the TUI side pane
update_todo(id, …) Core Update one todo's status/content/result (F08)
request_handoff(reason) Core Pause the turn for a human operator and inject their reply (F07); gated by handoff_policy.yaml
ask_user_question(questions) Core Pause the turn to ask 1-4 structured clarifying questions (2-4 options, optional multi-select); gated by clarify_policy.yaml
spawn_subagent(agent_id, task) Deferred Delegate a self-contained subtask to a child subagent in isolated context; returns one concise result
dispatch_subagents(tasks) Deferred Parallel fan-out (gated by enable_parallel); one result block per child
get_subagent_transcript(run_id) Deferred Pull a past child run's full transcript on demand (capped)
read_file(path, offset?, limit?) Core Read a project text file as numbered lines; byte/line capped, binary-safe, path-confined (fs)
grep(pattern, path, glob?, output_mode?, case_insensitive?) Core Regex search over file contents (content/files/count); skips ignored dirs + binaries (fs)
glob_files(pattern, path?) Core Find files by glob, mtime-sorted, repo-relative, path-confined (fs)
write_file(path, content) / edit_file(path, …) Deferred Mutating file writes, byte-capped, path-confined; separate fs_write namespace so writes can ask while reads stay allow; gated by fs_policy.yaml allow_write
web_fetch(url) Deferred Fetch an http/https page, return cleaned text; SSRF-guarded; gated by webfetch_policy.yaml
run_shell(command, cwd?) Deferred Run a shell command on the host or a sandbox container; default-deny + HITL, denylist/timeout/output-cap/env-scrub; gated by shell_policy.yaml
deferred tools Deferred Any tool registered in the registry (local or MCP)

Runtime Modes

Mode Trigger Store Checkpointer Vector
Dev No DATABASE_URL SqliteStore (./data/memories.db) AsyncSqliteSaver (./data/checkpoints.db) FAISS
Prod DATABASE_URL set PostgresStore AsyncPostgresSaver pgvector

Key Modules

Module Role
app/graph/graph.py Builds LangGraph state machine; wires nodes + routing (including guardrail nodes)
app/graph/nodes.py make_agent_node(), make_dynamic_tool_node(), make_memory_tools()
app/graph/state.py AgentState TypedDict — messages, guardrail_*, active_tools (+ReplaceActiveTools for LRU), running_summary, last_compaction_index; extract_text() for reasoning-block content
app/core/config.py Reads .env; LLM factories get_llm() / get_agent_llm() / get_summary_llm() — thin wrappers over app.llm.factory.build_adapter() (model-aware max_tokens), embeddings, Langfuse handler; load_* for agent/tool/mcp/compaction configs
app/core/agent_profiles.py AgentProfile — resolves per-agent config files + model; select_profile(), list_agent_profiles(), all_agent_profiles(), get_agent_profile(); subagent resolution (SUBAGENTS_DIR, *_subagent_profile(s))
app/core/agent_md.py Parses single-file *.agent.md subagent definitions (frontmatter + Markdown body)
app/llm/openrouter_catalog.py OpenRouter /models catalog cache (renamed from app/core/model_specs.py): context_length, max_completion_tokens, supported_parameters, provider; supports_parameter() gating, list_model_ids(provider=)/list_providers()
app/llm/adapter.py LLMAdapter ABC wrapping a LangChain Runnableinvoke/ainvoke/stream/astream, bind_tools(), build() classmethod
app/llm/factory.py build_adapter(provider, ...) — provider registry (only openrouter shipped; unknown provider errors)
app/core/paths.py PROJECT_ROOT + resolved dirs (AGENTS_DIR, DATA_DIR, SKILLS_DIR) anchored to the repo root
app/compaction/ node.py (make_compaction_node, pick_split), window.py (budget/trigger math), summarizer.py, tokenizer.py, models.py (CompactionConfig / CompactionResult)
app/memory/store.py save_memory / load_memories — user-scoped long-term facts
app/memory/checkpointer.py Thread-level checkpoints (enables --thread resumption)
app/rag/retriever.py search_rag, search_skills, load_skill_ref tools; lazy FAISS/PGVector loading with module-level cache
app/rag/indexer.py Chunking (1000 tokens, 200 overlap), embedding (HuggingFace all-MiniLM-L6-v2), indexing
app/tools/registry.py ToolRegistry — unified source of truth for all tools; semantic + keyword search; ToolDescriptor; expand() (namespace tokens → tools); view() restricted facade
app/tools/meta_tools.py search_tools, load_tools, list_tools, unload_tools — the progressive-loading discovery surface
app/tools/bootstrap.py build_local_registry() / ingest_mcp_tools() — startup wiring of local + MCP tools
app/tools/mcp_client.py Connects to MCP servers from agent mcp_servers.yaml; fetches + allowlist-filters tools
app/tools/indexer.py CLI to re-index tool descriptions into the tools vector-store collection
app/guardrails/models.py Pydantic models: PolicyConfig, PolicyRule, GuardrailDecision, backend configs
app/guardrails/backends.py RegexInjectionRunnable, LocalClassifierRunnable, PresidioRunnable, LlamaGuardRunnable, LLMGuardrailRunnable; make_backend() factory
app/guardrails/evaluator.py GuardrailEvaluator — groups rules by backend, runs backends concurrently
app/guardrails/nodes.py make_input_guard_node() / make_output_guard_node() LangGraph node factories
app/guardrails/policy_loader.py load_policies() — reads an agent's guardrails_policy.yaml
app/cli/chat_once.py Single-shot invocation; outputs one JSON line (for programmatic / LLM-driven use)
app/cli/run_scenarios.py Replay scenario JSON files through the agent; writes side-by-side transcripts
app/cli/manage_agents.py Scaffold, list, enable/disable, and configure agents
app/cli/show_graph.py Visualize the LangGraph state machine (ASCII, Mermaid, PNG)
app/subagents/ models.py (SpawnPolicy etc.), orchestrator.py (SpawnContext, graph cache, rights intersection, skill-scope threading), tool.py (make_subagent_tools)
app/permissions/ F06: models.py (PermissionPolicy/PermissionDecision/PermissionRequest), gate.py/policy.py (evaluate + arg redaction), hook.py (pre_tool wiring), store.py (durable grants)
app/hooks/ F01: models.py (HookEvent/HookSpec/HookContext/HookDecision), registry.py, runner.py, callables.py (Python/shell callables), guards.py (shell_loop_guard/verification_loop_guard — repeated-command / repeated-failure loop detection)
app/graph/interrupts.py Shared HITL interrupt module for F06/F07/clarify: InterruptKind, raise_interrupt(), InterruptRequest/InterruptResponse, render_interactive()
app/handoff/ F07: models.py (HandoffConfig), tools.py (request_handoff); gated by handoff_policy.yaml
app/clarify/ ask_user_question core tool: models.py (ClarifyConfig), tools.py (make_clarify_tools); gated by clarify_policy.yaml
app/tasks/ F08: models.py (Todo/TodoStatus), tools.py (write_todos/update_todo), reducer.py, render.py
app/io/attachments.py F02: AttachmentConfig, parse_attachments, to_image_block, build_human_message (multimodal vs text-only)
app/tui/theme.py + themes/ Theme palettes (Palette, carbon/nord/gruvbox, discover_palettes, live set_active); app/core/ui_prefs.py persists the choice
app/tui/mcp_panel.py F17: McpPanel modal + OAuthPromptModal (device-code prompt)
app/tui/store.py UIStore — single non-visual reactive state holder (identity/lifecycle/content/chrome) mounted on the App
app/tui/messages.py SessionSwitched/BusyChanged/StartupTicked/StateChanged/UsageChanged/ThemeChanged — the only vocabulary that mutates UIStore
app/tui/demo.py ScriptedRunner (torch-free fake TurnRunner) + build_demo_app() — UI peek via python -m app.tui.demo / textual serve
app/tools/mcp_manager.py + mcp_models.py F18: McpManager (connect/disconnect/reconnect, persistent sessions, re-ingest), ServerStatus/ToolSpec/OAuthConfig
app/webfetch/ models.py (WebFetchConfig), tool.py (web_fetch, SSRF guard, HTML→text); gated by webfetch_policy.yaml
app/fs/ models.py (FsConfig — read caps + allow_write/max_write_bytes), tools.py (make_fs_toolsread_file/grep/glob_files; make_fs_write_toolswrite_file/edit_file under the separate fs_write namespace; _resolve path confinement); pure stdlib, cross-platform; gated by fs_policy.yaml (on by default)
app/tools/shell.py + shell_models.py F10/F27 shell: ShellGate, host + sandbox executors, make_shell_tool, output caps, process-tree kill; _resolve_bash() (Git Bash/WSL probing) + PersistentBashSession; ShellConfig/GateDecision
app/tools/shell_audit.py F10 audit hooks (log_intent/log_outcome) → data/audit/shell-audit.jsonl / tool_audit table
docker/pdf-sandbox/Dockerfile F10 sandbox image (python:3.12-slim + pypdf + reportlab); built via the pdf-sandbox compose service
skills/loader.py Walks ./skills/ for SKILL.md files; indexes them as the "skills" collection
app/agents/<id>/agent_config.yaml Per-agent system_prompt, soul_prompt, fallback_messages
app/agents/<id>/tool_policy.yaml Per-agent core_tools, search_k, max_active, denylist
app/agents/<id>/mcp_servers.yaml Per-agent MCP server declarations (transport, command/url, allowlist, enabled flag)
app/agents/<id>/guardrails_policy.yaml Per-agent named backends pool + input/output rules
app/agents/<id>/compaction_policy.yaml Per-agent compaction policy (enabled, trigger/target band, retention, summarizer)
app/agents/<id>/subagent_policy.yaml Per-parent spawn policy (enabled, allowed_children/denylist, max_depth, spawn/parallel caps)
app/agents/<id>/webfetch_policy.yaml Per-agent web-fetch policy (enabled master flag, SSRF/scheme/host controls, content cap)
app/agents/<id>/fs_policy.yaml Per-agent filesystem-tools policy (enabled, confine, read byte/line caps, grep/glob result caps, ignore_dirs)
app/agents/<id>/shell_policy.yaml Per-agent shell policy (enabled, mode host/sandbox, denylist/allowlist, timeout, output cap, env passthrough, sandbox launcher/image)
app/agents/<id>/permissions.yaml Per-agent F06 tool-permission policy (allow/deny/ask globs; shell:* default-deny; remember scope)
app/agents/<id>/hooks.yaml Per-agent F01 lifecycle hooks (events, glob matcher, Python/shell callable; e.g. *run_shell audit pre/post_tool)
app/agents/<id>/handoff_policy.yaml Per-agent F07 human-handoff policy (enabled, operator prompt, timeout/fallback, inject_as)
app/agents/<id>/attachment_policy.yaml Per-agent F02 multimodal policy (enabled, max_image_bytes, allowed_image_types)
app/agents/<id>/memory_policy.yaml Per-agent F03 self-improving-memory policy (fact extraction / reflection)
app/agents/<id>/voice_policy.yaml Per-agent F11 voice policy (enabled, STT/TTS backend + model)
app/agents/<id>/queue_policy.yaml Per-agent F13 queue policy (in-flight always on; background worker default off)
app/agents/<id>/meta.yaml Agent name, description, enabled, model
app/agents/.subagents/<id>.agent.md Single-file subagent definition (frontmatter persona/tools/skills + Markdown system prompt)
app/agents/registry.yaml Records the default agent id

Guardrails

Policies are declared in the agent's guardrails_policy.yaml. Each rule references a named backend from the backends: pool. Multiple backends run concurrently; first block in declaration order wins. Rules support refusal_templates lists for randomized refusals.

Backend type Description
regex_injection Deterministic OWASP prompt-injection patterns + evasion detection (typoglycemia, Base64/hex, char-spacing); zero dependencies
local_classifier HuggingFace text-classification (prompt injection detection)
presidio Microsoft Presidio PII detection (bundled in core deps; spaCy model via python -m app.cli.startup)
llama_guard Llama Guard 3 1B GGUF content safety; supports block_categories to scope which S-codes block (bundled in core deps; GGUF via python -m app.cli.startup)
openai Any OpenAI-compatible API for LLM-based policy evaluation

Extending

Add documents:

python -m app.rag.indexer ./my-docs/

Add a skill:

mkdir -p skills/my-skill/references
# create skills/my-skill/SKILL.md
# (optional) add reference files to skills/my-skill/references/
python -m skills.loader

Add an agent:

python -m app.cli.manage_agents add my-agent --name "My Agent"
# Edit app/agents/my-agent/agent_config.yaml, tool_policy.yaml, etc.

Add a subagent:

python -m app.cli.manage_agents add-subagent my-scout --tools rag:search_rag --namespaces rag --skills deep-research
# Edit app/agents/.subagents/my-scout.agent.md (frontmatter tools/namespaces/skills + persona body)

Add an MCP server: add an entry to the agent's app/agents/<id>/mcp_servers.yaml, restart the agent.

Add a guardrail rule: edit the agent's app/agents/<id>/guardrails_policy.yaml — add a backend entry and a rule referencing it. No code changes needed.

Change default model:

OPENROUTER_MODEL=anthropic/claude-sonnet-4-5

Change model for a specific agent:

python -m app.cli.manage_agents set-model <AGENT_ID> anthropic/claude-sonnet-4-5

Enable reasoning:

OPENROUTER_REASONING_EFFORT=medium

Configure agent persona: edit app/agents/<id>/agent_config.yaml (system_prompt, soul_prompt, fallback_messages).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ziro-0.1.1.tar.gz (1.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ziro-0.1.1-py3-none-any.whl (962.2 kB view details)

Uploaded Python 3

File details

Details for the file ziro-0.1.1.tar.gz.

File metadata

  • Download URL: ziro-0.1.1.tar.gz
  • Upload date:
  • Size: 1.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ziro-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e0348e73dae7f2f7317a36c85c32a79dcb75c7a44f3881c6fafeedaed7defec3
MD5 926c4522aa3cf37ff3aa67d4d9d661a3
BLAKE2b-256 6452517d9358da88796fa5f77aa1d7a97e3e196cefaaaf384deb25fbd54566e4

See more details on using hashes here.

Provenance

The following attestation bundles were made for ziro-0.1.1.tar.gz:

Publisher: release.yml on hRupanjan/ziro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ziro-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: ziro-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 962.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ziro-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 01e0e93ff7d89eef15e4055c3fe9fbcf88d621cea8a0481876f684f77b9d1a38
MD5 659df91f34b181dc5527eb08408269d3
BLAKE2b-256 2909ed0b9485b9f7d8695ab3b14f6497cac7288f8b0c2e6153ca5664f2107076

See more details on using hashes here.

Provenance

The following attestation bundles were made for ziro-0.1.1-py3-none-any.whl:

Publisher: release.yml on hRupanjan/ziro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page