A local-first cognitive agent with a learned router (Caudate) and Claude-SDK-shaped tool palette.

These details have not been verified by PyPI

Project links

Project description

Caudate

A local-first cognitive agent with Claude-SDK feature parity, built on Ollama via LiteLLM. Caudate runs entirely on your hardware by default — no API keys, no network calls — but switches to Anthropic, OpenAI, or any LiteLLM-supported provider with a one-line config change.

It's not a thin chat wrapper. The architecture explicitly separates:

Memory — episodic, semantic, procedural, working
Planning — DAG-based goal decomposition with replanning
Reflection — meta-learning from past goal outcomes
Personality — identity, mood, inner voice
Dual-process routing — fast/slow models picked per call (System 1 / System 2)

…with a Claude-Code-style agentic loop on top: real-time tool calls, streaming, sessions, hooks, MCP, subagents, permissions, and a fully-featured CLI + HTTP API.

Status: feature-complete against its original five-phase roadmap plus Claude SDK extras and Claude Code UX parity. See NEXT_ACTIONS.md for what's done and what's deferred.

Quickstart

Install from PyPI

pipx install caudate-cli

One install gets you everything: dual-brain routing, the Caudate NN router, voice (Moonshine STT + Kokoro/Piper TTS), image generation (diffusers + FLUX/SDXL), PDF extraction, native Anthropic SDK, and the full tool palette. Heavy — pulls torch, transformers, diffusers — but you don't have to think about extras.

Vision works out of the box: DescribeImage routes through whichever vision-capable model you pick (qwen3-vl, glm-5v, claude-haiku-4-5, GPT-4V, …) so the dependency is your LLM choice, not a separate install.

On first launch, caudate runs a one-time setup wizard that picks your fast/slow models, downloads Caudate's weights from HuggingFace, and writes ~/.caudate/settings.json. After that:

caudate               # banner + REPL
caudate doctor        # diagnose what's wired (Ollama, Caudate, API keys)
caudate init --force  # re-run the wizard if you change your mind

Requirements:

Python ≥ 3.10
Ollama running locally if you want the local-only or hybrid preset (skip if you go hosted-only)
An ANTHROPIC_API_KEY in your shell only if you pick a preset that uses an anthropic/... model

Install from source (for development)

git clone https://github.com/raveuk/caudate-cli.git
cd caudate-cli
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
caudate init

Talk to it

caudate                                       # default — drops into REPL
caudate interactive --model fast              # preset model
caudate interactive \
    --system1 ollama/qwen2.5-coder:1.5b \
    --system2 ollama/gemma3:27b              # explicit dual-brain

A REPL opens. Type to chat. Type /help for slash commands.

4. Or hit it over HTTP

caudate serve --port 8000
# in another terminal:
curl -X POST http://127.0.0.1:8000/chat \
    -H 'content-type: application/json' \
    -d '{"message":"what is in this directory?"}'

The HTTP server also hosts a Web UI at http://127.0.0.1:8000/ui.

Use Caudate as the backend for Open WebUI

Open WebUI is a polished chat UI that talks to any OpenAI-compatible endpoint. Caudate exposes both Anthropic-shape (/v1/messages) and OpenAI-shape (/v1/chat/completions) endpoints, so it slots in cleanly:

# 1. Start Caudate's API server
caudate serve --port 8000

# 2. In Open WebUI's settings → Connections → OpenAI API
#    Base URL: http://localhost:8000/v1
#    API Key:  any non-empty string (Caudate ignores it)

# 3. Pick "claude-haiku-4-5", "claude-opus-4-7[1m]", or whichever
#    model id is wired in your ~/.caudate/settings.json

Anything you type in Open WebUI now goes through Caudate's dual-brain routing, Caudate the NN router, and the full tool palette. Vision works the same way — drop an image into Open WebUI's chat and Caudate routes it to whichever vision-capable model you've configured.

For voice (caudate talk), image generation (caudate draw), and the Forge autonomous-coding harness, run those commands directly — Open WebUI doesn't surface them.

What you get out of the box

CLI

Command	What it does
`caudate interactive`	REPL with streaming, slash commands, history, multi-line input
`caudate run <goal>`	Single-shot DAG planner — decomposes a goal, runs it, reflects
`caudate talk`	Voice mode (Moonshine STT + Kokoro TTS, Whisper/Piper fallback)
`caudate draw "<prompt>"`	Generate an image (diffusers / FLUX.1-schnell or SDXL-Turbo)
`caudate caudate {train,eval,status,export}`	Train/inspect Caudate, the learned router/advisor NN
`caudate serve [--port 8000]`	FastAPI HTTP server with SSE streaming
`caudate sessions {list,delete,rename,export}`	Manage saved conversations
`caudate personality {show,set,reset}`	Inspect or tune identity / mood
`caudate models`	List detected Ollama models with capability flags
`caudate router`	Preview routing decisions without calling the LLM
`caudate bench`	Run the benchmark suite
`caudate cron {add,list,remove,run}`	Schedule recurring prompts
`caudate mcp-serve`	Run Caudate as an MCP server
`caudate update`	Self-update (git pull or pip upgrade)
`caudate info`	List registered tools and learned strategies

Slash commands (inside the REPL)

/help, /clear, /compact, /model <id|fast|balanced|powerful>, /cost, /tools, /sessions, /export <md|json|html>, /files, /permissions <mode>, /personality, /router, /diff <path>, /status, /cron, /bg, /notify, /think on|off, /save, /quit. Type /help for the full list with descriptions.

Tools the agent can call

~38 built-in tools, including: Bash, Read, Write, Edit, Glob, Grep, WebSearch, WebFetch, PythonExec, Think, Respond, Agent (subagents), Draw, EditImage, DescribeImage, Speak, TranscribeAudio, Storyboard, Sandbox, Calculator, DateTime, HttpRequest, OpenAPI, Notebook, Cron, PushNotification, AskUserQuestion, LoadSkill, UpdateMemory, MCP, Worktree, PlanMode, FindAnywhere, SemanticSearch, SystemInfo, Task, CognosCard, Artifact, Agentic. Drop a plugins/*.py exposing PLUGIN = ToolInstance to add your own.

Caudate — the learned brain

Caudate ships with Caudate, a small PyTorch transformer that learns your tool-use patterns turn-by-turn. It observes every conversation, auto-trains in the background once it has enough samples, and graduates through trust levels (SILENT → OBSERVER → WHISPER → ADVISOR → CONTROLLER) based on rolling accuracy. At WHISPER it whispers a hint into the LLM prompt; at ADVISOR it can override tier routing.

See CAUDATE.md for the full architecture, nn/ for the code, data/nn/ for the live checkpoint and replay buffer.

Multi-modal in / out

@file references — look at @config.py inlines or attaches the file.
Drag-and-drop images / PDFs — paths in the prompt are auto-uploaded via the Files API.
POST /files — same Files API exposed over HTTP.
Citations — pass documents=[{id,title,text}] and the model can emit [[cite:doc:Lx]] markers, post-processed into structured CitationBlock objects.

Architecture

                ┌──────────────────────────────────────────────────┐
                │                 CognosAgent                       │
                │                                                   │
   user input  ─┼─►  AgenticLoop  ◄──►  Executor  ──►  tools/      │
                │       │                  ▲                        │
                │       │                  │                        │
                │       ▼                  │                        │
                │   Personality ─► hooks ──┘                        │
                │       │                                           │
                │       ▼                                           │
                │     LLM Router (DualLLMProvider)                  │
                │     ├── System 1: fast model                      │
                │     └── System 2: slow model                      │
                │                                                   │
                │   Memory: episodic | semantic | procedural | working
                │   Session persistence + context compaction        │
                │   Permissions (modes + allow/deny rules + audit)  │
                │   MCP clients (cognos_mcp/)                       │
                │   Subagents (workspace-isolated via git worktrees)│
                └──────────────────────────────────────────────────┘

Each subsystem is documented in BUILD_LOG.md. The Claude SDK Extras and Claude Code UX Parity sections in NEXT_ACTIONS.md enumerate what's wired and where.

Configuration

Three layers, last wins:

Built-in defaults in core/settings.py
~/.caudate/settings.json — per-user
./.caudate/settings.json — per-project

Example:

{
  "model": "ollama/gemma3:27b",
  "permission_mode": "default",
  "fallback_models": ["ollama/qwen2.5-coder:1.5b"],
  "permissions": {
    "allow": [{"tool": "Bash", "pattern": "^(ls|cat|grep)"}],
    "deny":  [{"tool": "Bash", "pattern": "rm -rf"}]
  },
  "statusline": "{model} | {mood} | tok={tokens} | ${cost:.4f}",
  "notifications": {"enabled": true, "on_long_task_seconds": 30}
}

CLI flags always override settings (--model fast, --permissions plan).

Web UI

A zero-build single-page UI ships with the HTTP server:

caudate serve --port 8000
# open http://127.0.0.1:8000/ui

It speaks to POST /chat/stream (SSE), supports session resume, file attachments, and slash-style commands. Source: ui/web/.

IDE plugins

ide/vscode/ — TypeScript extension. Sidebar webview, "Ask about selection" right-click, configurable API URL / model / permission mode.
ide/jetbrains/ — Kotlin plugin for IntelliJ-platform IDEs (IDEA, PyCharm, GoLand, WebStorm, RustRover, …). Tool window, editor action, settings page.

Both are thin clients — they make HTTP calls to a running caudate serve process, no LLM runs in the IDE.

Optional extras

pip install-flagged features that are no-ops without their dep:

Extra	Unlocks
`anthropic`	Real prompt caching, native extended thinking, native `response_format` for `claude-*` model ids
`pypdf`	PDF text extraction in the Files API
`prompt_toolkit`	Multi-line input + persistent history + Ctrl+R + slash completion
`fastapi` + `uvicorn`	The HTTP server (`caudate serve`)
`mcp`	The MCP server / client (`caudate mcp-serve`)
`useful-moonshine-onnx` + `kokoro` + `piper-tts` + `sounddevice`	Voice mode (`caudate talk`)
`diffusers` + `transformers` + `torch`	Image generation (`caudate draw`)
`torch` + `sentence-transformers`	Caudate (the learned router NN)

Caudate runs without any of them — they degrade gracefully.

Project layout

core/             agent, agentic loop, sessions, hooks, permissions, files,
                  citations, settings, slash commands, …
execution/        tool registry + 12 built-in tools + plugin loader
llm/              LiteLLM provider, model registry, dual-process router,
                  fallback chains
memory/           episodic / semantic / procedural / working
planning/         DAG planner, task graph
reflection/       reflector, meta-learner
personality/      identity, mood, inner voice
cognos_mcp/       MCP server, client, bridge
api/              FastAPI HTTP server
bench/            benchmark suite
plugins/          drop-in tools (`PLUGIN = ToolInstance`)
ide/vscode/       VS Code extension
ide/jetbrains/    JetBrains plugin
ui/               terminal display + web UI
data/             local state — sessions, files, manifests, audit log

Why local-first?

Three reasons:

Privacy. Code, conversations, and learned strategies live on disk.
Cost. A small Ollama model runs at $0/turn and answers in milliseconds for routine work.
Sovereignty. No vendor outage takes you offline; no rate limit slows you down.

The dual-process router exists so you can keep most turns on a small local model and only escalate hard turns to a heavy one (which can itself be local — or Anthropic/OpenAI when you're online).

License

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.21

May 19, 2026

This version

0.1.20

May 19, 2026

0.1.17

May 19, 2026

0.1.4

May 19, 2026

0.1.3

May 19, 2026

0.1.1

May 19, 2026

0.1.0

May 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

caudate_cli-0.1.20-py3-none-any.whl (490.0 kB view details)

Uploaded May 19, 2026 Python 3

File details

Details for the file caudate_cli-0.1.20-py3-none-any.whl.

File metadata

Download URL: caudate_cli-0.1.20-py3-none-any.whl
Upload date: May 19, 2026
Size: 490.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for caudate_cli-0.1.20-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b939c30d254668507783b5f3031d34153ded77b3bd3eed8d2606c7803c8144a`
MD5	`e8107e746ffe698c013bfcc243eba4bb`
BLAKE2b-256	`099b0e80d79ebc0490466659eb3ac3940975aac7060174bf82bba28102279308`

See more details on using hashes here.

caudate-cli 0.1.20

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Caudate

Quickstart

Install from PyPI

Install from source (for development)

Talk to it

4. Or hit it over HTTP

Use Caudate as the backend for Open WebUI

What you get out of the box

CLI

Slash commands (inside the REPL)

Tools the agent can call

Caudate — the learned brain

Multi-modal in / out

Architecture

Configuration

Web UI

IDE plugins

Optional extras

Project layout

Why local-first?

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes