A local-first cognitive agent with a learned router (Caudate) and Claude-SDK-shaped tool palette.
Cognos
A local-first cognitive agent with Claude-SDK feature parity, built on Ollama via LiteLLM. Cognos runs entirely on your hardware by default — no API keys, no network calls — but switches to Anthropic, OpenAI, or any LiteLLM-supported provider with a one-line config change.
It's not a thin chat wrapper. The architecture explicitly separates:
- Memory — episodic, semantic, procedural, working
- Planning — DAG-based goal decomposition with replanning
- Reflection — meta-learning from past goal outcomes
- Personality — identity, mood, inner voice
- Dual-process routing — fast/slow models picked per call (System 1 / System 2)
…with a Claude-Code-style agentic loop on top: real-time tool calls, streaming, sessions, hooks, MCP, subagents, permissions, and a fully-featured CLI + HTTP API.
Status: feature-complete against its original five-phase roadmap plus Claude SDK extras and Claude Code UX parity. See NEXT_ACTIONS.md for what's done and what's deferred.
Quickstart
Install from PyPI (recommended)
pipx install caudate-cli
On first launch, cognos runs a one-time setup wizard that picks your
fast/slow models, downloads Caudate's weights from HuggingFace, and writes
~/.cognos/settings.json. After that:
cognos # banner + REPL
cognos doctor # diagnose what's wired (Ollama, Caudate, API keys)
cognos init --force # re-run the wizard if you change your mind
Requirements:
- Python ≥ 3.10
- Ollama running locally if you want the local-only or hybrid preset (skip if you go hosted-only)
- An `ANTHROPIC_API_KEY` in your shell, only if you pick a preset that uses an `anthropic/...` model
Install from source (for development)
git clone https://github.com/raveuk/cognos.git
cd cognos
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
cognos init
Talk to it
cognos # default — drops into REPL
cognos interactive --model fast # preset model
cognos interactive \
--system1 ollama/qwen2.5-coder:1.5b \
--system2 ollama/gemma3:27b # explicit dual-brain
A REPL opens. Type to chat. Type /help for slash commands.
Or hit it over HTTP
cognos serve --port 8000
# in another terminal:
curl -X POST http://127.0.0.1:8000/chat \
-H 'content-type: application/json' \
-d '{"message":"what is in this directory?"}'
The HTTP server also hosts a Web UI at http://127.0.0.1:8000/ui.
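The curl request above can also be issued from Python with only the standard library. This is a minimal sketch: it assumes `cognos serve --port 8000` is running locally and that `/chat` replies with a JSON body. The response schema is not described here, so the code returns whatever JSON comes back. `build_chat_request` and `chat` are illustrative names, not part of Cognos.

```python
import json
import urllib.request


def build_chat_request(message: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the same POST /chat request as the curl example."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def chat(message: str, base_url: str = "http://127.0.0.1:8000", timeout: float = 120.0):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the server running in another terminal:
#   print(chat("what is in this directory?"))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything touches the network.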
What you get out of the box
CLI
| Command | What it does |
|---|---|
| `cognos interactive` | REPL with streaming, slash commands, history, multi-line input |
| `cognos run <goal>` | Single-shot DAG planner: decomposes a goal, runs it, reflects |
| `cognos talk` | Voice mode (Moonshine STT + Kokoro TTS, Whisper/Piper fallback) |
| `cognos draw "<prompt>"` | Generate an image (diffusers / FLUX.1-schnell or SDXL-Turbo) |
| `cognos caudate {train,eval,status,export}` | Train/inspect Caudate, the learned router/advisor NN |
| `cognos serve [--port 8000]` | FastAPI HTTP server with SSE streaming |
| `cognos sessions {list,delete,rename,export}` | Manage saved conversations |
| `cognos personality {show,set,reset}` | Inspect or tune identity / mood |
| `cognos models` | List detected Ollama models with capability flags |
| `cognos router` | Preview routing decisions without calling the LLM |
| `cognos bench` | Run the benchmark suite |
| `cognos cron {add,list,remove,run}` | Schedule recurring prompts |
| `cognos mcp-serve` | Run Cognos as an MCP server |
| `cognos update` | Self-update (git pull or pip upgrade) |
| `cognos info` | List registered tools and learned strategies |
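`cognos run` is described above as a DAG planner. As a rough illustration of what DAG-based decomposition implies (this is not Cognos's planner, and the subtask names are invented), the sketch below orders hypothetical subtasks so that each one runs only after its dependencies, using the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition of one goal: each key depends on the tasks in its set.
subtasks = {
    "read_sources": set(),
    "run_tests": {"read_sources"},
    "summarize_failures": {"run_tests"},
    "write_report": {"read_sources", "summarize_failures"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(subtasks)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies loop, which is a natural point for a planner to reject or rebuild a plan.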
Slash commands (inside the REPL)
/help, /clear, /compact, /model <id|fast|balanced|powerful>,
/cost, /tools, /sessions, /export <md|json|html>, /files,
/permissions <mode>, /personality, /router, /diff <path>,
/status, /cron, /bg, /notify, /think on|off, /save,
/quit. Type /help for the full list with descriptions.
Tools the agent can call
~38 built-in tools, including: Bash, Read, Write, Edit, Glob,
Grep, WebSearch, WebFetch, PythonExec, Think, Respond,
Agent (subagents), Draw, EditImage, DescribeImage, Speak,
TranscribeAudio, Storyboard, Sandbox, Calculator, DateTime,
HttpRequest, OpenAPI, Notebook, Cron, PushNotification,
AskUserQuestion, LoadSkill, UpdateMemory, MCP, Worktree,
PlanMode, FindAnywhere, SemanticSearch, SystemInfo, Task,
CognosCard, Artifact, Agentic. Drop a plugins/*.py exposing
PLUGIN = ToolInstance to add your own.
Caudate — the learned brain
Cognos ships with Caudate, a small PyTorch transformer that learns your tool-use patterns turn-by-turn. It observes every conversation, auto-trains in the background once it has enough samples, and graduates through trust levels (SILENT → OBSERVER → WHISPER → ADVISOR → CONTROLLER) based on rolling accuracy. At WHISPER it whispers a hint into the LLM prompt; at ADVISOR it can override tier routing.
See CAUDATE.md for the full architecture, nn/ for the code, and data/nn/ for the live checkpoint and replay buffer.
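To make the idea of graduating through trust levels on rolling accuracy concrete, here is a small sketch. The level names come from the description above; the thresholds, window size, minimum sample count, and the `RollingTrust` class are all assumptions made for illustration and are not taken from Caudate's code.

```python
from collections import deque
from enum import IntEnum


class TrustLevel(IntEnum):
    SILENT = 0
    OBSERVER = 1
    WHISPER = 2
    ADVISOR = 3
    CONTROLLER = 4


# Hypothetical thresholds: minimum rolling accuracy required for each level.
THRESHOLDS = {
    TrustLevel.OBSERVER: 0.50,
    TrustLevel.WHISPER: 0.70,
    TrustLevel.ADVISOR: 0.85,
    TrustLevel.CONTROLLER: 0.95,
}


class RollingTrust:
    """Track prediction hits over a fixed window and derive a trust level."""

    def __init__(self, window: int = 100, min_samples: int = 20):
        self.hits = deque(maxlen=window)  # old outcomes fall off automatically
        self.min_samples = min_samples

    def record(self, predicted_tool: str, actual_tool: str) -> None:
        self.hits.append(predicted_tool == actual_tool)

    @property
    def accuracy(self) -> float:
        return sum(self.hits) / len(self.hits) if self.hits else 0.0

    @property
    def level(self) -> TrustLevel:
        if len(self.hits) < self.min_samples:
            return TrustLevel.SILENT  # not enough evidence yet
        level = TrustLevel.SILENT
        for candidate, threshold in THRESHOLDS.items():  # ascending order
            if self.accuracy >= threshold:
                level = candidate
        return level


trust = RollingTrust(window=10, min_samples=5)
for _ in range(8):
    trust.record("Read", "Read")   # correct predictions
for _ in range(2):
    trust.record("Read", "Bash")   # misses
print(trust.accuracy, trust.level.name)
```

Because the window is bounded, a run of recent misses lowers the level again, so trust under this scheme is earned and lost on recent behaviour, not lifetime totals.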
Multi-modal in / out
- `@file` references: `look at @config.py` inlines or attaches the file.
- Drag-and-drop images / PDFs: paths in the prompt are auto-uploaded via the Files API.
- `POST /files`: same Files API exposed over HTTP.
- Citations: pass `documents=[{id,title,text}]` and the model can emit `[[cite:doc:Lx]]` markers, post-processed into structured `CitationBlock` objects.
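The citation post-processing step can be pictured as a regex pass over the model output. The sketch below assumes the marker reads as `[[cite:<document id>:L<line number>]]`; that interpretation, the `Citation` dataclass, and `extract_citations` are illustrative stand-ins, not Cognos's `CitationBlock` implementation.

```python
import re
from dataclasses import dataclass

# Assumed marker shape: [[cite:<document id>:L<line number>]]
CITE_RE = re.compile(r"\[\[cite:(?P<doc_id>[^:\]]+):L(?P<line>\d+)\]\]")


@dataclass(frozen=True)
class Citation:
    """Stand-in for a structured citation; not Cognos's CitationBlock."""
    doc_id: str
    line: int
    title: str


def extract_citations(text: str, documents: list[dict]) -> tuple[str, list[Citation]]:
    """Strip markers from the text and return them as structured objects."""
    titles = {d["id"]: d["title"] for d in documents}
    found = [
        Citation(m["doc_id"], int(m["line"]), titles.get(m["doc_id"], ""))
        for m in CITE_RE.finditer(text)
    ]
    return CITE_RE.sub("", text).strip(), found


docs = [{"id": "readme", "title": "README", "text": "Cognos is local-first.\n"}]
clean, cites = extract_citations("Cognos is local-first. [[cite:readme:L1]]", docs)
print(clean)
print(cites)
```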
Architecture
            ┌──────────────────────────────────────────────────────┐
            │                     CognosAgent                      │
            │                                                      │
user input ─┼─► AgenticLoop ◄──► Executor ──► tools/               │
            │        │               ▲                             │
            │        │               │                             │
            │        ▼               │                             │
            │ Personality ─► hooks ──┘                             │
            │        │                                             │
            │        ▼                                             │
            │ LLM Router (DualLLMProvider)                         │
            │   ├── System 1: fast model                           │
            │   └── System 2: slow model                           │
            │                                                      │
            │ Memory: episodic | semantic | procedural | working   │
            │ Session persistence + context compaction             │
            │ Permissions (modes + allow/deny rules + audit)       │
            │ MCP clients (cognos_mcp/)                            │
            │ Subagents (workspace-isolated via git worktrees)     │
            └──────────────────────────────────────────────────────┘
Each subsystem is documented in BUILD_LOG.md. The Claude SDK Extras and Claude Code UX Parity sections in NEXT_ACTIONS.md enumerate what's wired and where.
Configuration
Three layers, last wins:
- Built-in defaults in `core/settings.py`
- `~/.cognos/settings.json`: per-user
- `./.cognos/settings.json`: per-project
Example:
{
  "model": "ollama/gemma3:27b",
  "permission_mode": "default",
  "fallback_models": ["ollama/qwen2.5-coder:1.5b"],
  "permissions": {
    "allow": [{"tool": "Bash", "pattern": "^(ls|cat|grep)"}],
    "deny": [{"tool": "Bash", "pattern": "rm -rf"}]
  },
  "statusline": "{model} | {mood} | tok={tokens} | ${cost:.4f}",
  "notifications": {"enabled": true, "on_long_task_seconds": 30}
}
CLI flags always override settings (--model fast, --permissions plan).
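The "last wins" layering, with CLI flags on top, can be sketched as a sequence of dictionary merges. This is an assumption-laden illustration: the source does not say whether nested objects are merged key by key or replaced wholesale (the sketch merges them), and `DEFAULTS`, `deep_merge`, `read_json`, and `load_settings` are hypothetical names, not the contents of `core/settings.py`.

```python
import json
from pathlib import Path
from typing import Optional

# Placeholder defaults for illustration; the real ones live in core/settings.py.
DEFAULTS = {"model": "ollama/gemma3:27b", "permission_mode": "default"}


def deep_merge(base: dict, override: dict) -> dict:
    """Return base updated by override, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def read_json(path: Path) -> dict:
    """Load a settings file, treating a missing file as an empty layer."""
    return json.loads(path.read_text()) if path.is_file() else {}


def load_settings(home: Path, project: Path, cli_overrides: Optional[dict] = None) -> dict:
    """Apply the layers in order so that later layers win."""
    layers = [
        DEFAULTS,
        read_json(home / ".cognos" / "settings.json"),     # per-user
        read_json(project / ".cognos" / "settings.json"),  # per-project
        cli_overrides or {},                                # CLI flags last
    ]
    settings: dict = {}
    for layer in layers:
        settings = deep_merge(settings, layer)
    return settings


# Usage: load_settings(Path.home(), Path.cwd(), {"model": "fast"})
```

With a key-by-key merge, a project file that sets only `notifications.enabled` leaves the user's `on_long_task_seconds` intact; a wholesale replace would drop it.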
Web UI
A zero-build single-page UI ships with the HTTP server:
cognos serve --port 8000
# open http://127.0.0.1:8000/ui
It speaks to POST /chat/stream (SSE), supports session resume,
file attachments, and slash-style commands. Source: ui/web/.
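A client other than the bundled UI would need to read the same SSE stream. The sketch below parses generic server-sent events from an iterable of text lines, following the standard `data:` / blank-line framing. The JSON payload shape in the sample (`{"delta": ...}`) is invented for the example; the actual event format of `POST /chat/stream` is not described here.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Per the SSE format, one optional space after the colon is dropped.
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)  # a blank line ends the event
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored.


# Hypothetical stream contents, including a keep-alive comment line.
sample = [
    'data: {"delta": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"delta": "lo"}\n',
    "\n",
]
chunks = [json.loads(payload) for payload in iter_sse_data(sample)]
print("".join(chunk["delta"] for chunk in chunks))
```

In a real client, each line of the HTTP response body would be decoded to text and fed to `iter_sse_data` as it arrives.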
IDE plugins
- `ide/vscode/`: TypeScript extension. Sidebar webview, "Ask about selection" right-click, configurable API URL / model / permission mode.
- `ide/jetbrains/`: Kotlin plugin for IntelliJ-platform IDEs (IDEA, PyCharm, GoLand, WebStorm, RustRover, …). Tool window, editor action, settings page.
Both are thin clients: they make HTTP calls to a running cognos serve process, and no LLM runs in the IDE.
Optional extras
Features gated on optional pip packages; each is a no-op when its dependency is missing:
| Extra | Unlocks |
|---|---|
| `anthropic` | Real prompt caching, native extended thinking, native response_format for claude-* model ids |
| `pypdf` | PDF text extraction in the Files API |
| `prompt_toolkit` | Multi-line input + persistent history + Ctrl+R + slash completion |
| `fastapi` + `uvicorn` | The HTTP server (cognos serve) |
| `mcp` | The MCP server / client (cognos mcp-serve) |
| `useful-moonshine-onnx` + `kokoro` + `piper-tts` + `sounddevice` | Voice mode (cognos talk) |
| `diffusers` + `transformers` + `torch` | Image generation (cognos draw) |
| `torch` + `sentence-transformers` | Caudate (the learned router NN) |
Cognos runs without any of them — they degrade gracefully.
Project layout
core/           agent, agentic loop, sessions, hooks, permissions, files,
                citations, settings, slash commands, …
execution/      tool registry + 12 built-in tools + plugin loader
llm/            LiteLLM provider, model registry, dual-process router,
                fallback chains
memory/         episodic / semantic / procedural / working
planning/       DAG planner, task graph
reflection/     reflector, meta-learner
personality/    identity, mood, inner voice
cognos_mcp/     MCP server, client, bridge
api/            FastAPI HTTP server
bench/          benchmark suite
plugins/        drop-in tools (`PLUGIN = ToolInstance`)
ide/vscode/     VS Code extension
ide/jetbrains/  JetBrains plugin
ui/             terminal display + web UI
data/           local state: sessions, files, manifests, audit log
Why local-first?
Three reasons:
- Privacy. Code, conversations, and learned strategies live on disk.
- Cost. A small Ollama model runs at $0/turn and answers in milliseconds for routine work.
- Sovereignty. No vendor outage takes you offline; no rate limit slows you down.
The dual-process router exists so you can keep most turns on a small local model and only escalate hard turns to a heavy one (which can itself be local — or Anthropic/OpenAI when you're online).
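As a toy picture of per-call escalation, the sketch below sends short, routine prompts to a fast model and longer or harder-looking ones to a slow model. It is only a hand-written heuristic under stated assumptions: the cue words, the word limit, and the `route` function are invented, and Cognos's own router (including Caudate's learned advice) is not reproduced here. The model ids are the ones used in the CLI example earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: str   # "system1" (fast) or "system2" (slow)
    model: str


# Model ids from the CLI example; the cues and the word limit are invented.
SYSTEM1 = "ollama/qwen2.5-coder:1.5b"
SYSTEM2 = "ollama/gemma3:27b"
HARD_CUES = ("refactor", "design", "prove", "debug", "plan")


def route(prompt: str, max_fast_words: int = 40) -> Route:
    """Pick the fast model unless the prompt looks long or hard."""
    words = prompt.lower().split()
    looks_hard = len(words) > max_fast_words or any(
        cue in word for word in words for cue in HARD_CUES
    )
    return Route("system2", SYSTEM2) if looks_hard else Route("system1", SYSTEM1)


print(route("list the files here"))
print(route("debug the flaky planner test and refactor it"))
```

A static rule like this is cheap but brittle, which is one motivation for a router that learns from observed turns.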
License
TBD.