
Terminal-native agentic coding runtime for local open-weight models


Codii — Local Models with Agentic Power

A terminal-native agentic coding runtime that adapts to your model, not the other way around.

PyPI version Python 3.11+ License: MIT


What is Codii?

Codii is a self-contained terminal tool that turns any local or cloud language model into a working coding agent. Point it at an Ollama server, an LM Studio instance, a vLLM deployment, or OpenRouter — Codii probes the model's real capabilities, selects the right scaffolding strategy, and runs a full agentic loop that reads, edits, and executes code in your codebase.

No cloud subscription required for local models. No vendor lock-in.


Why Codii Exists

Agentic coding runtimes were designed around frontier models — large, well-behaved, with native function calling, reliable JSON output, and enormous context windows. Smaller open-weight models don't share those properties, and behavior degrades as a result: the model narrates what it would do instead of doing it. Files aren't created. Commands aren't run. The loop doesn't close.

The cause isn't model quality. It's scaffolding mismatch.

Research consistently shows that varying the scaffolding around a fixed model produces larger performance swings than varying the model inside a fixed scaffold. Codii acts on this finding. Before any agentic action is taken, it probes the connected model across four dimensions, builds a capability fingerprint, and selects a system prompt, tool-call format, and guard-rail configuration tuned to that specific model's measured behavior.

The harness adapts to the model.


Installation

Recommended (isolates codii from your project environments):

pipx install codii

Alternative:

pip install codii

Both produce a codii command available immediately in your terminal.

Python 3.11 or later required.

Optional — serve mode (Anthropic Messages API shim backed by your local model):

pip install "codii[serve]"

Quick Start

First run — interactive backend setup wizard:

codii

Codii auto-detects any locally running Ollama or LM Studio instance, lists the available models, probes the one you select, and drops you into a session.

Specify a backend explicitly:

codii --backend ollama
codii --backend openrouter
codii --backend vllm

Connect to any OpenAI-compatible endpoint:

codii connect http://localhost:8000

Other useful commands:

codii probe                  # Re-run capability probe for the current model
codii decisions              # View the decision log from the last session
codii replay                 # List and replay past session transcripts
codii fingerprint show       # Display the current model fingerprint
codii fingerprint edit       # Open fingerprint in $EDITOR for manual tuning
codii fingerprint list       # List all stored fingerprints
codii serve                  # Start Anthropic Messages API shim (requires [serve])

Key Features

Multi-backend support: Ollama (default :11434), LM Studio (default :1234), vLLM, OpenRouter, and any generic OpenAI-compatible endpoint. The first-run wizard auto-detects local backends. OpenRouter setup includes a masked API key prompt.

Automatic capability probing: four sequential probes run before the first session, and results are cached per model:

  • Tool-call format detection — sends a test tool definition and classifies the model's raw response format (gemma4_tokens, hermes_xml, qwen_json, mistral_json, openai_json, or none)
  • Effective context window measurement — needle-in-haystack test at 25%, 50%, and 90% of the model's claimed context depth
  • Reasoning token support detection — identifies <think>, <thinking>, <|begin_of_thought|>, and similar delimiters
  • Structured output reliability — five JSON-schema requests scored as a 0–1 reliability fraction
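The reliability fraction from the last probe can be sketched as a simple pass rate over the probe responses. The function and parameter names below are illustrative, not Codii's actual API:

```python
import json

def score_structured_output(responses: list[str], required_keys: set[str]) -> float:
    """Illustrative scorer: fraction of responses that parse as JSON
    and contain every required key (Codii scores five such probes)."""
    passed = 0
    for raw in responses:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output counts as a failure
        if isinstance(obj, dict) and required_keys <= obj.keys():
            passed += 1
    return passed / len(responses) if responses else 0.0
```

A model that returns valid JSON with the expected keys on four of five probes would score 0.8 and land in Tier 1 or 2 territory.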

Three-tier adaptive scaffolding: System prompt, tool format, and guard rails are selected from the fingerprint. Tiers adapt dynamically within a session on repeated parse failures or successes.

| Tier | Condition | Tool Format |
|------|-----------|-------------|
| 1 | Native tool calling AND reliability ≥ 0.9 AND context ≥ 16k tokens | Native JSON function definitions |
| 2 | Structured output reliability ≥ 0.5 | Prompt-engineered XML with inline examples |
| 3 | Structured output reliability < 0.5 | Guided JSON, one call at a time |
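The tier table translates directly into a selection function. A minimal sketch, with a hypothetical Fingerprint shape (the real fingerprint schema is richer):

```python
from dataclasses import dataclass

@dataclass
class Fingerprint:
    # Illustrative fields only; the stored fingerprint has more dimensions.
    native_tool_calling: bool
    structured_reliability: float  # 0.0-1.0 probe score
    effective_context: int         # measured usable tokens

def select_tier(fp: Fingerprint) -> int:
    """Mirror the tier table: native calling + high reliability + large
    context -> 1; moderate reliability -> 2; otherwise -> 3."""
    if (fp.native_tool_calling
            and fp.structured_reliability >= 0.9
            and fp.effective_context >= 16_000):
        return 1
    if fp.structured_reliability >= 0.5:
        return 2
    return 3
```

Note the conditions cascade: a model with native tool calling but a short measured context still drops to Tier 2 or 3.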

Four interactive modes

  • Auto (default) — full tool suite, all guards active
  • Edit — scoped file editing with edit_file discipline enforced
  • Plan — LLM generates a numbered plan; keyboard UI to approve, edit individual steps inline, or reject before execution
  • Chat — read-only Q&A with two-phase workspace indexing

Cycle through modes with Shift+Tab.

Complete tool suite

| Tool | Description |
|------|-------------|
| read_file | UTF-8 file read, sandboxed to workspace |
| write_file | Atomic write via temp-file swap (new files only) |
| edit_file | old_str → new_str replacement with two-pass whitespace-tolerant matching |
| bash | Shell execution (PowerShell on Windows, sh on Unix); timeout 60–300s; blocks interactive commands |
| list_dir | Directory listing; skips .git, node_modules, __pycache__, .venv |
| spawn_subagent | Delegates subtasks to built-in researcher / reviewer / planner agents with restricted tool sets |

Slash commands with autocomplete dropdown: /help /clear /compact /context /init /index /chat /edit /plan /scope /auto /exit

Dynamic Context Block (DCB): An ephemeral user-role message injected before each inner LLM call. It contains the active tool list, a tier-specific format hint, the last 10 session actions, and the current plan state. It is never stored in history and never consumes persistent context.

WorkflowLock: A state machine that arms when read_file is called and forces the next tool call to be edit_file or write_file on the same path. Prevents the common drift pattern where a model reads a file and then moves on without making the edit.

Weak Model Bridge: Detects when a smaller model writes correct code as a prose text block instead of a tool call, extracts the code, and injects a precision edit_file directive so the edit lands regardless.

Circuit Breaker: Detects stuck patterns, such as the same read-only tool called on the same file repeatedly with no substantive progress, and injects a redirection to break the loop.
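A stuck-pattern check of this kind can be sketched as counting repeated read-only calls on the same target; the threshold and the action-log shape are assumptions for illustration:

```python
from collections import Counter

def is_stuck(action_log: list[tuple[str, str]], threshold: int = 3) -> bool:
    """Flag a stuck pattern: the same read-only tool hitting the same
    path `threshold` or more times (illustrative check, not Codii's)."""
    read_only = {"read_file", "list_dir"}
    counts = Counter(entry for entry in action_log if entry[0] in read_only)
    return any(n >= threshold for n in counts.values())
```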

Session-wide codebase cache: Chat mode caches file contents and a codebase summary per session. Edit mode preloads scoped files into the conversation so the model can call edit_file directly without a redundant read.

Context auto-compaction: When conversation history reaches 75% of the effective context window, a summarization pass compresses old turns. A cooldown prevents thrashing after low-value compactions. Also available on demand with /compact.
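The trigger reduces to a threshold-plus-cooldown check. A sketch with assumed parameter names (the real cooldown logic is not documented here):

```python
def should_compact(used_tokens: int, effective_window: int,
                   turns_since_last: int, cooldown: int = 5) -> bool:
    """Compact when usage reaches 75% of the effective window, unless a
    recent compaction puts us inside the cooldown (illustrative values)."""
    return (used_tokens >= 0.75 * effective_window
            and turns_since_last >= cooldown)
```

Note the threshold is applied to the measured effective window from the probe, not the model's claimed context size.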

Decision logging: Every guard evaluation, tier adaptation, parse attempt, repair heuristic, and tool execution is recorded to ~/.codii/sessions/<id>/decisions.jsonl. Review with codii decisions.

Session replay: Full conversation transcripts are saved per session and can be replayed with codii replay.

Beautiful terminal UI: Rich library panels, spinning probe loader, live token counter (used / cap with percentage), keyboard-navigable menus, and masked password input for OpenRouter setup.


Supported Model Families

| Family | Format Detected | Typical Tier |
|--------|-----------------|--------------|
| Gemma 4 | gemma4_tokens | 1 |
| Qwen / QwQ (Qwen2.5, Qwen3) | qwen_json | 1 or 2 |
| Mistral / Devstral / Mixtral | mistral_json | 1 or 2 |
| Hermes (NousResearch) | hermes_xml | 2 |
| LLaMA / LLaMA-3 | openai_json | 1 or 2 |
| DeepSeek | openai_json | 1 |
| Phi | detected by name | 2 or 3 |
| Any OpenAI-compatible | native | 1 |
| Other / unknown | fallback parser | 2 or 3 |

If a model emits an unrecognized format, Codii falls back to the generic parser, which applies multiple repair heuristics before giving up.


Modes

Auto (default)

The full agent loop. All tools available, all guards active. Best for open-ended "implement this feature" or "fix this bug" requests where Codii should determine the steps autonomously.

Chat (/chat)

Read-only. Only read_file and list_dir are available. On first invocation Codii indexes the workspace: reads the most important files, builds a codebase summary, and caches it for the session. Subsequent questions answer from the cache without re-reading. Best for "how does X work" or "where is Y defined" questions.

Edit (/edit)

Scoped file editing. Mention files in your prompt or use /scope to restrict the edit surface. Scoped files are preloaded into the conversation so the model can call edit_file directly without a redundant read. Best for targeted, surgical changes to known files.

Plan (/plan)

Two-phase execution. First, Codii generates a numbered step-by-step plan and presents it in a keyboard-navigable UI — approve, edit individual steps inline, or reject. After approval, execution begins with the active plan shown in the Dynamic Context Block at every step. Best for multi-file refactors or complex tasks where you want to review before any files change.


Slash Commands

| Command | Description |
|---------|-------------|
| /help | Show available commands |
| /clear | Clear conversation history (preserves session metadata) |
| /compact | Summarize history to reclaim context tokens |
| /context | Show token usage breakdown (used / cap / %) |
| /init | Generate CODII.md — LLM-analyzed project documentation |
| /index | Re-index workspace files for Chat mode |
| /chat | Switch to Chat mode |
| /edit | Switch to Edit mode |
| /plan | Switch to Plan mode |
| /plan edit | Open the current plan for inline step editing |
| /scope | Show or update the current edit scope |
| /auto | Toggle auto-approval (skip per-tool confirmation prompts) |
| /exit | Exit the session |

CLI Reference

| Command | Description |
|---------|-------------|
| codii | Start a session (first-run setup if unconfigured) |
| codii probe | Re-run the capability probe for the current model |
| codii connect <url> | Connect to a new backend, list models, probe |
| codii decisions | Show the decision log from the most recent session |
| codii replay | List and replay past session transcripts |
| codii fingerprint show | Display the current model fingerprint |
| codii fingerprint edit | Open the fingerprint in $EDITOR for manual tuning |
| codii fingerprint list | List all stored fingerprints |
| codii serve | Start Anthropic Messages API shim (requires [serve] extra) |

Global flags: --backend, --auto

codii probe additionally accepts: --backend, --endpoint, --model, --verbose


Architecture Overview

codii (CLI)
  │
  ├── BackendAdapter ── Ollama / vLLM / LM Studio / OpenRouter / Generic OpenAI-compat
  │
  ├── Probe Pipeline ── tool_call → context_window → reasoning → structured_output
  │       └── CapabilityFingerprint ── stored in ~/.codii/fingerprints/
  │
  ├── Scaffolding Selector ── picks Tier 1 / 2 / 3 from fingerprint
  │       └── tier{1,2,3}.txt ── system prompts per tier
  │
  ├── Parser Dispatch ── gemma4 / qwen / hermes / mistral / generic
  │
  └── Session (TAOR Loop)
        ├── AgentCore ── Think → Execute → Verify per turn
        │     ├── WorkflowLock ── arm on read_file, force next call to edit/write
        │     ├── Weak Model Bridge ── extract code-as-text → inject edit_file
        │     ├── Circuit Breaker ── detect and interrupt stuck patterns
        │     └── ContextInjector ── Dynamic Context Block (ephemeral, not stored)
        │
        ├── Session State
        │     ├── conversation history
        │     ├── reads_this_session (read-before-write guard)
        │     ├── action_log (last 20 entries)
        │     ├── file_cache (chat indexing + edit preloading)
        │     └── PlanState (step index + DCB rendering)
        │
        └── Tools
              read_file / write_file / edit_file / bash / list_dir / spawn_subagent

Everything is sandboxed to the workspace directory. Tools reject paths that escape via symlinks or .. traversal. Sensitive paths (.env, .ssh, .aws, .gnupg, credentials) are blocked regardless of workspace location.
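The containment check can be sketched with pathlib: resolve symlinks and .. first, then test containment and sensitive names. This is an illustrative guard, not the actual implementation:

```python
from pathlib import Path

# Sensitive names from the text above; the real blocklist may be longer.
SENSITIVE = {".env", ".ssh", ".aws", ".gnupg", "credentials"}

def is_allowed(workspace: Path, candidate: str) -> bool:
    """Reject paths that escape the workspace (via symlinks or '..')
    or that touch a sensitive name anywhere in the resolved path."""
    resolved = (workspace / candidate).resolve()  # collapses '..' and symlinks
    if not resolved.is_relative_to(workspace.resolve()):
        return False  # escapes the workspace
    return not any(part in SENSITIVE for part in resolved.parts)
```

Resolving before the containment test is the important ordering: checking the raw string first would let a symlink or ../ sequence slip through.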


Project Status

v0.1.0 — current release.

The probe pipeline, fingerprint system, tier-based scaffolding, and agentic core are stable. All four modes (auto, chat, edit, plan), the full slash command set, and the terminal UI are implemented and working.

With capable models (31B+ quantized, or cloud models via OpenRouter), the agent loop is reliable: WorkflowLock, the Circuit Breaker, and the Dynamic Context Block collectively keep the model on track through long multi-step tasks.

With smaller models (2B–7B), the generated code is usually correct, but tool-call formatting can be inconsistent. The Weak Model Bridge handles the most common failure mode (code written as a text block) but doesn't resolve every formatting failure; Tier 2 and Tier 3 scaffolding improve reliability significantly for these models.

Active development continues. The public API is not yet stable.


Contributing

git clone <repo-url>
cd codii
pip install -e ".[dev]"
pytest                           # run tests
ruff check src/ tests/           # lint
ruff format src/                 # format
mypy --strict src/codii          # type check

Open an issue to discuss large changes before submitting a PR. If a PR changes behavior described in SYSTEM_DESIGN_DOCUMENT.md, update the document in the same commit.


No Telemetry

Codii makes no outbound requests except to your configured endpoint: no usage data, no error reporting, no analytics. For local backends, all traffic stays on your machine. This is verifiable by reading src/codii/connection/, the only HTTP client in the project.


Local models, full agentic coding, no subscription.
