
CASS — Coding ASSistant

A CLI coding assistant powered by LLMs. Works with OpenAI, Ollama, and OpenRouter.

 ██████╗ █████╗ ███████╗███████╗
██╔════╝██╔══██╗██╔════╝██╔════╝
██║     ███████║███████╗███████╗
██║     ██╔══██║╚════██║╚════██║
╚██████╗██║  ██║███████║███████║
 ╚═════╝╚═╝  ╚═╝╚══════╝╚══════╝

Features

  • 8 built-in tools — read, write, edit files, grep search, list directories, run shell commands, manage todos, spawn sub-agents
  • Diff & undo — /diff shows all file changes this session; /undo reverts the last change
  • MCP support — connect to any MCP server for external tools (web fetch, GitHub, databases, etc.)
  • Sub-agents — spawn focused LLM instances for research or independent tasks, optionally with a different model
  • Tool approval system — every tool call requires user confirmation before executing
  • Plan mode — propose changes without executing, review the plan, then execute with one keystroke
  • Todos/planning — track multi-step tasks within and across sessions; the LLM manages them as it works
  • Hooks — auto-run formatters and linters after file changes (ruff, prettier, etc.)
  • Vision support — paste screenshots with Alt+V, analyzed by a dedicated vision model
  • Context compression — automatic conversation summarization when approaching token limits
  • Task scheduler — run recurring or one-shot tasks in the background (shell commands or LLM prompts)
  • Custom commands — create reusable shortcuts for common workflows
  • Conversation save/load — persist and resume sessions
  • Multi-line input — Enter submits, backslash continues, Alt+Enter for newlines
  • Input history & autocomplete — arrow keys for history, Tab for slash command completion
  • Ollama model browser — list local models filtered by tool support, switch mid-session

Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager

Install

git clone <repo-url>
cd coding_assistant_v2
uv sync

Configure

Create a .env file in the project root:

For Ollama (local):

OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
MODEL_NAME=qwen2.5-coder
VISION_MODEL=llava:latest

For OpenAI:

OPENAI_API_KEY=sk-your-key-here
MODEL_NAME=gpt-4o-mini

For OpenRouter:

OPENAI_API_KEY=sk-or-your-key-here
OPENAI_BASE_URL=https://openrouter.ai/api/v1
MODEL_NAME=openai/gpt-4o-mini

Run

uv run cass

Usage

Slash Commands

Command                 Description
/help                   List all commands
/plan                   Switch to plan mode (read-only, proposes changes)
/active                 Switch to active mode (all tools enabled)
/models                 List local Ollama models filtered by tool support
/model [name|number]    Show or switch model
/image <path>           Attach an image to your next message
/mcp                    Manage MCP servers (connect, disconnect, load)
/agent                  Spawn a sub-agent (research or task mode)
/todos                  Manage todos (add, start, done, rm, save, load)
/command                Manage custom commands (add, rm)
/schedule               Schedule a background task
/tasks                  List scheduled tasks (rm, clear)
/tokens                 Show token usage stats
/compress               Force conversation compression
/save [name]            Save conversation
/load <name|number>     Load a saved conversation
/conversations          List saved conversations
/diff                   Show diff of file changes this session (/diff 3 for the last 3)
/undo                   Undo the last file change
/hooks                  View hooks config (/hooks init for defaults)
/clear                  Reset conversation history
/history                Show message counts
/exit                   Quit

Keyboard Shortcuts

Key          Action
Enter        Submit input
\ + Enter    Continue to next line
Alt+Enter    Add a newline
Alt+V        Paste image from clipboard
Up/Down      Cycle input history
Tab          Autocomplete slash commands
Ctrl+C       Cancel current input
Ctrl+D       Exit

Plan Mode

Plan mode lets you discuss changes before making them:

  1. Type /plan to enter plan mode (prompt turns yellow)
  2. Describe what you want — the LLM reads files and proposes changes
  3. When the plan is ready, you're prompted: "Execute this plan? (Y/n)"
  4. Press Enter to auto-switch to active mode and execute

Todos / Planning

Track multi-step tasks within a session. The LLM can manage todos automatically as it works (creating steps, marking them in progress, checking them off).

# User commands
/todos                    # List all todos
/todos add Set up database   # Add a todo
/todos start abc123       # Mark as in progress
/todos done abc123        # Mark as complete
/todos rm abc123          # Remove a todo
/todos clear              # Remove completed items
/todos save               # Save to .cass/todos.json
/todos load               # Load from file

The LLM also has a todos tool and will use it to track its progress on multi-step tasks:

> refactor the auth module into separate files

The LLM will:
1. Create todos for each step
2. Mark each as in_progress as it works
3. Check them off when done
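As an illustration of the data model behind these commands, a minimal todo store might look like this (class and field names are hypothetical, not CASS internals):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Todo:
    text: str
    status: str = "pending"  # pending -> in_progress -> done
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:6])

class TodoStore:
    def __init__(self) -> None:
        self.todos: list[Todo] = []

    def add(self, text: str) -> Todo:
        todo = Todo(text)
        self.todos.append(todo)
        return todo

    def set_status(self, todo_id: str, status: str) -> None:
        for todo in self.todos:
            if todo.id == todo_id:
                todo.status = status
                return
        raise KeyError(todo_id)

    def clear_done(self) -> None:
        # /todos clear removes only completed items
        self.todos = [t for t in self.todos if t.status != "done"]
```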

Sub-Agents

Spawn independent LLM instances for focused work. Sub-agents run in their own conversation context and return results to the main session.

Two modes:

  • Research (default) — read-only tools (read_file, list_dir, grep). Safe for exploration.
  • Task — full tool access (write, edit, shell). Can make changes independently.

# Via slash command
/agent find all error handling patterns in the codebase
/agent task refactor the config module to use pydantic
/agent research how does the streaming work --model llama3

# The LLM can also spawn agents on its own via the sub_agent tool
# when it decides a task needs focused work

Use cases:

  • Ask the LLM to "investigate how auth works" — it spawns a research agent to read files and search code, then uses the findings to answer
  • Tell it to "refactor module X" — it can spawn a task agent to do the work independently
  • Use a smaller/faster model for simple exploration: /agent find all TODO comments --model llama3.2:1b

Sub-agent results are automatically added to the main conversation context so the LLM can reference them.
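The research/task tool split can be sketched as a simple filter (a simplification, using the tool names from the feature list):

```python
READ_ONLY_TOOLS = {"read_file", "list_dir", "grep"}

def tools_for_mode(mode: str, all_tools: set[str]) -> set[str]:
    """Research agents are restricted to read-only tools;
    task agents get the full tool set."""
    if mode == "research":
        return all_tools & READ_ONLY_TOOLS
    if mode == "task":
        return set(all_tools)
    raise ValueError(f"unknown agent mode: {mode}")
```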

Diff & Undo

Track and revert file changes made during a session:

/diff           # Show unified diff of all changes this session
/diff 3         # Show only the last 3 changes
/undo           # Revert the last file change (can undo multiple times)

  • Modified files: /undo restores the previous content
  • New files: /undo deletes them
  • Diff output is syntax-highlighted
  • Summary shows change count: "4 changes across 2 file(s) (1 created, 3 modified)"
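A change log with undo of this kind can be sketched as a stack of before-snapshots (a hypothetical illustration, not CASS's actual implementation):

```python
import os

class ChangeLog:
    """Track file writes in a session; pop the stack to undo."""

    def __init__(self) -> None:
        # Each entry is (path, previous_content); None means the file was created
        self._stack: list[tuple[str, str | None]] = []

    def record(self, path: str, new_content: str) -> None:
        previous = None
        if os.path.exists(path):
            with open(path) as f:
                previous = f.read()
        self._stack.append((path, previous))
        with open(path, "w") as f:
            f.write(new_content)

    def undo(self) -> str:
        path, previous = self._stack.pop()
        if previous is None:
            os.remove(path)            # new file: delete it
            return f"deleted {path}"
        with open(path, "w") as f:
            f.write(previous)          # modified file: restore old content
        return f"restored {path}"
```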

MCP Servers

Connect to external MCP (Model Context Protocol) servers to extend CASS with additional tools — web fetching, GitHub, databases, and more.

# Create default config with example servers
/mcp init

# Connect servers from .cass/mcp.json
/mcp load

# Connect ad-hoc (no config file needed)
/mcp connect web npx -y @anthropic-ai/mcp-server-fetch
/mcp connect github npx -y @modelcontextprotocol/server-github

# List connected servers and their tools
/mcp

# Disconnect
/mcp disconnect web

Configuration (.cass/mcp.json):

{
    "servers": {
        "fetch": {
            "command": "npx",
            "args": ["-y", "@anthropic-ai/mcp-server-fetch"],
            "env": {}
        },
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_TOKEN": "your-token"}
        }
    }
}

Servers configured in .cass/mcp.json auto-connect at startup. MCP tools appear alongside built-in tools with an mcp_<server>_ prefix and go through the same approval system.
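For illustration, loading such a config and deriving the prefixed tool names could look like this (function names are hypothetical):

```python
import json

def load_mcp_servers(config_text: str) -> dict:
    """Parse an mcp.json-style config into {name: (command, args, env)}."""
    config = json.loads(config_text)
    return {
        name: (spec["command"], spec.get("args", []), spec.get("env", {}))
        for name, spec in config["servers"].items()
    }

def prefixed_tool_name(server: str, tool: str) -> str:
    # MCP tools are exposed as mcp_<server>_<tool> alongside built-ins
    return f"mcp_{server}_{tool}"
```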

Custom Commands

Create reusable shortcuts for common workflows:

# Create commands
/command add /walk schedule every 1h prompt remind me to get up and walk
/command add /test schedule every 5m shell uv run pytest
/command add /fmt shell ruff format src/
/command add /hello prompt say hello in a creative way

# Use them
/walk              # Starts the hourly walk reminder
/test              # Starts continuous testing
/fmt               # Formats all Python files
/hello             # Gets a creative greeting

# Manage
/command           # List all custom commands
/command rm /walk  # Remove a command

Commands are saved in .cass/commands.json and persist across sessions. Three action types:

  • schedule — runs /schedule with the given args
  • shell — runs a shell command directly
  • prompt — sends text to the LLM
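A sketch of how a stored command might be resolved into its action type (the JSON shape here is assumed for illustration, not necessarily the real commands.json format):

```python
import json

# Hypothetical commands.json shape based on the three action types
COMMANDS_JSON = """{
    "/fmt":  {"action": "shell",    "args": "ruff format src/"},
    "/walk": {"action": "schedule", "args": "every 1h prompt remind me to get up and walk"}
}"""

def expand(command: str, registry: dict) -> tuple[str, str]:
    """Resolve a custom command into (action_type, argument_string)."""
    entry = registry[command]
    return entry["action"], entry["args"]

registry = json.loads(COMMANDS_JSON)
```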

Hooks

Auto-run formatters and linters after file writes/edits:

# Create default hooks (ruff format + lint for Python files)
/hooks init

This creates .cass/hooks.json:

{
    "after_tool": {
        "write_file": [
            {"command": "ruff format {path}", "name": "format", "glob": "*.py"},
            {"command": "ruff check --fix {path}", "name": "lint", "glob": "*.py"}
        ],
        "edit_file": [
            {"command": "ruff format {path}", "name": "format", "glob": "*.py"},
            {"command": "ruff check --fix {path}", "name": "lint", "glob": "*.py"}
        ]
    }
}

Hook failures are fed back to the LLM so it can automatically fix issues.
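The glob matching and {path} substitution described above can be sketched as follows (a simplified illustration; CASS's real hook runner may differ):

```python
import fnmatch
import subprocess

def run_hooks(hooks: list[dict], path: str) -> list[str]:
    """Run each hook whose glob matches the changed file; collect failures."""
    failures = []
    for hook in hooks:
        if not fnmatch.fnmatch(path, hook.get("glob", "*")):
            continue
        cmd = hook["command"].format(path=path)
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            # Failure output is what would be fed back to the LLM
            failures.append(f"{hook['name']}: {result.stderr.strip()}")
    return failures
```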

Task Scheduler

Run background tasks on an interval or after a delay:

# Recurring tasks
/schedule every 5m shell uv run pytest
/schedule every 1h prompt summarize my recent git changes
/schedule every 30 seconds shell echo ping

# One-shot delayed tasks
/schedule in 10m shell uv run pytest
/schedule in 30s prompt remind me to commit

# Manage tasks
/tasks              # List all with status
/tasks rm <id>      # Remove one
/tasks clear        # Remove all

Intervals support compact (5m, 30s, 1h) or word (5 minutes, 30 seconds, 1 hour) formats.
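A parser for these interval formats might look like this (a sketch, not CASS's actual code):

```python
import re

UNITS = {
    "s": 1, "second": 1, "seconds": 1,
    "m": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hour": 3600, "hours": 3600,
}

def parse_interval(text: str) -> int:
    """Parse '5m', '30s', '1h', or '5 minutes' into a number of seconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", text.strip().lower())
    if not match or match.group(2) not in UNITS:
        raise ValueError(f"bad interval: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]
```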

Use cases:

  • /schedule every 5m shell uv run pytest — continuous test runner
  • /schedule every 10m shell git diff --stat — watch uncommitted changes
  • /schedule every 1h prompt look at my git log and summarize what I've accomplished
  • /schedule in 30m prompt remind me to commit and push

Vision Support

Attach images for analysis (requires a vision-capable model):

# Set a vision model in .env
VISION_MODEL=llava:latest

# Then in CASS:
# 1. Press Alt+V to paste from clipboard, or:
/image screenshot.png

# 2. Type your question and press Enter
what's wrong with this code?

When VISION_MODEL is set, images are routed to the vision model for analysis, and the description is passed to your main coding model. This means your coding model doesn't need vision support.

Project Context (CASS.md)

Create a CASS.md file in your project root to give the LLM project-specific context:

# My Project
This is a Django REST API. Uses PostgreSQL, Celery for async tasks.
Key files: src/api/views.py, src/models.py

This is loaded into the system prompt automatically at startup. CASS warns when the file exceeds 12KB and truncates it at 16KB.
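The size handling could be implemented roughly like this (a sketch assuming the 12KB/16KB limits are byte counts):

```python
import warnings
from pathlib import Path

WARN_BYTES = 12 * 1024
TRUNCATE_BYTES = 16 * 1024

def load_project_context(root: str = ".") -> str:
    """Load CASS.md for the system prompt: warn at 12KB, truncate at 16KB."""
    path = Path(root) / "CASS.md"
    if not path.exists():
        return ""
    text = path.read_text()
    if len(text) > WARN_BYTES:
        warnings.warn(f"CASS.md is {len(text)} bytes; consider trimming it")
    return text[:TRUNCATE_BYTES]
```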

Context Compression

Long sessions with lots of tool calls can approach the context window limit. CASS handles this automatically:

  • After each response, checks if context exceeds 70% of the window (128K tokens)
  • Summarizes older messages using the LLM, keeps the 6 most recent messages intact
  • Can also be triggered manually with /compress
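The compression trigger can be sketched as follows (a simplification; in CASS the summarize step is itself an LLM call):

```python
CONTEXT_WINDOW = 128_000
THRESHOLD = 0.70
KEEP_RECENT = 6

def maybe_compress(messages: list[dict], token_count: int, summarize) -> list[dict]:
    """If usage exceeds 70% of the window, fold everything but the
    6 most recent messages into a single summary message."""
    if token_count <= THRESHOLD * CONTEXT_WINDOW:
        return messages
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    if not older:
        return messages
    summary = {"role": "system", "content": summarize(older)}
    return [summary] + recent
```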

Conversation Save/Load

/save my-session          # Save with a name
/save                     # Save with auto-generated timestamp name
/conversations            # List all saved conversations
/load my-session          # Load by name
/load 1                   # Load by number from list

Conversations are stored in .cass/conversations/ (project-local, gitignored).

Development

# Run tests (133 tests)
uv run pytest tests/ -v

# Run the CLI
uv run cass

VS Code launch configs included for debugging (Run CASS, Run Tests, Run Tests current file).

Tech Stack

  • Python 3.12+ with async/await
  • openai — AsyncOpenAI client (works with any OpenAI-compatible API)
  • rich — Terminal UI, markdown rendering, streaming display
  • prompt_toolkit — Multi-line input, history, autocomplete
  • Pillow — Clipboard image capture for vision support
  • ruff — Default formatter/linter for hooks (dev dependency)

Project-local Data (.cass/)

All automatically gitignored:

  • .cass/conversations/ — saved conversation sessions
  • .cass/hooks.json — hook configuration
  • .cass/commands.json — custom command definitions
  • .cass/todos.json — persisted todos
  • .cass/history — input history
