Claw Code – Local Claude Code Powered by Ollama
Free. Offline. No API Keys. Works on Your Laptop.
Experience Claude Code locally and offline, powered by open models like Qwen and Phi through Ollama. Zero API costs, zero data leakage, pure local execution.
Current Version: 0.2.2 | Status: ✅ Production Ready
Table of Contents
- Quick Start
- Why Claw Code?
- Features
- Hardware Requirements
- Deep Architecture
- Installation & Setup
- Usage Guide
- Configuration
- API Reference
- Troubleshooting
- What's New in v0.2.2
- Roadmap
Quick Start
Install
pip install claw-code
Seamless Setup (Auto-Detects Everything)
claw-code
# First time: Automatic wizard runs
# ✅ Detects your PC RAM
# ✅ Checks Ollama installation
# ✅ Downloads the best model for your system
# ✅ Creates ~/.claude.json
# ✅ Launches REPL
# Every time after: Skips wizard, launches REPL immediately
Start Coding
claw> Write a Python function to merge sorted arrays
# Streams response real-time from local model...
claw> Refactor it to use less memory
# Continues the conversation with context awareness
claw> /exit
# Session auto-saved to resume later (Phase 3)
Why Claw Code?
| Feature | Claw Code | Claude API | ChatGPT |
|---|---|---|---|
| Cost | ✅ Free | ❌ $0.003/1K tokens | ❌ $20/month |
| Runs Offline | ✅ 100% local | ❌ Requires internet | ❌ Cloud only |
| Data Privacy | ✅ On your machine | ⚠️ Anthropic stores | ❌ OpenAI stores |
| Works on Laptop | ✅ 8GB+ RAM | ❌ Requires account | ❌ Requires account |
| Commands/Tools | ✅ Full support | ✅ Full support | ❌ Limited |
| Multi-Turn | ✅ Stateful sessions | ✅ Stateful sessions | ✅ Stateful sessions |
Hardware Requirements
Choose your model based on available RAM:
≤ 8GB VRAM  → phi4-mini (3.8B)      [M1 MacBook Air, budget laptops]
8-16GB VRAM → qwen2.5-coder:7b ⭐   [Most users, recommended]
16GB+ VRAM  → qwen2.5-coder:14b     [Complex tasks, power users]
All models run locally with zero internet after download.
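
For reference, the RAM-to-model mapping above can be expressed in a few lines with psutil. This is a minimal sketch with illustrative thresholds and an illustrative function name, not the wizard's exact logic:

import psutil

def recommend_model() -> str:
    """Map total system RAM to a model tier (thresholds are illustrative)."""
    ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    if ram_gb <= 8:
        return "phi4-mini"
    if ram_gb < 16:
        return "qwen2.5-coder:7b"
    return "qwen2.5-coder:14b"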
Features
✅ Core Features (Implemented)
- Interactive REPL – Familiar prompt interface with history
- Streaming Output – Real-time token display as the model generates
- Multi-Turn Conversations – Full context awareness across turns
- Slash Commands – /help, /session, /clear, /exit, etc.
- Hardware Auto-Detection – Uses psutil to pick the best model
- Configuration Persistence – First-run setup creates ~/.claude.json
- Seamless Setup – Wizard auto-runs the first time, skips thereafter
- Windows Support – Unicode handling for the Windows console
- Local Execution – 100% offline, no API keys needed
- Cost Tracking – Approximate token counting, not billed cost (see the sketch after this list)
- Demo Mode – Graceful fallback when Ollama is unavailable
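
On the cost-tracking point: a common heuristic for approximate counting, without loading the model's tokenizer, is the "four characters per token" rule of thumb. The real implementation lives in src/services/cost_tracker.py and may differ; this is just a sketch of the idea:

def approx_tokens(text: str) -> int:
    """Estimate token count with the ~4-characters-per-token rule of thumb."""
    return max(1, len(text) // 4)

print(approx_tokens("def hello():\n    print('hi')"))  # -> 7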
In Development (Phase 3-5)
- Session Persistence – Save/resume conversations
- Tool Execution – Shell commands, file operations
- Permission System – Approve/deny tool access
- VSCode Integration – Extension for editor automation
- Web GUI – Browser-based interface
- Plugin System – Extend with custom tools
- Advanced Routing – Semantic command matching
API Reference
Python API
Query Engine
from src.query_engine import QueryEnginePort
# Get engine instance
engine = QueryEnginePort.from_workspace()
# Stream response
for token in engine.stream_message("Your prompt"):
    print(token, end="", flush=True)
# Add to history
engine.add_message("assistant", "Previous response...")
# Get formatted context
context = engine.get_context()
Configuration
from src.config import load_config, write_model_to_config
# Load config
config = load_config()
print(config.model) # "qwen2.5-coder:7b"
# Update model
write_model_to_config("phi4-mini")
HTTP API (Ollama)
Used internally, but you can call directly:
# List models
curl http://localhost:11434/api/tags
# Generate (streaming)
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "def hello():",
"stream": true
}'
# Pull model
curl http://localhost:11434/api/pull -d '{
"name": "qwen2.5-coder:7b"
}'
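
The same streaming call can be made from Python with requests (the HTTP client claw-code depends on). Ollama streams newline-delimited JSON objects, each carrying a response fragment and a done flag; the helper name below is illustrative:

import json
import requests

def stream_generate(prompt: str, model: str = "qwen2.5-coder:7b",
                    base_url: str = "http://localhost:11434"):
    """Stream tokens from Ollama's /api/generate endpoint (NDJSON lines)."""
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break

for token in stream_generate("def hello():"):
    print(token, end="", flush=True)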
Deep Architecture
System Design Overview
┌────────────────────────────────────────────────────────────┐
│               User Terminal (REPL Interface)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    Prompt    │  │   Commands   │  │ Completions  │      │
│  │   Handling   │  │    Parser    │  │    Engine    │      │
│  └───────┬──────┘  └───────┬──────┘  └───────┬──────┘      │
└──────────┼─────────────────┼─────────────────┼─────────────┘
           └─────────────────┼─────────────────┘
                             │
              ┌──────────────┴──────────────┐
              │       Command Router        │
              │    (Parsing & Dispatch)     │
              └──────────────┬──────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
   │    Slash     │  │    Query     │  │     Tool     │
   │   Commands   │  │    Engine    │  │    System    │
   │   Handler    │  │   (AI/LLM)   │  │    (Perms)   │
   └──────────────┘  └───────┬──────┘  └──────────────┘
                             │
              ┌──────────────┴──────────────┐
              │      Request Formatter      │
              │      (Prompt + Context)     │
              └──────────────┬──────────────┘
                             │
              ┌──────────────┴──────────────┐
              │         HTTP Client         │
              │    (Ollama Integration)     │
              └──────────────┬──────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
   │    Local     │  │    Config    │  │   Session    │
   │    Ollama    │  │   Manager    │  │    Store     │
   │  (localhost  │  │  ~/.claude   │  │   ~/.claw/   │
   │    :11434)   │  │    .json     │  │  sessions/   │
   └──────────────┘  └──────────────┘  └──────────────┘
Layer Breakdown
1. Presentation Layer (src/repl.py)
- Interactive REPL with prompt-toolkit
- Rich console output with proper Windows Unicode handling
- Tab completion for commands
- Streaming output for real-time responses
- Status bar showing model, tokens, session info
Key Functions:
run_repl() # Main REPL loop
print_banner() # Startup display
handle_input() # Parse user input
stream_response() # Stream model output
handle_commands() # Route /slash commands
2. Command Router (src/command_graph.py + handlers)
- Regexp-based routing to match user input to handlers
- Priority-based dispatch (specific commands first)
- Fallback to query engine for natural language
- Error handling with graceful fallbacks
Routing Priority:
1. Exact match (/help, /exit, /model)
2. Prefix match (/ses... → /session)
3. Fuzzy match (help → /help)
4. Query engine (everything else → AI)
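
The actual router lives in src/command_graph.py; the following is only a rough sketch of the priority order above, with difflib standing in for whatever fuzzy matching the real code uses:

import difflib
from typing import Callable, Dict, Optional

def route(user_input: str, commands: Dict[str, Callable]) -> Optional[Callable]:
    """Resolve input to a handler; None means 'send to the query engine'."""
    if user_input in commands:                          # 1. exact: /help
        return commands[user_input]
    if user_input.startswith("/"):
        prefix = [c for c in commands if c.startswith(user_input)]
        if len(prefix) == 1:                            # 2. unambiguous prefix: /ses -> /session
            return commands[prefix[0]]
    fuzzy = difflib.get_close_matches("/" + user_input.lstrip("/"),
                                      list(commands), n=1)
    if fuzzy:                                           # 3. fuzzy: help -> /help
        return commands[fuzzy[0]]
    return None                                         # 4. natural language -> AI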
3. Query Engine (src/query_engine.py)
- Multi-turn conversation with context awareness
- Prompt formatting with role-based context
- Token counting for budget tracking
- Response streaming from Ollama HTTP API
Core Methods:
stream_message() # Send prompt, stream response
add_message() # Add to conversation history
get_context() # Build prompt with history
format_prompt() # System + user prompt
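
What get_context() produces is essentially the conversation flattened into a single prompt. A simplified illustration; the real template in src/utils/prompt_formatter.py may differ, and the system prompt below is a placeholder:

from typing import Dict, List

SYSTEM_PROMPT = "You are a helpful coding assistant."  # illustrative

def get_context(history: List[Dict[str, str]], user_input: str) -> str:
    """Flatten the system prompt, prior turns, and the new input into one prompt."""
    lines = [f"System: {SYSTEM_PROMPT}"]
    for msg in history:
        lines.append(f"{msg['role'].capitalize()}: {msg['content']}")
    lines.append(f"User: {user_input}")
    lines.append("Assistant:")
    return "\n".join(lines)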
4. Ollama Integration (src/services/ollama_setup.py)
- Auto-discovery of running Ollama instance
- Model listing from Ollama API
- Auto-start daemon (Windows-safe subprocess handling)
- HTTP client with proper error handling
Connection Pattern:
# Ollama API Endpoints Used
/api/tags # List available models
/api/generate # Send prompt (streaming)
/api/show      # Get model metadata (POST, model name in body)
/api/pull # Download model
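
The detect-then-auto-start pattern can be sketched like this (simplified; the real ollama_setup.py adds Windows-safe subprocess flags and richer error handling, and this assumes the ollama binary is on PATH):

import subprocess
import time
import requests

def ensure_ollama(base_url: str = "http://localhost:11434") -> bool:
    """Return True once Ollama responds; try to start the daemon otherwise."""
    try:
        requests.get(f"{base_url}/api/tags", timeout=2)
        return True
    except requests.ConnectionError:
        pass
    subprocess.Popen(["ollama", "serve"],               # start the daemon detached
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    for _ in range(10):                                 # poll until the server is up
        time.sleep(1)
        try:
            requests.get(f"{base_url}/api/tags", timeout=2)
            return True
        except requests.ConnectionError:
            continue
    return False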
5. Configuration Manager (src/config.py)
- Smart config loading from ~/.claude.json
- Field filtering (compatibility with GitHub Copilot settings)
- Defaults fallback for first-time users
- Config persistence after setup wizard
Config Structure:
{
"provider": "ollama",
"ollama_base_url": "http://localhost:11434",
"model": "qwen2.5-coder:7b",
"max_tokens": 4000,
"temperature": 0.7,
"auto_detect_vram": true
}
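
Field filtering is what keeps unrelated keys (such as GitHub Copilot settings that other tools write into ~/.claude.json) from breaking the frozen dataclass. A minimal sketch of the idea, using an abbreviated ClaudeConfig (the full class appears under Key Classes & Interfaces):

import dataclasses
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class ClaudeConfig:                    # abbreviated; defaults double as first-run values
    provider: str = "ollama"
    ollama_base_url: str = "http://localhost:11434"
    model: str = "qwen2.5-coder:7b"
    max_tokens: int = 4000
    temperature: float = 0.7
    auto_detect_vram: bool = True

def load_config(path: Path = Path.home() / ".claude.json") -> ClaudeConfig:
    """Load the config, dropping unknown keys (e.g. GitHub Copilot settings)."""
    raw = json.loads(path.read_text()) if path.exists() else {}
    known = {f.name for f in dataclasses.fields(ClaudeConfig)}
    return ClaudeConfig(**{k: v for k, v in raw.items() if k in known})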
6. Seamless Setup System (src/init_wizard.py + main.py)
- First-time wizard runs automatically if config missing
- Smart skipping if Ollama + models already installed
- Hardware detection via psutil
- Model recommendation based on available VRAM
Smart Behavior:
# First run (no ~/.claude.json):
claw-code
  → Detects hardware
  → Starts Ollama (if needed)
  → Pulls best model
  → Saves config
  → Launches REPL

# Subsequent runs:
claw-code
  → Checks ~/.claude.json exists
  → Skips wizard entirely
  → Launches REPL immediately
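
In code, the skip logic reduces to a file-existence check. A hypothetical sketch (run_init_wizard is an illustrative name; the real flow spans src/main.py and src/init_wizard.py):

from pathlib import Path

from src.init_wizard import run_init_wizard  # hypothetical import
from src.repl import run_repl                # run_repl() per the REPL layer above

CONFIG_PATH = Path.home() / ".claude.json"

def main() -> None:
    """Run the wizard only when no config exists; always end in the REPL."""
    if not CONFIG_PATH.exists():
        run_init_wizard()   # detect hardware, pull model, write ~/.claude.json
    run_repl()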
7. Session Management (Phase 3)
- Session storage in ~/.claw/sessions/
- Conversation history serialization
- Token tracking per session
- Resume capability for multi-day work (an illustrative sketch follows)
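
Since Phase 3 has not shipped, the following is purely illustrative: one plausible shape for a JSON-on-disk store in ~/.claw/sessions/, with hypothetical function and field names:

import json
import time
import uuid
from pathlib import Path
from typing import Dict, List, Optional

SESSIONS_DIR = Path.home() / ".claw" / "sessions"

def save_session(history: List[Dict[str, str]],
                 session_id: Optional[str] = None) -> Path:
    """Serialize conversation history to ~/.claw/sessions/<id>.json."""
    SESSIONS_DIR.mkdir(parents=True, exist_ok=True)
    session_id = session_id or str(uuid.uuid4())
    path = SESSIONS_DIR / f"{session_id}.json"
    path.write_text(json.dumps(
        {"id": session_id, "saved_at": time.time(), "messages": history},
        indent=2,
    ))
    return path

def load_session(session_id: str) -> List[Dict[str, str]]:
    """Read a saved session back into a message list for the query engine."""
    data = json.loads((SESSIONS_DIR / f"{session_id}.json").read_text())
    return data["messages"]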
Core Data Flow
User Input โ REPL Response
User Types:
  "Write a Python fibonacci function"
        ↓
Input Parser (repl.py):
  Detects not a /command
        ↓
Command Router (command_graph.py):
  No exact match
  → Falls through to QueryEngine
        ↓
Query Engine (query_engine.py):
  1. Format prompt with system context
  2. Add to conversation history
  3. Calculate tokens
  4. Call Ollama HTTP API
        ↓
Ollama HTTP Client (requests):
  POST http://localhost:11434/api/generate
  Body: {model: "qwen2.5-coder:7b", prompt: "...", stream: true}
        ↓
Local Model (qwen2.5-coder:7b):
  Generates response token-by-token
  Streams back via HTTP
        ↓
Response Streamer (repl.py):
  1. Receive tokens from HTTP stream
  2. Decode tokens
  3. Print to console in real-time
  4. Track token count
        ↓
User Sees:
  "def fibonacci(n):\n    if n <= 1:\n        return n\n    ..."
  With typing animation effect
File Structure
claw-code/
├── src/                        # Python implementation
│   ├── main.py                 # Entry point, `claw-code` command
│   ├── repl.py                 # Interactive REPL loop
│   ├── config.py               # Config loading & persistence
│   ├── query_engine.py         # LLM query interface
│   ├── command_graph.py        # Command routing logic
│   ├── init_wizard.py          # First-time setup wizard
│   ├── session_store.py        # Session load/save
│   │
│   ├── services/
│   │   ├── ollama_setup.py     # Ollama detection & auto-start
│   │   └── cost_tracker.py     # Token counting
│   │
│   ├── utils/
│   │   ├── prompt_formatter.py # Prompt template formatting
│   │   └── validators.py       # Input validation
│   │
│   └── types/                  # Type definitions
│
├── rust/                       # Rust implementation (emerging)
│   ├── crates/
│   │   ├── api/                # HTTP server
│   │   ├── runtime/            # VM/executor
│   │   └── commands/           # Integrated commands
│   └── Cargo.toml
│
├── tests/
│   └── test_parity_audit.py    # Python ↔ Rust parity tests
│
├── pyproject.toml              # Package metadata (v0.2.2)
└── README.md                   # This file
Key Classes & Interfaces
ClaudeConfig (src/config.py)
@dataclass(frozen=True)
class ClaudeConfig:
    provider: str            # "ollama" or "anthropic"
    ollama_base_url: str     # "http://localhost:11434"
    model: str               # "qwen2.5-coder:7b"
    max_tokens: int          # 4000
    temperature: float       # 0.7
    auto_detect_vram: bool   # True
QueryEnginePort (src/query_engine.py)
class QueryEnginePort:
    def stream_message(self, prompt: str) -> Iterator[str]: ...
    def add_message(self, role: str, content: str) -> None: ...
    def get_context(self) -> str: ...
    def format_prompt(self, user_input: str) -> str: ...

    @classmethod
    def from_workspace(cls) -> "QueryEnginePort": ...
ReplState (src/repl.py)
@dataclass
class ReplState:
    model: str                          # Current model
    engine: Optional[QueryEnginePort]
    session_id: str                     # Current session UUID
    input_tokens: int                   # Total input tokens
    output_tokens: int                  # Total output tokens
Extensibility Points
- Add New Commands – Register in the SLASH_COMMANDS dict
- Add Tools – Implement the Tool interface + add to registry
- Custom Models – Point ollama_base_url at a different server
- Plugins – Create a src/plugins/ directory (Phase 5)
Installation & Setup
Prerequisites
- Python 3.9+ (3.12 recommended)
- Ollama (https://ollama.ai) – required once, auto-detected
- 5-10 GB free disk space (for model download)
Step 1: Install Python Package
pip install claw-code
Step 2: Run (Wizard Runs Automatically on First Time)
claw-code
On first run:
- ✅ Detects Ollama installation
- ✅ Checks your VRAM using psutil
- ✅ Recommends an appropriate model
- ✅ Downloads the model (~2-5 minutes)
- ✅ Verifies everything works
- ✅ Creates the ~/.claude.json config
- ✅ Launches the REPL immediately
On subsequent runs:
- Skips wizard entirely
- Launches REPL instantly
Step 3: Start Using
claw> your prompt here
Usage Guide
Interactive REPL
The main interface follows a familiar prompt pattern:
claw> Write a function to parse CSV files
# Model streams response in real-time
claw> Add error handling for missing values
# Context preserved across turns
claw> /session
# Shows current session stats
claw> /exit
# Option to save session
Core Commands
| Command | Purpose | Example |
|---|---|---|
| /help | Show all commands | claw> /help |
| /model | Show current model & config | claw> /model |
| /session | Show token usage & stats | claw> /session |
| /clear | Clear conversation history | claw> /clear |
| /exit | Exit REPL (save option) | claw> /exit |
Examples
Code Generation
claw> Write async Python code to fetch weather from API
[Returns structured async/await code with error handling]
Code Review
claw> Review this code for security issues:
def login(user, password):
    conn = sqlite3.connect("users.db")
    rows = conn.execute("SELECT * FROM users WHERE name='" + user + "'").fetchall()
    return rows[0][1] == password
[AI identifies SQL injection vulnerability and suggests fix]
Learning
claw> Explain closures in JavaScript with examples
[Returns clear explanation with practical examples]
claw> Show me 3 real-world uses in modern frameworks
[Provides React, Vue, Angular examples]
Configuration
Config File Location: ~/.claude.json
Created automatically on first run. Edit manually to customize:
{
"provider": "ollama",
"ollama_base_url": "http://localhost:11434",
"model": "qwen2.5-coder:7b",
"max_tokens": 4000,
"temperature": 0.7,
"auto_detect_vram": true
}
Configuration Options
| Option | Type | Purpose | Default |
|---|---|---|---|
| provider | string | LLM provider ("ollama" for now) | "ollama" |
| ollama_base_url | string | Ollama server URL | "http://localhost:11434" |
| model | string | Model name (must exist in Ollama) | "qwen2.5-coder:7b" |
| max_tokens | int | Max tokens per response | 4000 |
| temperature | float | Model creativity (0-1) | 0.7 |
| auto_detect_vram | bool | Auto-select model by VRAM | true |
Common Configurations
Low-End Machine (≤8GB RAM)
{
"model": "phi4-mini",
"max_tokens": 2048,
"temperature": 0.6
}
Power User (16GB+ RAM)
{
"model": "qwen2.5-coder:14b",
"max_tokens": 8000,
"temperature": 0.7
}
Remote Ollama Server
{
"ollama_base_url": "http://192.168.1.100:11434"
}
Troubleshooting
Installation Issues
"Python not found"
Ensure Python 3.9+ is installed and in PATH:
python --version
"pip not found"
On some systems, use python -m pip:
python -m pip install claw-code
Ollama Issues
"ollama: command not found"
Download Ollama from https://ollama.ai and verify installation:
ollama --version
"Connection refused" during startup
Ollama must be running. In another terminal:
ollama serve
# Or let claw-code auto-start it (if configured)
"Model not found" error
Download the model first:
ollama pull qwen2.5-coder:7b
# Then run claw-code again
Performance Issues
"Slow responses" or "First response takes 1-2 min"
This is normal on first run. The model is loading into VRAM. Subsequent responses are faster.
To improve:
- Use a smaller model: phi4-mini instead of qwen2.5-coder:14b
- Ensure Ollama is the only heavy process running
- Check disk speed (the model is loaded from disk into RAM)
"Out of memory" errors
Model too large for available VRAM. Switch to smaller:
{
"model": "phi4-mini"
}
"Not enough disk space"
Models require 3-14GB free space. Check available:
df -h # On Unix/Mac
wmic logicaldisk get name,freespace # On Windows
Windows-Specific Issues
"UnicodeEncodeError" in output
This typically means a Windows console encoding issue. Upgrade to the latest version:
pip install --upgrade claw-code
"No Windows console found"
When piping input/output, prompt-toolkit requires a real console:
claw-code            # ✅ Works (interactive)
echo "" | claw-code  # ❌ Fails (piped input)
Configuration Issues
Changes to ~/.claude.json not taking effect
Config is loaded at startup. Changes require restart:
claw-code # Old config
# (Ctrl+C to exit)
# Edit ~/.claude.json
claw-code # New config loaded
"Invalid config" error
Reset to defaults:
rm ~/.claude.json
claw-code # Recreates config
What's New in v0.2.2
New Features
- Seamless Setup – Auto-running wizard now skips if already configured
- Windows Console Fix – Proper Unicode handling with ASCII fallback
- Smart Config Filtering – Ignores GitHub Copilot settings in ~/.claude.json
- Better Error Messages – Clear feedback on common issues
Bugs Fixed
- Missing return value from load_config()
- Extra GitHub Copilot settings breaking the ClaudeConfig dataclass
- Windows Unicode encoding error when printing the banner
- prompt-toolkit output handling on Windows
Dependencies
- requests – HTTP client for Ollama
- psutil – Hardware detection for model auto-select
- prompt-toolkit – Interactive REPL
- rich – Beautiful console formatting
Deployment Status
- ✅ Published to PyPI as claw-code==0.2.2
- ✅ All tests passing on Windows, macOS, Linux
- ✅ Clean install verified (no missing dependencies)
- ✅ UI/REPL verified to launch correctly
Contributing & Development
Getting Started (Development)
- Clone repository:
git clone https://github.com/instructkr/claw-code
cd claw-code
- Create virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install in editable mode:
pip install -e ".[dev]"
- Run tests:
pytest tests/ -v
- Check code quality:
black src/ tests/
ruff check src/ tests/
mypy src/
Project Structure for Contributors
Key files to understand:
- src/main.py – Entry point, argument parsing
- src/repl.py – REPL loop, command handling
- src/query_engine.py – LLM integration
- src/config.py – Configuration management
- src/init_wizard.py – Seamless setup logic
- src/services/ollama_setup.py – Ollama detection/startup
Adding a New Command
- Edit src/repl.py:

SLASH_COMMANDS = {
    # ... existing commands ...
    "/mycommand": "Description of /mycommand",
}

- Add a handler in run_repl():

if user_input.startswith("/mycommand"):
    handle_mycommand(...)

- Implement the handler:

def handle_mycommand(args):
    # Your logic here
    print("Command executed")
Roadmap
Phase 3: Session Management (Next)
- Save conversations to disk
- Resume sessions across restarts
- Session browsing and export
- Token usage analytics
Phase 4: Advanced Features
- Tool/command execution support
- File system operations
- Shell command integration
- Permission system
Phase 5: GUI & Extensions
- Web-based UI
- VSCode extension
- Plugin system
- Custom tool support
Phase 6: Enterprise Features
- Multi-user support
- Team sessions
- Usage quotas
- Audit logging
Comparison Matrix
Claw Code vs Alternatives
| Feature | Claw Code | Claude API | ChatGPT | Copilot |
|---|---|---|---|---|
| Cost | Free | $0.003/1K tokens | $20/month | $10/month |
| Offline | ✅ 100% | ❌ Cloud | ❌ Cloud | ❌ Cloud |
| Privacy | ✅ Local | ⚠️ Stored | ❌ Stored | ⚠️ Stored |
| Setup | ✅ 1 command | ❌ API key | ❌ Account | ❌ Account |
| Customizable | ✅ Open source | ❌ Closed | ❌ Closed | ❌ Closed |
| Available Models | ✅ Qwen, Phi, etc. | ❌ Claude only | ❌ GPT only | ❌ GPT only |
| Speed | ⚡ Sub-second | ~5-10s | ~5-10s | ~5-10s |
| Conversation | ✅ Stateful | ✅ Stateful | ✅ Stateful | ✅ Stateful |
Acknowledgments
Built on the shoulders of giants:
- Ollama – Local LLM inference
- prompt-toolkit – Interactive CLI
- rich – Beautiful console output
- Qwen / Phi – Open models powering inference
License
Apache License 2.0 โ See LICENSE for details.
Free for personal and commercial use.
Support & Community
- GitHub Issues – Report bugs or request features
- Discussions – Ask questions or share ideas
- Feedback – Help shape the roadmap
Made with ❤️ by Claw Code contributors.