A minimal, hackable agentic framework for Ollama and BitNet - local-first AI agent toolkit
LocalClaw R03
A minimal, hackable agentic framework engineered to run entirely locally with Ollama or BitNet.
Inspired by the architecture of OpenClaw, rebuilt from scratch for local-first operation.
Architecture
localclaw/
├── core/
│   ├── ollama_client.py   # Zero-dependency HTTP wrapper (stdlib urllib only)
│   ├── tools.py           # Decorator-based tool registry + JSON schema generation
│   ├── memory.py          # Sliding-window conversation memory with summarization
│   ├── agent.py           # ReAct loop with native tool-call and text-fallback modes
│   └── orchestrator.py    # Multi-agent routing (router / pipeline / parallel)
├── skills/
│   ├── loader.py          # Agent Skills specification loader (progressive disclosure)
│   ├── skill-creator/     # OpenClaw skill-creator for generating new skills
│   ├── acp/               # ACP (Agent Control Panel) skill
│   ├── datetime/          # Datetime utilities skill
│   └── web_search/        # Web search skill
├── tools/
│   └── builtins.py        # Ready-to-use tools: calculator, shell, file I/O, HTTP, REPL
├── bitnet_client.py       # R03: BitNet backend client (Microsoft 1.58-bit quantization)
├── bitnet_setup.py        # R03: BitNet setup/compilation helper
├── acp_plugin.py          # ACP integration for activity tracking and A2A messaging
├── model_discovery.py     # R03: Dynamic model discovery for both backends
└── examples/
    ├── 01_basic_agent.py             # Simple Q&A demo
    ├── 02_tool_agent.py              # Tool calling demo
    ├── 03_orchestrator.py            # Multi-agent routing demo
    ├── 04_comprehensive_test.py      # Full test suite (supports BitNet)
    ├── 04_comprehensive_test_acp.py  # ACP-tracked version
    ├── 05_tool_tests.py              # Tool-specific tests
    ├── 06_interactive_chat.py        # Interactive CLI chat
    ├── 07_model_comparison.py        # Compare models on 15 tests (3 per category)
    ├── 07_model_comparison_acp.py    # ACP-tracked version with model logging
    ├── 08_robust_comparison.py       # Progress-saving comparison for unstable connections
    ├── 08_robust_comparison_acp.py   # ACP-tracked version with resumability
    ├── 09_expanded_benchmark.py      # 25 tests across 8 categories
    ├── 10_skills_demo.py             # Agent Skills system demo
    └── 11_skill_creator_test.py      # Skill creation benchmark across models
Test Scripts
test.sh # Bash: Run all 11 examples (Linux/macOS/Colab)
test-quick.sh # Bash: Run 7 quick tests (skips benchmarks)
run.sh # Bash: Interactive menu for single example
test-bitnet.sh # Bash: Run BitNet benchmark tests
test.cmd # Batch: Run all 11 examples (Windows)
test-quick.cmd # Batch: Run 7 quick tests (Windows)
run.cmd # Batch: Interactive menu for single example (Windows)
test-bitnet.cmd # Batch: Run BitNet benchmark tests (Windows)
Core design decisions
| Concern | Approach |
|---|---|
| HTTP Client | Zero external dependencies: uses Python stdlib urllib only |
| Backends | Ollama (default) or BitNet (R03); switch via the --backend flag |
| Tool calling | Native Ollama tool-call protocol when supported; automatic ReAct text-parsing fallback for other models |
| Memory | Sliding window: older turns are archived and optionally compressed via LLM summarization |
| Tools | Decorator-based, auto-generates JSON schemas from Python type hints |
| Orchestration | Router (LLM picks agent), Pipeline (chain), or Parallel (concurrent + merge) |
| Streaming | First-class via generator interface |
| Error handling | Automatic retry with exponential backoff for transient network/server errors |
| Security | Path validation, command blocklist, SSRF protection (R03) |
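As a concrete illustration of the tool-registration approach, here is a minimal sketch of how a decorator can derive a JSON schema from Python type hints and a docstring. The decorator name, registry dict, and type map below are illustrative assumptions, not the exact localclaw/core/tools.py API:
import inspect
import typing

TOOL_REGISTRY = {}  # hypothetical registry; the real one lives in core/tools.py
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn):
    """Register fn and derive a JSON schema from its type hints and docstring."""
    hints = typing.get_type_hints(fn)
    hints.pop("return", None)
    TOOL_REGISTRY[fn.__name__] = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": {name: {"type": PY_TO_JSON.get(hint, "string")}
                           for name, hint in hints.items()},
            "required": list(hints),
        },
    }
    return fn

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))  # illustration only; the real tool validates input first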
Installation
From PyPI (Recommended)
pip install localclaw
# Or install from GitHub for the latest development version:
pip install git+https://github.com/VTSTech/LocalClaw.git
From Source
# Clone the repository
git clone https://github.com/VTSTech/LocalClaw.git
cd LocalClaw
# Install in development mode
pip install -e .
No Installation Required
LocalClaw uses only the Python stdlib, so there are no dependencies. You can also just copy the localclaw directory into your project:
# Just copy and use
cp -r localclaw /path/to/your/project/
Setup Ollama
# Make sure Ollama is running:
ollama serve
# Pull a model:
ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m
Usage After Installation
# Use the CLI command
localclaw chat --model llama3.1:8b
# Or use as a module
python -m localclaw chat --model llama3.1:8b
# Or in Python code
from localclaw import Agent
agent = Agent(model="llama3.1:8b")
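A slightly fuller sketch of the Python usage, assuming a run-style method on Agent (the method name here is an assumption; verify the exact call against the package):
from localclaw import Agent

agent = Agent(model="llama3.1:8b")
# Hypothetical method name; check the localclaw API for the exact interface.
print(agent.run("What is the capital of Japan?"))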
BitNet Backend (R03)
LocalClaw supports Microsoft's BitNet for 1.58-bit ternary-weight models, enabling highly efficient CPU inference.
Supported Models
| Model | Size | HuggingFace Repo |
|---|---|---|
| BitNet-b1.58-2B-4T | ~0.4 GB | microsoft/BitNet-b1.58-2B-4T |
| Falcon3-1B-Instruct | ~1 GB | tiiuae/Falcon3-1B-Instruct-1.58bit |
| Falcon3-3B-Instruct | ~3 GB | tiiuae/Falcon3-3B-Instruct-1.58bit |
| Falcon3-7B-Instruct | ~7 GB | tiiuae/Falcon3-7B-Instruct-1.58bit |
| Falcon3-10B-Instruct | ~10 GB | tiiuae/Falcon3-10B-Instruct-1.58bit |
Setup (One Command with huggingface-cli)
BitNet's setup_env.py handles everything: download, convert to GGUF, quantize, and compile kernels.
# Clone BitNet
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt
# Download, convert, and prepare a model (choose one):
python setup_env.py --hf-repo microsoft/BitNet-b1.58-2B-4T -q i2_s # Recommended
python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s # Smallest Falcon
python setup_env.py --hf-repo tiiuae/Falcon3-3B-Instruct-1.58bit -q i2_s # Best balance
python setup_env.py --hf-repo tiiuae/Falcon3-7B-Instruct-1.58bit -q i2_s # Most capable
This automatically:
- Downloads the model from HuggingFace (safetensors format)
- Converts it to GGUF format
- Quantizes it to i2_s (1.58-bit ternary)
- Compiles optimized CPU kernels
Manual Download (wget)
If you prefer not to use huggingface-cli, download directly with wget:
# Create model directory
mkdir -p models/Falcon3-1B-Instruct-1.58bit
cd models/Falcon3-1B-Instruct-1.58bit
# Download model files (~1.3GB for 1B, ~3.2GB for 3B, ~7.5GB for 7B)
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/model.safetensors
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/config.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer_config.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/special_tokens_map.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/generation_config.json
# Or for BitNet-b1.58-2B-4T (~400MB):
mkdir -p models/BitNet-b1.58-2B-4T
cd models/BitNet-b1.58-2B-4T
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/model.safetensors
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/config.json
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer.json
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer_config.json
Then run setup_env.py pointing to your downloaded model:
cd ../.. # Back to BitNet root
python setup_env.py --model-dir models/Falcon3-1B-Instruct-1.58bit -q i2_s
Model File Sizes
| Model | model.safetensors | Total Download |
|---|---|---|
| Falcon3-1B-Instruct | ~1.3 GB | ~1.4 GB |
| Falcon3-3B-Instruct | ~3.2 GB | ~3.4 GB |
| Falcon3-7B-Instruct | ~7.5 GB | ~7.8 GB |
| BitNet-b1.58-2B-4T | ~400 MB | ~500 MB |
Start the Server
# Start BitNet server (separate terminal)
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
# Or for Falcon models:
./build/bin/llama-server -m models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf
Use with LocalClaw
# Set BitNet URL (default: http://localhost:8080)
export BITNET_BASE_URL=http://localhost:8080
# Chat with BitNet backend
localclaw chat --backend bitnet --force-react
# With tools
localclaw chat --backend bitnet --force-react --tools calculator,shell
Note: BitNet models require --force-react as they don't support native tool calling.
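To confirm the server is reachable before starting a chat, you can poll it from Python. This assumes a llama.cpp build recent enough to expose a /health endpoint; if yours lacks it, any request that gets a response will do:
import urllib.request

# Quick reachability check for the BitNet llama-server (assumes /health exists).
with urllib.request.urlopen("http://localhost:8080/health", timeout=5) as resp:
    print("BitNet server status:", resp.status, resp.read().decode())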
Colab Quick Start
# Cell 1: Setup BitNet with Falcon3-1B (fastest option)
!git clone --recursive https://github.com/microsoft/BitNet.git
%cd BitNet
!pip install -r requirements.txt
!python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s
# Cell 2: Start server in background
import subprocess, time
server = subprocess.Popen(
['./build/bin/llama-server', '-m', 'models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf', '--port', '8080'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
time.sleep(5) # Wait for server startup
# Cell 3: Clone and run LocalClaw
%cd /content
!git clone https://github.com/VTSTech/LocalClaw.git
%cd LocalClaw
!localclaw chat --backend bitnet --force-react
Model Comparison
| Model | Speed | Quality | Best For |
|---|---|---|---|
| BitNet-b1.58-2B-4T | ⚡⚡⚡ | Good | Quick tasks, testing |
| Falcon3-1B-Instruct | ⚡⚡⚡ | Good | Fastest inference |
| Falcon3-3B-Instruct | ⚡⚡ | Better | Balanced performance |
| Falcon3-7B-Instruct | ⚡ | Best | Complex reasoning |
BitNet Benchmark Results: BitNet-b1.58-2B-4T achieved 87% on the LocalClaw benchmark; see the BitNet Benchmark Results section below.
Quick start
1. Single prompt
# Simple Q&A
localclaw run "What is the capital of Japan?"
# With streaming output
localclaw run "Tell me a joke." --stream
# Specify a model
localclaw run "Explain quantum computing" -m llama3.2:3b
2. Interactive chat
# Start interactive session
localclaw chat -m qwen2.5-coder:0.5b
# With tools enabled
localclaw chat -m llama3.1:8b --tools calculator,shell,read_file,write_file
# With skills loaded
localclaw chat -m llama3.2:3b --skills skill-creator --tools write_file,shell
# Fast mode (reduced context for speed)
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose
3. Using BitNet backend
# BitNet requires --force-react for tool support
localclaw chat --backend bitnet --force-react
# Run single prompt with BitNet
localclaw run "Calculate 17 * 23" --backend bitnet --tools calculator
4. With ACP tracking
# Enable ACP for activity monitoring
localclaw chat -m qwen2.5-coder:0.5b --acp --tools shell,read_file,write_file
# Single prompt with ACP
localclaw run "What is 2+2?" --acp
CLI Commands
| Command | Description |
|---|---|
run "prompt" |
Run single prompt and exit |
chat |
Interactive multi-turn conversation |
models |
List available Ollama models |
tools |
List built-in tools |
skills |
List available Agent Skills |
CLI Flags
| Flag | Description |
|---|---|
| -m, --model | Model name (default: qwen2.5-coder:0.5b) |
| --tools | Comma-separated tool list |
| --skills | Comma-separated skill list |
| --backend | ollama or bitnet |
| --force-react | Force ReAct text parsing |
| --acp | Enable ACP integration |
| -v, --verbose | Show tool calls and timing |
| --debug | Show detailed debug info |
| --fast | Preset: reduced context for speed |
| --warmup | Pre-load model before chat |
| --stream | Stream output token-by-token |
| --temperature | Sampling temperature (0.0-2.0) |
| --num-ctx | Context window size |
| --num-predict | Max output tokens |
Interactive Commands (in chat)
| Command | Description |
|---|---|
| /help | Show available commands |
| /status | Show session status |
| /tools | List active tools |
| /skills | List active skills |
| /reset | Clear conversation history |
| /undo | Remove last exchange |
| /retry | Retry last message |
| /a2a | Process pending A2A messages |
| /export | Export to markdown |
| exit | End session |
Built-in Tools
| Tool | Description |
|---|---|
| calculator | Evaluate math expressions |
| python_repl | Execute Python code |
| shell | Run shell commands |
| read_file | Read file contents |
| write_file | Write content to file |
| list_directory | List directory contents |
| http_get | HTTP GET request |
| save_note | Save a note to memory |
| get_note | Retrieve saved notes |
# List all tools
localclaw tools
# Use specific tools
localclaw chat --tools calculator,python_repl,shell
Built-in Skills
| Skill | Description |
|---|---|
| skill-creator | Generate new Agent Skills from requests |
| datetime | Date/time formatting and calculations |
| web_search | Web search capabilities |
# List all skills
localclaw skills
# Use skills in chat
localclaw chat --skills skill-creator --tools write_file
Supported models (tool-calling)
The following model families support native tool calling in Ollama and are auto-detected:
Meta Llama: llama3, llama3.1, llama3.2, llama3.3, llama3-groq-tool-use
Mistral AI: mistral, mixtral, mistral-nemo, mistral-small, mistral-large, codestral, ministral
Alibaba Qwen: qwen2, qwen2.5, qwen3, qwen35, qwen2.5-coder, qwen2-math
Cohere: command-r, command-r7b
DeepSeek: deepseek, deepseek-coder, deepseek-v2, deepseek-v3
Microsoft Phi: phi-3, phi3, phi-4
Google Gemma: functiongemma (designed for function calling)
Others: yi-, yi1.5, internlm2, internlm2.5, solar, glm4, chatglm, firefunction, hermes, nemotron, cogito, athene
All other models fall back to ReAct text-parsing automatically.
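Auto-detection is essentially a prefix match on the model name. A minimal sketch of the idea, with an abbreviated family list (illustrative, not the exact LocalClaw detection logic):
# Illustrative prefix check against known tool-calling families (abbreviated list).
TOOL_CALL_FAMILIES = ("llama3", "mistral", "mixtral", "qwen2", "qwen3",
                      "command-r", "deepseek", "phi3", "phi-3", "hermes")

def supports_native_tools(model: str) -> bool:
    """True if the model name matches a known tool-calling family."""
    base = model.split(":")[0].lower()  # strip the tag: "llama3.1:8b" -> "llama3.1"
    return base.startswith(TOOL_CALL_FAMILIES)

print(supports_native_tools("llama3.1:8b"))  # True  -> native tool calls
print(supports_native_tools("tinyllama"))    # False -> ReAct text fallback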
Tested Small Models (โค1.5B parameters)
The following models have been tested with a 15-test benchmark (3 tests per category: Math, Reasoning, Knowledge, Calc Tool, Code). Prompts are optimized for small model comprehension.
Rankings (Updated)
| Rank | Model | Score | Time | Math | Reason | Know | Calc | Code |
|---|---|---|---|---|---|---|---|---|
| 🥇 | qwen2.5-coder:0.5b-instruct-q4_k_m | 14/15 (93%) | ~80s | 3/3 | 2/3 | 2/3 | 3/3 | 3/3 |
| 🥈 | BitNet-b1.58-2B-4T (BitNet) | 13/15 (87%) | ~394s | 3/3 | 2/3 | 2/3 | 3/3 | 3/3 |
| 🥉 | granite3.1-moe:1b | 12/15 (80%) | ~60s | 3/3 | 2/3 | 3/3 | 1/3 | 3/3 |
| 4 | llama3.2:1b | 12/15 (80%) | ~600s | 3/3 | 1/3 | 2/3 | 3/3 | 3/3 |
| 5 | gemma3:270m | 10/15 (67%) | ~75s | 3/3 | 1/3 | 1/3 | 2/3 | 3/3 |
| 6 | qwen3:0.6b | ~9/12 | ~130s | 2/3 | 3/3 | 3/3 | 0/3 | N/A |
| 7 | granite4:350m | 8/15 (53%) | ~97s | 2/3 | 1/3 | 2/3 | 0/3 | 3/3 |
| 8 | qwen2.5:0.5b | 10/15 (67%) | ~107s | 1/3 | 3/3 | 3/3 | 0/3 | 3/3 |
| 9 | qwen2-math:1.5b | 12/15 (80%) | ~611s | 3/3 | 3/3 | 3/3 | N/A | 3/3 |
| 10 | tinyllama:latest | 9/15 (60%) | ~587s | 2/3 | 2/3 | 3/3 | 0/3 | 2/3 |
| 11 | smollm:135m | 7/15 (47%) | ~285s | 0/3 | 2/3 | 2/3 | 0/3 | 3/3 |
| 12 | functiongemma:270m | 1/15 (7%) | ~90s | 0/3 | 0/3 | 0/3 | 0/3 | 1/3 |
Note: Scores vary between runs due to model non-determinism. qwen2.5-coder:0.5b achieved 100% in some runs.
Model Details
| Model | Params | Size | Speed | Tool Support | Notes |
|---|---|---|---|---|---|
| qwen2.5-coder:0.5b | 494M | ~400MB | ⚡ Fast | ✅ Native | 🏆 Best overall! Excellent tool usage |
| BitNet-b1.58-2B-4T | 2B | ~1.3GB | ⚡ Medium | ⚠️ ReAct | 🥈 2nd place! CPU-efficient ternary weights |
| granite3.1-moe:1b | 1B MoE | ~1.4GB | ⚡ Medium | ✅ Native | Strong knowledge; HTTP 500 on long context |
| llama3.2:1b | 1.2B | ~1.3GB | 🐢 Slow | ✅ Native | 128k context! Thorough but slow |
| gemma3:270m | 270M | ~292MB | ⚡⚡ Fastest | ⚠️ ReAct JSON | Uses JSON ReAct format; Math & Code champion |
| qwen3:0.6b | 600M | ~523MB | ⚡ Medium | ⚠️ Text | Perfect reasoning but Calc returns empty |
| granite4:350m | 350M | ~708MB | ⚡ Fast | ❌ Refused | Refuses calculator (safety filter) |
| qwen2.5:0.5b | 494M | ~398MB | ⚡ Fast | ⚠️ Text | Reasoning & Knowledge champ; Calc fails |
| qwen2-math:1.5b | 1.5B | ~935MB | 🐢 Slow | ❌ No tools | 4 perfect categories! No tool support |
| tinyllama:latest | 1.1B | ~638MB | 🐢 Slow | ⚠️ Text | Older model; verbose and unstable |
| smollm:135m | 135M | ~92MB | ⚡ Fast | ❌ None | Smallest; hallucinates math (7×8=42!) |
| functiongemma:270m | 270M | ~301MB | ⚡ Fast | ❌ Broken | Worst performer; returns empty |
Category Champions
| Category | Champion | Score | Notes |
|---|---|---|---|
| Math | qwen2.5-coder:0.5b, granite3.1-moe:1b, BitNet-b1.58-2B | 3/3 | Also gemma3:270m |
| Reasoning | qwen2.5:0.5b, qwen3:0.6b, qwen2-math | 3/3 | Multiple tied |
| Knowledge | granite3.1-moe:1b, qwen2-math | 3/3 | Multiple tied at 3/3 |
| Calc | qwen2.5-coder:0.5b, llama3.2:1b, BitNet-b1.58-2B | 3/3 | 100% tool usage with ReAct |
| Code | Many models | 3/3 | Code generation is easy for small models! |
Test Categories
| Category | Tests | What it measures |
|---|---|---|
| Math | Multiply, Add, Divide | Basic arithmetic without tools |
| Reasoning | Apples, Sequence, Logic | Multi-step reasoning and deduction |
| Knowledge | Japan, France, Brazil capitals | World knowledge recall |
| Calc | Multiply, Divide, Power | Tool usage with calculator |
| Code | is_even, reverse, max_num | Python function generation |
Recommendations
| Use Case | Recommended Model | Why |
|---|---|---|
| General use | qwen2.5-coder:0.5b-instruct-q4_k_m | Best all-around: fast, great tool usage |
| Large context | llama3.2:1b | 128k context window handles long conversations |
| Math tasks | qwen2.5-coder:0.5b or qwen2-math:1.5b | Perfect math scores |
| Reasoning tasks | qwen2.5:0.5b or qwen3:0.6b | Perfect reasoning |
| Tool usage | qwen2.5-coder:0.5b | Most reliable tool calling |
| Fastest inference | gemma3:270m | 270M params, fastest responses |
| No tools needed | qwen2-math:1.5b | 4/5 categories perfect (no Calc) |
| Smallest footprint | smollm:135m | 92MB, but expect hallucinations |
⚠️ Models to Avoid
| Model | Issue |
|---|---|
| functiongemma:270m | Despite the name, terrible at function calling: returns empty or refuses |
| smollm:135m | Hallucinates wrong math (7×8=42); only 7/15 score |
| granite4:350m | Refuses calculator tools (safety filter) |
Known Issues with Small Models
- Tool calling variations:
  - granite4:350m: refuses calculator ("I'm sorry, but I can't assist with that")
  - functiongemma:270m: asks for clarification instead of using tools
  - qwen2.5:0.5b, qwen3:0.6b: return empty responses on Calc tests
  - qwen2-math:1.5b: HTTP 400; doesn't support tool calling at all
- Math hallucinations: smollm:135m says "7×8=42", tinyllama says "7×8=45"
- Power operator confusion: gemma3:270m reads 2**10 as 2*10 = 20
- Reasoning failures: some models answer "8" for the sequence "2, 4, 6, 8, ?" (repeating the last term)
- Stability issues: granite3.1-moe:1b has HTTP 500 crashes (server EOF); tinyllama and qwen3:0.6b hit HTTP 524 timeouts
- Empty responses: functiongemma:270m returns empty strings on most tests
Skills (Agent Skills Specification)
LocalClaw R03 supports the Agent Skills specification for reusable instruction bundles.
Skill Structure
skills/
└── my-skill/
    ├── SKILL.md       # Required: name, description, instructions
    ├── scripts/       # Optional: executable scripts
    ├── references/    # Optional: additional docs
    └── assets/        # Optional: templates, images
SKILL.md Format
---
name: calculator
description: Perform mathematical calculations. Use when the user needs to compute expressions.
---
# Calculator Skill
Instructions for the model on how to use this skill...
Using Skills
# Load skills via CLI
localclaw chat --skills skill-creator --tools write_file,shell
# Multiple skills
localclaw chat --skills datetime,web_search --tools calculator
Progressive Disclosure
Skills follow a three-level loading system (a sketch of the metadata step follows this list):
1. Metadata (~100 tokens): name + description, loaded at startup
2. Instructions (<500 lines): the full SKILL.md body, loaded when the skill triggers
3. Resources (as needed): files in scripts/, references/, and assets/, loaded on demand
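A minimal sketch of the level-1 metadata step, parsing only the SKILL.md frontmatter with the stdlib (the function name is illustrative, not the exact loader.py API):
from pathlib import Path

def load_skill_metadata(skill_dir: str) -> dict:
    """Parse only the YAML frontmatter of SKILL.md (level 1: metadata)."""
    text = (Path(skill_dir) / "SKILL.md").read_text(encoding="utf-8")
    meta = {}
    if text.startswith("---"):
        frontmatter = text.split("---", 2)[1]  # between the first two --- fences
        for line in frontmatter.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta  # e.g. {"name": "calculator", "description": "Perform ..."}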
Built-in Skills
| Skill | Description |
|---|---|
| skill-creator | OpenClaw's platform-agnostic skill generator. Creates new skills from user requests. |
| datetime | Date and time utilities for formatting, parsing, and calculations. |
| web_search | Web search capabilities for retrieving information from the internet. |
Orchestrator modes
| Mode | Behaviour |
|---|---|
| router | A small routing LLM picks the best agent for each request |
| pipeline | Agents run sequentially; each receives the previous agent's output |
| parallel | All agents run concurrently; results are merged with attribution |
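As a rough sketch of the parallel mode's fan-out-and-merge shape (conceptual only; the function and the agents' run method are assumptions, not the orchestrator's actual API):
from concurrent.futures import ThreadPoolExecutor

def run_parallel(agents: dict, prompt: str) -> str:
    """Fan the prompt out to every agent, then merge replies with attribution."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agent.run, prompt) for name, agent in agents.items()}
    return "\n".join(f"[{name}] {fut.result()}" for name, fut in futures.items())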
Running the examples
# Make sure Ollama is serving and you have a model pulled
ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m
# Or use a remote Ollama instance by editing localclaw/core/ollama_client.py
# Quick test suite (recommended first run)
bash test-quick.sh # Linux/macOS/Colab
test-quick.cmd # Windows
# Full test suite (all 11 examples)
bash test.sh # Linux/macOS/Colab
test.cmd # Windows
# Interactive menu
bash run.sh # Linux/macOS/Colab
run.cmd # Windows
# Run individual examples
python examples/01_basic_agent.py
python examples/02_tool_agent.py
python examples/03_orchestrator.py
python examples/04_comprehensive_test.py
python examples/05_tool_tests.py
python examples/06_interactive_chat.py
python examples/07_model_comparison.py
python examples/08_robust_comparison.py
python examples/09_expanded_benchmark.py
python examples/10_skills_demo.py
python examples/11_skill_creator_test.py
ACP Integration (Agent Control Panel)
LocalClaw R03 supports ACP (Agent Control Panel) for centralized activity tracking, token monitoring, and multi-agent coordination.
What is ACP?
ACP is a monitoring and observability protocol for AI agents. Unlike communication protocols (MCP, A2A), ACP sits alongside your agents and provides:
- Activity Tracking: Real-time monitoring of all agent actions
- Token Management: Context window usage estimation per agent
- Multi-Agent Coordination: Track multiple agents in one session
- STOP/Resume Control: Emergency stop capability
- Session Persistence: State preserved across restarts
Enable ACP
# Run with ACP tracking
localclaw chat --acp --tools shell,read_file,write_file -m qwen2.5-coder:0.5b
# Run single prompt with ACP
localclaw run --acp "What is 2+2?"
Configuration
Set your ACP server URL via environment variables:
# Local ACP
export ACP_URL="http://localhost:8766"
# Remote ACP (cloudflare tunnel)
export ACP_URL="https://your-tunnel.trycloudflare.com"
# Credentials
export ACP_USER="admin"
export ACP_PASS="secret"
Or edit localclaw/config.py for persistent settings.
What Gets Logged
| Activity | Description |
|---|---|
| Bootstrap | Session start, identity establishment |
| User messages | All prompts sent to the model |
| Assistant messages | All model responses |
| Tool calls | Shell commands, file operations, etc. |
| Tool results | Outcomes from tool execution |
Per-Agent Token Tracking
When multiple agents connect to the same ACP session:
{
  "primary_agent": "Super Z",
  "agent_tokens": {
    "Super Z": 42000,
    "LocalClaw": 500
  },
  "other_agents_tokens": 500
}
- The first agent to connect becomes primary (owns the main context window)
- Other agents are tracked separately in agent_tokens
- This prevents context pollution between agents
ACP Server
To run your own ACP server, see the ACP Specification:
# ACP is a single Python file
python VTSTech-GLMACP.py
# With cloudflare tunnel
GLMACP_TUNNEL=auto python VTSTech-GLMACP.py
Remote Ollama Configuration
To use a remote Ollama instance (e.g., via Cloudflare tunnel), set the environment variable:
# Local Ollama (default)
export OLLAMA_URL="http://localhost:11434"
# Remote Ollama (cloudflare tunnel)
export OLLAMA_URL="https://your-tunnel.trycloudflare.com"
Or edit localclaw/config.py for persistent settings.
Timeout Configuration
Configure via environment variables:
# Request timeout in seconds (default: 90s for Cloudflare tunnel compatibility)
export OLLAMA_TIMEOUT=90
# Max retry attempts for transient errors (default: 3)
export OLLAMA_MAX_RETRIES=3
# Initial retry delay in seconds (default: 5s, doubles each retry)
export OLLAMA_RETRY_DELAY=5
Automatic Retry
LocalClaw automatically retries on transient errors with exponential backoff:
| Error Code | Description | Retry Behavior |
|---|---|---|
| HTTP 524 | Cloudflare tunnel timeout | Retries up to 3 times |
| HTTP 502/503/504 | Server temporarily unavailable | Retries up to 3 times |
| HTTP 500 | Server error (model loading, memory pressure) | Retries up to 3 times |
| Timeout | Socket or connection timeout | Retries up to 3 times |
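The pattern behind this table is plain exponential backoff. A minimal sketch using the documented environment variables (illustrative; the real logic lives in localclaw/core/ollama_client.py):
import os, time, urllib.error, urllib.request

RETRYABLE = {500, 502, 503, 504, 524}

def get_with_retry(url: str) -> bytes:
    """GET with exponential backoff on transient errors (sketch of the pattern)."""
    retries = int(os.environ.get("OLLAMA_MAX_RETRIES", 3))
    delay = float(os.environ.get("OLLAMA_RETRY_DELAY", 5))
    timeout = float(os.environ.get("OLLAMA_TIMEOUT", 90))
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code not in RETRYABLE or attempt == retries:
                raise  # non-transient status, or out of retries
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries:
                raise
        time.sleep(delay)
        delay *= 2  # exponential backoff: 5s, 10s, 20s, ...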
Performance Optimization
CLI Options for Speed
# Fast mode - reduces context and output for quicker responses
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose
# Fine-tuned control
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128
# Warm up model before chat (useful for remote Ollama with cold starts)
localclaw chat -m qwen2.5-coder:0.5b --warmup --fast
| Option | Description | Speed Impact |
|---|---|---|
| --fast | Preset: num_ctx=2048, num_predict=256 | 🚀 Significant |
| --num-ctx N | Reduce context window (default varies by model) | 🚀 Significant |
| --num-predict N | Limit max output tokens | ⚡ Moderate |
| --warmup | Pre-load model before first chat | ⚡ Faster first response |
Ollama Model Options
Control model behavior via CLI flags:
# Lower temperature = more deterministic
localclaw chat -m qwen2.5-coder:0.5b --temperature 0.1
# Smaller context = faster
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128
# Combined for optimal speed
localclaw chat -m qwen2.5-coder:0.5b --fast --temperature 0.3
Remote Ollama Tips
When using a remote Ollama via Cloudflare tunnel:
- Use the --fast flag: reduces inference time significantly
- Use smaller models: qwen2.5-coder:0.5b is the fastest
- Warm up the model: the first request is slowest due to model loading
- Increase the timeout if needed: export OLLAMA_TIMEOUT=120
# Recommended for remote Ollama
localclaw chat -m qwen2.5-coder:0.5b-instruct-q4_k_m \
--fast --warmup --verbose \
--tools python_repl
Why Inference is Slow
| Factor | Impact | Solution |
|---|---|---|
| Model size | Larger models = slower | Use smaller quantized models |
| Context window | More context = slower | Use --num-ctx 2048 or smaller |
| Output length | More tokens = slower | Use --num-predict 128 |
| Remote connection | Network latency | Use local Ollama if possible |
| Cold start | First load is slowest | Use --warmup flag |
| GPU unavailable | CPU inference is slow | Ensure GPU is configured |
Recent Improvements
R03: BitNet Backend
LocalClaw R03 adds BitNet backend support for running Microsoft's 1.58-bit quantized models:
- New backend: switch between Ollama and BitNet via the --backend flag
- Efficient CPU inference: BitNet's ternary weights run well without a GPU
- Setup helper: bitnet_setup.py handles cloning and compilation
- Note: BitNet requires the ReAct fallback (no native tool support)
R03: Enhanced Security
Built-in tools now have comprehensive security:
- Path validation: restrict file access to allowed directories
- Command blocklist: block dangerous commands (rm, sudo, chmod, etc.)
- Pattern detection: detect dangerous shell patterns (pipes to bash, command substitution)
- SSRF protection: block private IPs and cloud metadata endpoints in http_get
- Configurable modes: strict, permissive, or disabled
# Set security mode
export LOCALCLAW_SECURITY_MODE=strict
export LOCALCLAW_ALLOWED_PATHS=/home/user/projects:/tmp
export LOCALCLAW_BLOCKED_COMMANDS=rm,sudo,dd
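As one concrete example, the SSRF guard can be expressed with the stdlib ipaddress module. A minimal sketch of the idea, not the exact builtins.py implementation:
import ipaddress, socket, urllib.parse

def is_url_allowed(url: str) -> bool:
    """Reject URLs that resolve to private/loopback/link-local addresses (SSRF guard)."""
    host = urllib.parse.urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

print(is_url_allowed("http://169.254.169.254/latest/meta-data/"))  # False: metadata endpoint
print(is_url_allowed("https://example.com/"))                      # True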
Zero Dependencies
LocalClaw R03 continues to use only the Python stdlib; no pip install required! The HTTP client uses urllib instead of httpx.
Automatic Error Recovery
- HTTP 524/502/503/504/500 retry: Transient server errors are automatically retried with exponential backoff
- Timeout retry: Socket timeouts are retried automatically
- Configurable via environment variables: OLLAMA_TIMEOUT, OLLAMA_MAX_RETRIES, OLLAMA_RETRY_DELAY
Small Model Support
LocalClaw R03 handles the quirks of small models (≤1.5B parameters):
- Fuzzy tool name matching: hallucinated tool names like calculate_expression are automatically mapped to calculator (see the sketch after this list)
- Argument auto-fixing: common wrong argument patterns are corrected (e.g., {"base": 2, "exponent": 10} → {"expression": "2 ** 10"})
- JSON response cleaning: when models output tool schemas instead of text answers, LocalClaw falls back to tool results
- Unicode normalization: accented characters are normalized for comparison (e.g., "Brasília" matches "brasilia")
- ReAct text parsing: models without native tool support automatically fall back to the text-based ReAct format
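Here is the fuzzy tool-name matching sketch referenced above, using stdlib difflib (illustrative; the real mapping logic may differ):
import difflib

KNOWN_TOOLS = ["calculator", "python_repl", "shell", "read_file", "write_file"]

def resolve_tool_name(requested: str) -> str | None:
    """Map a hallucinated tool name onto the closest registered tool (sketch)."""
    if requested in KNOWN_TOOLS:
        return requested
    matches = difflib.get_close_matches(requested, KNOWN_TOOLS, n=1, cutoff=0.5)
    return matches[0] if matches else None

print(resolve_tool_name("calculate_expression"))  # -> "calculator"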
Optimized Test Prompts
Key insights for small model prompt engineering:
- State the fact first: "The capital of Japan is Tokyo. What is the capital of Japan?"
- Show the answer format: "Answer: Tokyo" at the end
- Give calculation steps: "10 minus 3 equals 7. Then 7 minus 2 equals 5."
- Be explicit with tools: "Use calculator tool. Expression: 2 ** 10. Result: 1024"
- Guide code output: "Start with: def is_even(n):"
New Examples
| Example | Description |
|---|---|
| 07_model_comparison.py | Benchmark 15 tests across models with category breakdown |
| 08_robust_comparison.py | Progress-saving comparison for unstable connections |
| 09_expanded_benchmark.py | 25 tests across 8 categories including tool chaining |
| 10_skills_demo.py | Demonstrate the Agent Skills system with skill-creator |
| 11_skill_creator_test.py | Benchmark skill creation across multiple small models |
Test Categories (15 tests)
| Category | Tests | Description |
|---|---|---|
| Math | Multiply, Add, Divide | Basic arithmetic (no tools) |
| Reasoning | Apples, Sequence, Logic | Multi-step reasoning |
| Knowledge | Japan, France, Brazil | World knowledge |
| Calc | Multiply, Divide, Power | Calculator tool usage |
| Code | is_even, reverse, max_num | Python code generation |
BitNet Benchmark Results
LocalClaw R03 has been tested with Microsoft BitNet-b1.58-2B-4T, a 2B-parameter model with 1.58-bit ternary weights designed for efficient CPU inference.
Test Results Summary
| Test Suite | Score | Time | Notes |
|---|---|---|---|
| Model Comparison (15 tests) | 13/15 (87%) | 394s | 5 categories |
| Robust Comparison (22 tests) | 19/22 (86%) | ~6min | Incremental save |
| Comprehensive Test (7 tests) | 6/7 (86%) | ~90s | Basic + Reasoning + Code |
Category Breakdown (Model Comparison - 15 tests)
| Category | Score | Pass Rate |
|---|---|---|
| Math | 3/3 | 100% ✅ |
| Code | 3/3 | 100% ✅ |
| Calc (with tools) | 3/3 | 100% ✅ |
| Reasoning | 2/3 | 67% |
| Knowledge | 2/3 | 67% |
| Total | 13/15 | 87% |
Failed Tests
| Test | Expected | Got | Category |
|---|---|---|---|
| Apples (reasoning) | 5 | 7 | Reasoning |
| Brazil capital | Brasília | São Paulo | Knowledge |
Performance Notes
| Metric | Value |
|---|---|
| Avg response time | 5-10s (simple), 100s+ (tool use) |
| Tool calling | ReAct fallback (no native support) |
| Context window | Default (model dependent) |
| Inference | CPU-efficient ternary weights |
BitNet vs Ollama Small Models
| Rank | Model | Score | Params | Backend |
|---|---|---|---|---|
| 🥇 | qwen2.5-coder:0.5b-instruct-q4_k_m | 14/15 (93%) | 494M | Ollama |
| 🥈 | BitNet-b1.58-2B-4T | 13/15 (87%) | 2B | BitNet |
| 🥉 | granite3.1-moe:1b | 12/15 (80%) | 1B MoE | Ollama |
| 4 | llama3.2:1b | 12/15 (80%) | 1.2B | Ollama |
Note: BitNet uses 1.58-bit ternary weights, making it highly efficient for CPU inference despite having 2B parameters.
BitNet Setup for Benchmarking
# 1. Clone and compile BitNet
python localclaw/bitnet_setup.py
# 2. Start the BitNet server
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
# 3. Run benchmark
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison.py
# 4. Run with ACP tracking
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison_acp.py
Observations
- Excellent for CPU-only systems: ternary weights enable fast inference without a GPU
- Solid tool usage: the ReAct fallback handles calculator tools reliably
- Strong code generation: 100% pass rate on function-writing tasks
- Multi-step reasoning is challenging: the "apples" test requires tracking state
- Knowledge gaps: São Paulo is commonly mistaken for Brazil's capital
About
LocalClaw R03 is written and maintained by VTSTech.
- Website: https://www.vts-tech.org
- GitHub: https://github.com/VTSTech/LocalClaw
- More projects: https://github.com/VTSTech
Testing Status: LocalClaw has been tested with both Ollama (11 small models) and BitNet (BitNet-b1.58-2B-4T) backends. BitNet achieved 87% on the benchmark, making it the 2nd best performer overall. See Tested Small Models and BitNet Benchmark Results sections for details.
File details
Details for the file localclaw-0.3.0.1.tar.gz.
File metadata
- Download URL: localclaw-0.3.0.1.tar.gz
- Size: 158.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc531a707ad60a705e0572d7f1e1abbb38a67f1472d34c4b2912157fb27c0763 |
| MD5 | 1da0b07362d774e304c90c6901009524 |
| BLAKE2b-256 | f292f60ec78645f110fe4b7fd86d01092001712465825b7a082f652a6c9fb604 |
File details
Details for the file localclaw-0.3.0.1-py3-none-any.whl.
File metadata
- Download URL: localclaw-0.3.0.1-py3-none-any.whl
- Size: 147.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c38eb66b429e48ada738305f531a135f42205eaadf80b2691ae1e1ec197aa1b2 |
| MD5 | 1cdf746a384ef770b5f8de193a434f76 |
| BLAKE2b-256 | d6090db51ea966de947dfb5cb771096c001b54a13247aad0fe515607ee611285 |