
🦞 LocalClaw R03

A minimal, hackable agentic framework engineered to run entirely locally with Ollama or BitNet.

Inspired by the architecture of OpenClaw, rebuilt from scratch for local-first operation.

Written by VTSTech · GitHub


Architecture

localclaw/
├── core/
│   ├── ollama_client.py   # Zero-dependency HTTP wrapper (stdlib urllib only)
│   ├── tools.py           # Decorator-based tool registry + JSON schema generation
│   ├── memory.py          # Sliding-window conversation memory with summarization
│   ├── agent.py           # ReAct loop: native tool-call + text-fallback modes
│   └── orchestrator.py    # Multi-agent routing (router / pipeline / parallel)
├── skills/
│   ├── loader.py          # Agent Skills specification loader (progressive disclosure)
│   ├── skill-creator/     # OpenClaw skill-creator for generating new skills
│   ├── acp/               # ACP (Agent Control Panel) skill
│   ├── datetime/          # Datetime utilities skill
│   └── web_search/        # Web search skill
├── tools/
│   └── builtins.py        # Ready-to-use tools: calculator, shell, file I/O, HTTP, REPL
├── bitnet_client.py       # R03: BitNet backend client (Microsoft 1.58-bit quantization)
├── bitnet_setup.py        # R03: BitNet setup/compilation helper
├── acp_plugin.py          # ACP integration for activity tracking and A2A messaging
├── model_discovery.py     # R03: Dynamic model discovery for both backends
└── examples/
    ├── 01_basic_agent.py            # Simple Q&A demo
    ├── 02_tool_agent.py             # Tool calling demo
    ├── 03_orchestrator.py           # Multi-agent routing demo
    ├── 04_comprehensive_test.py     # Full test suite (supports BitNet)
    ├── 04_comprehensive_test_acp.py # ACP-tracked version
    ├── 05_tool_tests.py             # Tool-specific tests
    ├── 06_interactive_chat.py       # Interactive CLI chat
    ├── 07_model_comparison.py       # Compare models on 15 tests (3 per category)
    ├── 07_model_comparison_acp.py   # ACP-tracked version with model logging
    ├── 08_robust_comparison.py      # Progress-saving comparison for unstable connections
    ├── 08_robust_comparison_acp.py  # ACP-tracked version with resumability
    ├── 09_expanded_benchmark.py     # 25 tests across 8 categories
    ├── 10_skills_demo.py            # Agent Skills system demo
    └── 11_skill_creator_test.py     # Skill creation benchmark across models

Test Scripts

test.sh          # Bash: Run all 11 examples (Linux/macOS/Colab)
test-quick.sh    # Bash: Run 7 quick tests (skips benchmarks)
run.sh           # Bash: Interactive menu for single example
test-bitnet.sh   # Bash: Run BitNet benchmark tests
test.cmd         # Batch: Run all 11 examples (Windows)
test-quick.cmd   # Batch: Run 7 quick tests (Windows)
run.cmd          # Batch: Interactive menu for single example (Windows)
test-bitnet.cmd  # Batch: Run BitNet benchmark tests (Windows)

Core design decisions

Concern Approach
HTTP Client Zero external dependencies; uses Python stdlib urllib only
Backends Ollama (default) or BitNet (R03); switch via --backend flag
Tool calling Native Ollama tool-call protocol when supported; automatic ReAct text-parsing fallback for other models
Memory Sliding window; older turns are archived and optionally compressed via LLM summarization
Tools Decorator-based, auto-generates JSON schemas from Python type hints
Orchestration Router (LLM picks agent), Pipeline (chain), or Parallel (concurrent + merge)
Streaming First-class via generator interface
Error handling Automatic retry with exponential backoff for transient network/server errors
Security Path validation, command blocklist, SSRF protection (R03)
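
As a concrete picture of the tool row above, the sketch below shows how a decorator can build a JSON schema from Python type hints. It is a minimal illustration, not LocalClaw's actual tools.py API; the decorator name and registry shape are assumptions.

import inspect
from typing import Callable, Dict

# Hypothetical registry; the real structure in core/tools.py may differ.
TOOL_REGISTRY: Dict[str, dict] = {}

_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn: Callable) -> Callable:
    """Register fn and derive a JSON schema from its type hints."""
    sig = inspect.signature(fn)
    properties = {
        name: {"type": _JSON_TYPES.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    TOOL_REGISTRY[fn.__name__] = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties,
                       "required": list(properties)},
    }
    return fn

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression, {"__builtins__": {}}))  # demo only; real tools validate input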

Installation

From PyPI (Recommended)

pip install localclaw

# Or install from GitHub for the latest development version:
pip install git+https://github.com/VTSTech/LocalClaw.git

From Source

# Clone the repository
git clone https://github.com/VTSTech/LocalClaw.git
cd LocalClaw

# Install in development mode
pip install -e .

No Installation Required

LocalClaw uses only the Python stdlib, with no dependencies! You can also just copy the localclaw directory into your project:

# Just copy and use
cp -r localclaw /path/to/your/project/

Setup Ollama

# Make sure Ollama is running:
ollama serve

# Pull a model:
ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m

Usage After Installation

# Use the CLI command
localclaw chat --model llama3.1:8b

# Or use as a module
python -m localclaw chat --model llama3.1:8b

# Or in Python code
from localclaw import Agent
agent = Agent(model="llama3.1:8b")
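
A complete single turn in Python might then look like the sketch below. Note that run() is an assumed method name for illustration, since only the Agent constructor is shown above.

from localclaw import Agent

agent = Agent(model="llama3.1:8b")
# run() is an assumption for illustration; check the Agent class for the real entry point.
reply = agent.run("What is the capital of Japan?")
print(reply)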

BitNet Backend (R03)

LocalClaw supports Microsoft's BitNet for 1.58-bit ternary weight models, enabling highly efficient CPU inference.

Supported Models

Model Size HuggingFace Repo
BitNet-b1.58-2B-4T ~0.4 GB microsoft/BitNet-b1.58-2B-4T
Falcon3-1B-Instruct ~1 GB tiiuae/Falcon3-1B-Instruct-1.58bit
Falcon3-3B-Instruct ~3 GB tiiuae/Falcon3-3B-Instruct-1.58bit
Falcon3-7B-Instruct ~7 GB tiiuae/Falcon3-7B-Instruct-1.58bit
Falcon3-10B-Instruct ~10 GB tiiuae/Falcon3-10B-Instruct-1.58bit

Setup (One Command with huggingface-cli)

BitNet's setup_env.py handles everything: download, convert to GGUF, quantize, and compile kernels.

# Clone BitNet
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download, convert, and prepare a model (choose one):
python setup_env.py --hf-repo microsoft/BitNet-b1.58-2B-4T -q i2_s      # Recommended
python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s  # Smallest Falcon
python setup_env.py --hf-repo tiiuae/Falcon3-3B-Instruct-1.58bit -q i2_s  # Best balance
python setup_env.py --hf-repo tiiuae/Falcon3-7B-Instruct-1.58bit -q i2_s  # Most capable

This automatically:

  1. Downloads the model from HuggingFace (safetensors format)
  2. Converts to GGUF format
  3. Quantizes to i2_s (1.58-bit ternary)
  4. Compiles optimized CPU kernels

Manual Download (wget)

If you prefer not to use huggingface-cli, download directly with wget:

# Create model directory
mkdir -p models/Falcon3-1B-Instruct-1.58bit
cd models/Falcon3-1B-Instruct-1.58bit

# Download model files (~1.3GB for 1B, ~3.2GB for 3B, ~7.5GB for 7B)
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/model.safetensors
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/config.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer_config.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/special_tokens_map.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/generation_config.json

# Or for BitNet-b1.58-2B-4T (~400MB):
mkdir -p models/BitNet-b1.58-2B-4T
cd models/BitNet-b1.58-2B-4T
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/model.safetensors
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/config.json
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer.json
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer_config.json

Then run setup_env.py pointing to your downloaded model:

cd ../..  # Back to BitNet root
python setup_env.py --model-dir models/Falcon3-1B-Instruct-1.58bit -q i2_s

Model File Sizes

Model model.safetensors Total Download
Falcon3-1B-Instruct ~1.3 GB ~1.4 GB
Falcon3-3B-Instruct ~3.2 GB ~3.4 GB
Falcon3-7B-Instruct ~7.5 GB ~7.8 GB
BitNet-b1.58-2B-4T ~400 MB ~500 MB

Start the Server

# Start BitNet server (separate terminal)
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf

# Or for Falcon models:
./build/bin/llama-server -m models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf

Use with LocalClaw

# Set BitNet URL (default: http://localhost:8080)
export BITNET_BASE_URL=http://localhost:8080

# Chat with BitNet backend
localclaw chat --backend bitnet --force-react

# With tools
localclaw chat --backend bitnet --force-react --tools calculator,shell

Note: BitNet models require --force-react as they don't support native tool calling.
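
Behind the CLI, the BitNet backend talks to a llama-server HTTP endpoint. A stdlib-only request might look like the sketch below, assuming the standard llama.cpp /completion route; LocalClaw's bitnet_client.py may differ in detail.

import json
import urllib.request

def bitnet_complete(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST a prompt to a llama-server-style /completion endpoint (illustrative)."""
    payload = json.dumps({"prompt": prompt, "n_predict": 128}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.loads(resp.read())["content"]

# print(bitnet_complete("What is 17 * 23?"))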

Colab Quick Start

# Cell 1: Setup BitNet with Falcon3-1B (fastest option)
!git clone --recursive https://github.com/microsoft/BitNet.git
%cd BitNet
!pip install -r requirements.txt
!python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s

# Cell 2: Start server in background
import subprocess, time
server = subprocess.Popen(
    ['./build/bin/llama-server', '-m', 'models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf', '--port', '8080'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
time.sleep(5)  # Wait for server startup

# Cell 3: Clone and run LocalClaw
%cd /content
!git clone https://github.com/VTSTech/LocalClaw.git
%cd LocalClaw
!localclaw chat --backend bitnet --force-react

Model Comparison

Model Speed Quality Best For
BitNet-b1.58-2B-4T ⚡⚡⚡ Good Quick tasks, testing
Falcon3-1B-Instruct ⚡⚡⚡ Good Fastest inference
Falcon3-3B-Instruct ⚡⚡ Better Balanced performance
Falcon3-7B-Instruct ⚡ Best Complex reasoning

BitNet Benchmark Results: BitNet-b1.58-2B-4T achieved 87% on the LocalClaw benchmark; see the BitNet Benchmark Results section below.


Quick start

1. Single prompt

# Simple Q&A
localclaw run "What is the capital of Japan?"

# With streaming output
localclaw run "Tell me a joke." --stream

# Specify a model
localclaw run "Explain quantum computing" -m llama3.2:3b

2. Interactive chat

# Start interactive session
localclaw chat -m qwen2.5-coder:0.5b

# With tools enabled
localclaw chat -m llama3.1:8b --tools calculator,shell,read_file,write_file

# With skills loaded
localclaw chat -m llama3.2:3b --skills skill-creator --tools write_file,shell

# Fast mode (reduced context for speed)
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose

3. Using BitNet backend

# BitNet requires --force-react for tool support
localclaw chat --backend bitnet --force-react

# Run single prompt with BitNet
localclaw run "Calculate 17 * 23" --backend bitnet --tools calculator

4. With ACP tracking

# Enable ACP for activity monitoring
localclaw chat -m qwen2.5-coder:0.5b --acp --tools shell,read_file,write_file

# Single prompt with ACP
localclaw run "What is 2+2?" --acp

CLI Commands

Command Description
run "prompt" Run single prompt and exit
chat Interactive multi-turn conversation
models List available Ollama models
tools List built-in tools
skills List available Agent Skills

CLI Flags

Flag Description
-m, --model Model name (default: qwen2.5-coder:0.5b)
--tools Comma-separated tool list
--skills Comma-separated skill list
--backend ollama or bitnet
--force-react Force ReAct text parsing
--acp Enable ACP integration
-v, --verbose Show tool calls and timing
--debug Show detailed debug info
--fast Preset: reduced context for speed
--warmup Pre-load model before chat
--stream Stream output token-by-token
--temperature Sampling temperature (0.0-2.0)
--num-ctx Context window size
--num-predict Max output tokens

Interactive Commands (in chat)

Command Description
/help Show available commands
/status Show session status
/tools List active tools
/skills List active skills
/reset Clear conversation history
/undo Remove last exchange
/retry Retry last message
/a2a Process pending A2A messages
/export Export to markdown
exit End session

Built-in Tools

Tool Description
calculator Evaluate math expressions
python_repl Execute Python code
shell Run shell commands
read_file Read file contents
write_file Write content to file
list_directory List directory contents
http_get HTTP GET request
save_note Save a note to memory
get_note Retrieve saved notes

# List all tools
localclaw tools

# Use specific tools
localclaw chat --tools calculator,python_repl,shell

Built-in Skills

Skill Description
skill-creator Generate new Agent Skills from requests
datetime Date/time formatting and calculations
web_search Web search capabilities

# List all skills
localclaw skills

# Use skills in chat
localclaw chat --skills skill-creator --tools write_file

Supported models (tool-calling)

The following model families support native tool calling in Ollama and are auto-detected:

Meta Llama: llama3, llama3.1, llama3.2, llama3.3, llama3-groq-tool-use

Mistral AI: mistral, mixtral, mistral-nemo, mistral-small, mistral-large, codestral, ministral

Alibaba Qwen: qwen2, qwen2.5, qwen3, qwen35, qwen2.5-coder, qwen2-math

Cohere: command-r, command-r7b

DeepSeek: deepseek, deepseek-coder, deepseek-v2, deepseek-v3

Microsoft Phi: phi-3, phi3, phi-4

Google Gemma: functiongemma (designed for function calling)

Others: yi-, yi1.5, internlm2, internlm2.5, solar, glm4, chatglm, firefunction, hermes, nemotron, cogito, athene

All other models fall back to ReAct text-parsing automatically.
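
To make the fallback concrete, the agent only needs to pull an action out of plain model text. The sketch below assumes a common "Action / Action Input" ReAct layout; LocalClaw's exact parsing format may differ.

import json
import re

# Assumed ReAct layout: "Action: <tool>\nAction Input: {...json...}"
ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\w+)\s*\nAction Input:\s*(?P<args>\{.*?\})",
    re.DOTALL,
)

def parse_react(text: str):
    """Return (tool, args) if the text contains an action, else None (final answer)."""
    m = ACTION_RE.search(text)
    if not m:
        return None
    return m.group("tool"), json.loads(m.group("args"))

# parse_react('Thought: need math\nAction: calculator\nAction Input: {"expression": "17 * 23"}')
# -> ("calculator", {"expression": "17 * 23"})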


Tested Small Models (≤1.5B parameters)

The following models have been tested with a 15-test benchmark (3 tests per category: Math, Reasoning, Knowledge, Calc Tool, Code). Prompts are optimized for small model comprehension.

Rankings (Updated)

Rank Model Score Time Math Reason Know Calc Code
🥇 qwen2.5-coder:0.5b-instruct-q4_k_m 14/15 (93%) ~80s 3/3 2/3 2/3 3/3 3/3
🥈 BitNet-b1.58-2B-4T (BitNet) 13/15 (87%) ~394s 3/3 2/3 2/3 3/3 3/3
🥉 granite3.1-moe:1b 12/15 (80%) ~60s 3/3 2/3 3/3 1/3 3/3
4 llama3.2:1b 12/15 (80%) ~600s 3/3 1/3 2/3 3/3 3/3
5 gemma3:270m 10/15 (67%) ~75s 3/3 1/3 1/3 2/3 3/3
6 qwen3:0.6b ~9/12 ~130s 2/3 3/3 3/3 0/3 –
7 granite4:350m 8/15 (53%) ~97s 2/3 1/3 2/3 0/3 3/3
8 qwen2.5:0.5b 10/15 (67%) ~107s 1/3 3/3 3/3 0/3 3/3
9 qwen2-math:1.5b 12/15 (80%) ~611s 3/3 3/3 3/3 ❌ 3/3
10 tinyllama:latest 9/15 (60%) ~587s 2/3 2/3 3/3 0/3 2/3
11 smollm:135m 7/15 (47%) ~285s 0/3 2/3 2/3 0/3 3/3
12 functiongemma:270m 1/15 (7%) ~90s 0/3 0/3 0/3 0/3 1/3

Note: Scores vary between runs due to model non-determinism. The qwen2.5-coder:0.5b achieved 100% in some runs.

Model Details

Model Params Size Speed Tool Support Notes
qwen2.5-coder:0.5b 494M ~400MB ⚡ Fast ✅ Native 🏆 Best overall! Excellent tool usage
BitNet-b1.58-2B-4T 2B ~1.3GB ⚡ Medium ⚠️ ReAct 🥈 2nd place! CPU-efficient ternary weights
granite3.1-moe:1b 1B MoE ~1.4GB ⚡ Medium ✅ Native Strong knowledge, HTTP 500 on long context
llama3.2:1b 1.2B ~1.3GB 🐢 Slow ✅ Native 128k context! Thorough but slow
gemma3:270m 270M ~292MB ⚡⚡ Fastest ⚠️ ReAct JSON Uses JSON ReAct format, Math & Code champion
qwen3:0.6b 600M ~523MB ⚡ Medium ⚠️ Text Perfect reasoning but Calc returns empty
granite4:350m 350M ~708MB ⚡ Fast ❌ Refused Refuses calculator (safety filter)
qwen2.5:0.5b 494M ~398MB ⚡ Fast ⚠️ Text Reasoning & Knowledge champ, Calc fails
qwen2-math:1.5b 1.5B ~935MB 🐢 Slow ❌ No tools 4 perfect categories! No tool support
tinyllama:latest 1.1B ~638MB 🐢 Slow ⚠️ Text Older model, verbose, unstable
smollm:135m 135M ~92MB ⚡ Fast ❌ None Smallest; hallucinates math (7×8=42!)
functiongemma:270m 270M ~301MB ⚡ Fast ❌ Broken Worst performer; returns empty

Category Champions

Category Champion Score Notes
Math qwen2.5-coder:0.5b, granite3.1-moe:1b, BitNet-b1.58-2B 3/3 Also gemma3:270m
Reasoning qwen2.5:0.5b, qwen3:0.6b, qwen2-math 3/3 Multiple tied
Knowledge granite3.1-moe:1b, qwen2-math 3/3 Multiple tied at 3/3
Calc qwen2.5-coder:0.5b, llama3.2:1b, BitNet-b1.58-2B 3/3 100% tool usage with ReAct
Code Many models 3/3 Code generation is easy for small models!

Test Categories

Category Tests What it measures
Math Multiply, Add, Divide Basic arithmetic without tools
Reasoning Apples, Sequence, Logic Multi-step reasoning and deduction
Knowledge Japan, France, Brazil capitals World knowledge recall
Calc Multiply, Divide, Power Tool usage with calculator
Code is_even, reverse, max_num Python function generation

Recommendations

Use Case Recommended Model Why
General use qwen2.5-coder:0.5b-instruct-q4_k_m Best all-around, fast, great tool usage
Large context llama3.2:1b 128k context window - handles long conversations
Math tasks qwen2.5-coder:0.5b or qwen2-math:1.5b Perfect math scores
Reasoning tasks qwen2.5:0.5b or qwen3:0.6b Perfect reasoning
Tool usage qwen2.5-coder:0.5b Most reliable tool calling
Fastest inference gemma3:270m 270M params, fastest responses
No tools needed qwen2-math:1.5b 4/5 categories perfect (no Calc)
Smallest footprint smollm:135m 92MB - but expect hallucinations

โš ๏ธ Models to Avoid

Model Issue
functiongemma:270m Despite the name, terrible at function calling - returns empty or refuses
smollm:135m Hallucinates wrong math (7×8=42), only 7/15 score
granite4:350m Refuses calculator tools (safety filter)

Known Issues with Small Models

  1. Tool calling variations:
    • granite4:350m: Refuses calculator ("I'm sorry, but I can't assist with that")
    • functiongemma:270m: Asks for clarification instead of using tools
    • qwen2.5:0.5b, qwen3:0.6b: Returns empty responses on Calc tests
    • qwen2-math:1.5b: HTTP 400 - doesn't support tool calling at all
  2. Math hallucinations: smollm:135m says “7×8=42”, tinyllama says “7×8=45”
  3. Power operator confusion: gemma3:270m reads 2**10 as 2*10=20
  4. Reasoning failures: Some models answer "8" for sequence "2,4,6,8,?" (repeat last)
  5. Stability issues:
    • granite3.1-moe:1b: HTTP 500 crashes (server EOF)
    • tinyllama, qwen3:0.6b: HTTP 524 timeouts
  6. Empty responses: functiongemma:270m returns empty strings on most tests

Skills (Agent Skills Specification)

🦞 LocalClaw R03 supports the Agent Skills specification for reusable instruction bundles.

Skill Structure

skills/
└── my-skill/
    ├── SKILL.md          # Required: name, description, instructions
    ├── scripts/          # Optional: executable scripts
    ├── references/       # Optional: additional docs
    └── assets/           # Optional: templates, images

SKILL.md Format

---
name: calculator
description: Perform mathematical calculations. Use when the user needs to compute expressions.
---

# Calculator Skill

Instructions for the model on how to use this skill...

Using Skills

# Load skills via CLI
localclaw chat --skills skill-creator --tools write_file,shell

# Multiple skills
localclaw chat --skills datetime,web_search --tools calculator

Progressive Disclosure

Skills follow a three-level loading system (a level-1 sketch follows the list):

  1. Metadata (~100 tokens): name + description loaded at startup
  2. Instructions (<500 lines): Full SKILL.md body loaded when skill triggers
  3. Resources (as needed): Files in scripts/, references/, assets/ loaded on demand
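
For level 1, only the frontmatter of SKILL.md needs to be read at startup. The sketch below parses the simple key: value frontmatter shown earlier using the stdlib alone; it is illustrative, not the actual skills/loader.py.

from pathlib import Path

def read_skill_metadata(skill_dir: str) -> dict:
    """Parse only the frontmatter of SKILL.md (level-1 metadata loading)."""
    meta = {}
    in_frontmatter = False
    for line in Path(skill_dir, "SKILL.md").read_text(encoding="utf-8").splitlines():
        if line.strip() == "---":
            if in_frontmatter:      # closing fence: stop before loading the body
                break
            in_frontmatter = True
            continue
        if in_frontmatter and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

# read_skill_metadata("skills/calculator") -> {"name": "calculator", "description": "..."}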

Built-in Skills

Skill Description
skill-creator OpenClaw's platform-agnostic skill generator. Creates new skills from user requests.
datetime Date and time utilities for formatting, parsing, and calculations.
web_search Web search capabilities for retrieving information from the internet.

Orchestrator modes

Mode Behaviour
router A small routing LLM picks the best agent for each request
pipeline Agents run sequentially; each receives the previous agent's output
parallel All agents run concurrently; results are merged with attribution
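
The three modes reduce to simple control flow. The sketch below treats each agent as a plain text-in, text-out callable; it illustrates the routing shapes, not the actual orchestrator.py API.

from concurrent.futures import ThreadPoolExecutor

def run_pipeline(agents, prompt):
    """pipeline: each agent receives the previous agent's output."""
    out = prompt
    for agent in agents:
        out = agent(out)
    return out

def run_parallel(agents, prompt):
    """parallel: all agents run concurrently; results merged with attribution."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda agent: agent(prompt), agents)
    return "\n".join(f"[agent {i}] {r}" for i, r in enumerate(results))

def run_router(route, agents, prompt):
    """router: a small routing LLM (here, `route`) picks the best agent by name."""
    return agents[route(prompt)](prompt)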

Running the examples

# Make sure Ollama is serving and you have a model pulled
ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m

# Or use a remote Ollama instance by editing localclaw/core/ollama_client.py

# Quick test suite (recommended first run)
bash test-quick.sh      # Linux/macOS/Colab
test-quick.cmd          # Windows

# Full test suite (all 11 examples)
bash test.sh            # Linux/macOS/Colab
test.cmd                # Windows

# Interactive menu
bash run.sh             # Linux/macOS/Colab
run.cmd                 # Windows

# Run individual examples
python examples/01_basic_agent.py
python examples/02_tool_agent.py
python examples/03_orchestrator.py
python examples/04_comprehensive_test.py
python examples/05_tool_tests.py
python examples/06_interactive_chat.py
python examples/07_model_comparison.py
python examples/08_robust_comparison.py
python examples/09_expanded_benchmark.py
python examples/10_skills_demo.py
python examples/11_skill_creator_test.py

ACP Integration (Agent Control Panel)

🦞 LocalClaw R03 supports ACP (Agent Control Panel) for centralized activity tracking, token monitoring, and multi-agent coordination.

What is ACP?

ACP is a monitoring and observability protocol for AI agents. Unlike communication protocols (MCP, A2A), ACP sits alongside your agents and provides:

  • Activity Tracking: Real-time monitoring of all agent actions
  • Token Management: Context window usage estimation per agent
  • Multi-Agent Coordination: Track multiple agents in one session
  • STOP/Resume Control: Emergency stop capability
  • Session Persistence: State preserved across restarts

Enable ACP

# Run with ACP tracking
localclaw chat --acp --tools shell,read_file,write_file -m qwen2.5-coder:0.5b

# Run single prompt with ACP
localclaw run --acp "What is 2+2?"

Configuration

Set your ACP server URL via environment variables:

# Local ACP
export ACP_URL="http://localhost:8766"

# Remote ACP (cloudflare tunnel)
export ACP_URL="https://your-tunnel.trycloudflare.com"

# Credentials
export ACP_USER="admin"
export ACP_PASS="secret"

Or edit localclaw/config.py for persistent settings.

What Gets Logged

Activity Description
Bootstrap Session start, identity establishment
User messages All prompts sent to the model
Assistant messages All model responses
Tool calls Shell commands, file operations, etc.
Tool results Outcomes from tool execution

Per-Agent Token Tracking

When multiple agents connect to the same ACP session:

{
  "primary_agent": "Super Z",
  "agent_tokens": {
    "Super Z": 42000,
    "LocalClaw": 500
  },
  "other_agents_tokens": 500
}

  • First agent to connect becomes primary (owns main context window)
  • Other agents tracked separately in agent_tokens
  • Prevents context pollution between agents
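
A sketch of the accounting behind that JSON, under the stated rules (first agent to connect becomes primary; others are tracked separately). This is illustrative, not ACP's implementation.

def record_tokens(session: dict, agent: str, tokens: int) -> None:
    """Attribute token usage to an agent within one ACP session."""
    session.setdefault("primary_agent", agent)   # first agent to connect wins
    counts = session.setdefault("agent_tokens", {})
    counts[agent] = counts.get(agent, 0) + tokens
    session["other_agents_tokens"] = sum(
        n for name, n in counts.items() if name != session["primary_agent"]
    )

# session = {}
# record_tokens(session, "Super Z", 42000); record_tokens(session, "LocalClaw", 500)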

ACP Server

To run your own ACP server, see the ACP Specification:

# ACP is a single Python file
python VTSTech-GLMACP.py

# With cloudflare tunnel
GLMACP_TUNNEL=auto python VTSTech-GLMACP.py

Remote Ollama Configuration

To use a remote Ollama instance (e.g., via Cloudflare tunnel), set the environment variable:

# Local Ollama (default)
export OLLAMA_URL="http://localhost:11434"

# Remote Ollama (cloudflare tunnel)
export OLLAMA_URL="https://your-tunnel.trycloudflare.com"

Or edit localclaw/config.py for persistent settings.

Timeout Configuration

Configure via environment variables:

# Request timeout in seconds (default: 90s for Cloudflare tunnel compatibility)
export OLLAMA_TIMEOUT=90

# Max retry attempts for transient errors (default: 3)
export OLLAMA_MAX_RETRIES=3

# Initial retry delay in seconds (default: 5s, doubles each retry)
export OLLAMA_RETRY_DELAY=5

Automatic Retry

LocalClaw automatically retries on transient errors with exponential backoff:

Error Code Description Retry Behavior
HTTP 524 Cloudflare tunnel timeout Retries up to 3 times
HTTP 502/503/504 Server temporarily unavailable Retries up to 3 times
HTTP 500 Server error (model loading, memory pressure) Retries up to 3 times
Timeout Socket or connection timeout Retries up to 3 times
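
The retry loop can be approximated with the stdlib alone. The sketch below mirrors the documented defaults (90 s timeout, 3 retries, 5 s delay doubling per attempt) but is an illustration rather than the shipped client.

import time
import urllib.error
import urllib.request

RETRIABLE = {500, 502, 503, 504, 524}

def get_with_retry(url: str, timeout: float = 90, retries: int = 3, delay: float = 5) -> bytes:
    """GET with exponential backoff on the transient errors listed above."""
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRIABLE or attempt == retries:
                raise
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries:
                raise
        time.sleep(delay * (2 ** attempt))   # 5s, 10s, 20s, ...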

Performance Optimization

CLI Options for Speed

# Fast mode - reduces context and output for quicker responses
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose

# Fine-tuned control
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Warm up model before chat (useful for remote Ollama with cold starts)
localclaw chat -m qwen2.5-coder:0.5b --warmup --fast

Option Description Speed Impact
--fast Preset: num_ctx=2048, num_predict=256 🚀 Significant
--num-ctx N Reduce context window (default varies by model) 🚀 Significant
--num-predict N Limit max output tokens ⚡ Moderate
--warmup Pre-load model before first chat ⚡ Faster first response

Ollama Model Options

Control model behavior via CLI flags:

# Lower temperature = more deterministic
localclaw chat -m qwen2.5-coder:0.5b --temperature 0.1

# Smaller context = faster
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Combined for optimal speed
localclaw chat -m qwen2.5-coder:0.5b --fast --temperature 0.3

Remote Ollama Tips

When using a remote Ollama via Cloudflare tunnel:

  1. Use --fast flag - Reduces inference time significantly
  2. Use smaller models - qwen2.5-coder:0.5b is fastest
  3. Warm up the model - First request is slowest due to model loading
  4. Increase timeout if needed: export OLLAMA_TIMEOUT=120

# Recommended for remote Ollama
localclaw chat -m qwen2.5-coder:0.5b-instruct-q4_k_m \
    --fast --warmup --verbose \
    --tools python_repl

Why Inference is Slow

Factor Impact Solution
Model size Larger models = slower Use smaller quantized models
Context window More context = slower Use --num-ctx 2048 or smaller
Output length More tokens = slower Use --num-predict 128
Remote connection Network latency Use local Ollama if possible
Cold start First load is slowest Use --warmup flag
GPU unavailable CPU inference is slow Ensure GPU is configured

Recent Improvements

R03: BitNet Backend

🦞 LocalClaw R03 adds BitNet backend support for running Microsoft's 1.58-bit quantized models:

  • New backend: Switch between Ollama and BitNet via --backend flag
  • CPU-only inference: BitNet models run efficiently without a GPU
  • Setup helper: bitnet_setup.py handles cloning and compilation
  • Note: BitNet requires ReAct fallback (no native tool support)

R03: Enhanced Security

Built-in tools now have comprehensive security:

  • Path validation: Restrict file access to allowed directories
  • Command blocklist: Block dangerous commands (rm, sudo, chmod, etc.)
  • Pattern detection: Detect dangerous shell patterns (pipes to bash, command substitution)
  • SSRF protection: Block private IPs and cloud metadata endpoints in http_get
  • Configurable modes: strict, permissive, or disabled

# Set security mode
export LOCALCLAW_SECURITY_MODE=strict
export LOCALCLAW_ALLOWED_PATHS=/home/user/projects:/tmp
export LOCALCLAW_BLOCKED_COMMANDS=rm,sudo,dd
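
As an illustration of the SSRF guard, a stdlib-only check can resolve the host and refuse private or link-local addresses. This is a sketch of the idea, not the shipped http_get validation.

import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs resolving to private, loopback, or link-local (metadata) addresses."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    # 169.254.0.0/16 is link-local and covers cloud metadata IPs like 169.254.169.254.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

# is_safe_url("http://169.254.169.254/latest/meta-data/") -> False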

Zero Dependencies

🦞 LocalClaw R03 continues to use only the Python stdlib; no pip install required! The HTTP client uses urllib instead of httpx.

Automatic Error Recovery

  • HTTP 524/502/503/504/500 retry: Transient server errors are automatically retried with exponential backoff
  • Timeout retry: Socket timeouts are retried automatically
  • Configurable via environment variables: OLLAMA_TIMEOUT, OLLAMA_MAX_RETRIES, OLLAMA_RETRY_DELAY

Small Model Support

🦞 LocalClaw R03 handles quirks of small models (≤1.5B parameters):

  • Fuzzy tool name matching: Hallucinated tool names like calculate_expression are automatically mapped to calculator (see the sketch after this list)
  • Argument auto-fixing: Common wrong argument patterns are corrected (e.g., {"base": 2, "exponent": 10} → {"expression": "2 ** 10"})
  • JSON response cleaning: When models output tool schemas instead of text answers, LocalClaw falls back to tool results
  • Unicode normalization: Accented characters are normalized for comparison (e.g., "Brasília" matches "brasilia")
  • ReAct text parsing: Models without native tool support automatically fall back to text-based ReAct format
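
The fuzzy matching can be done with difflib from the stdlib, as in the sketch below; the registered tool list and cutoff are illustrative, not LocalClaw's actual values.

import difflib

KNOWN_TOOLS = ["calculator", "shell", "read_file", "write_file", "http_get"]

def resolve_tool_name(requested: str) -> str | None:
    """Map a hallucinated tool name to the closest registered tool, if any."""
    matches = difflib.get_close_matches(requested, KNOWN_TOOLS, n=1, cutoff=0.5)
    return matches[0] if matches else None

# resolve_tool_name("calculate_expression") -> "calculator"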

Optimized Test Prompts

Key insights for small model prompt engineering:

  1. State the fact first: "The capital of Japan is Tokyo. What is the capital of Japan?"
  2. Show the answer format: "Answer: Tokyo" at the end
  3. Give calculation steps: "10 minus 3 equals 7. Then 7 minus 2 equals 5."
  4. Be explicit with tools: "Use calculator tool. Expression: 2 ** 10. Result: 1024"
  5. Guide code output: "Start with: def is_even(n):"

New Examples

Example Description
07_model_comparison.py Benchmark 15 tests across models with category breakdown
08_robust_comparison.py Progress-saving comparison for unstable connections
09_expanded_benchmark.py 25 tests across 8 categories including tool chaining
10_skills_demo.py Demonstrate Agent Skills system with skill-creator
11_skill_creator_test.py Benchmark skill creation across multiple small models

Test Categories (15 tests)

Category Tests Description
Math Multiply, Add, Divide Basic arithmetic (no tools)
Reasoning Apples, Sequence, Logic Multi-step reasoning
Knowledge Japan, France, Brazil World knowledge
Calc Multiply, Divide, Power Calculator tool usage
Code is_even, reverse, max_num Python code generation

BitNet Benchmark Results

LocalClaw R03 has been tested with Microsoft BitNet-b1.58-2B-4T, a 2B parameter model with 1.58-bit ternary weights, designed for efficient CPU inference.

Test Results Summary

Test Suite Score Time Notes
Model Comparison (15 tests) 13/15 (87%) 394s 5 categories
Robust Comparison (22 tests) 19/22 (86%) ~6min Incremental save
Comprehensive Test (7 tests) 6/7 (86%) ~90s Basic + Reasoning + Code

Category Breakdown (Model Comparison - 15 tests)

Category Score Pass Rate
Math 3/3 100% ✅
Code 3/3 100% ✅
Calc (with tools) 3/3 100% ✅
Reasoning 2/3 67%
Knowledge 2/3 67%
Total 13/15 87%

Failed Tests

Test Expected Got Category
Apples (reasoning) 5 7 Reasoning
Brazil capital Brasília São Paulo Knowledge

Performance Notes

Metric Value
Avg response time 5-10s (simple), 100s+ (tool use)
Tool calling ReAct fallback (no native support)
Context window Default (model dependent)
Inference CPU-efficient ternary weights

BitNet vs Ollama Small Models

Rank Model Score Params Backend
🥇 qwen2.5-coder:0.5b-instruct-q4_k_m 14/15 (93%) 494M Ollama
🥈 BitNet-b1.58-2B-4T 13/15 (87%) 2B BitNet
🥉 granite3.1-moe:1b 12/15 (80%) 1B MoE Ollama
4 llama3.2:1b 12/15 (80%) 1.2B Ollama

Note: BitNet uses 1.58-bit ternary weights, making it highly efficient for CPU inference despite having 2B parameters.

BitNet Setup for Benchmarking

# 1. Clone and compile BitNet
python localclaw/bitnet_setup.py

# 2. Start the BitNet server
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf

# 3. Run benchmark
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison.py

# 4. Run with ACP tracking
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison_acp.py

Observations

  1. Excellent for CPU-only systems: ternary weights enable fast inference without a GPU
  2. Solid tool usage: ReAct fallback handles calculator tools reliably
  3. Strong code generation: 100% pass rate on function-writing tasks
  4. Multi-step reasoning challenges: the "apples" test requires tracking state
  5. Knowledge gaps: São Paulo is commonly mistaken for Brazil's capital

About

🦞 LocalClaw R03 is written and maintained by VTSTech.


Testing Status: LocalClaw has been tested with both Ollama (11 small models) and BitNet (BitNet-b1.58-2B-4T) backends. BitNet achieved 87% on the benchmark, making it the 2nd best performer overall. See Tested Small Models and BitNet Benchmark Results sections for details.
