
🦞 LocalClaw R03

A minimal, hackable agentic framework engineered to run entirely locally with Ollama or BitNet.

Inspired by the architecture of OpenClaw, rebuilt from scratch for local-first operation.

Written by VTSTech · GitHub


Architecture

localclaw/
├── core/
│   ├── ollama_client.py   # Zero-dependency HTTP wrapper (stdlib urllib only)
│   ├── tools.py           # Decorator-based tool registry + JSON schema generation
│   ├── memory.py          # Sliding-window conversation memory with summarization
│   ├── agent.py           # ReAct loop: native tool-call + text-fallback modes
│   └── orchestrator.py    # Multi-agent routing (router / pipeline / parallel)
├── skills/
│   ├── loader.py          # Agent Skills specification loader (progressive disclosure)
│   ├── skill-creator/     # OpenClaw skill-creator for generating new skills
│   ├── acp/               # ACP (Agent Control Panel) skill
│   ├── datetime/          # Datetime utilities skill
│   └── web_search/        # Web search skill
├── tools/
│   └── builtins.py        # Ready-to-use tools: calculator, shell, file I/O, HTTP, REPL
├── bitnet_client.py       # R03: BitNet backend client (Microsoft 1.58-bit quantization)
├── bitnet_setup.py        # R03: BitNet setup/compilation helper
├── acp_plugin.py          # ACP integration for activity tracking and A2A messaging
├── model_discovery.py     # R03: Dynamic model discovery for both backends
└── examples/
    ├── 01_basic_agent.py            # Simple Q&A demo
    ├── 02_tool_agent.py             # Tool calling demo
    ├── 03_orchestrator.py           # Multi-agent routing demo
    ├── 04_comprehensive_test.py     # Full test suite (supports BitNet)
    ├── 04_comprehensive_test_acp.py # ACP-tracked version
    ├── 05_tool_tests.py             # Tool-specific tests
    ├── 06_interactive_chat.py       # Interactive CLI chat
    ├── 07_model_comparison.py       # Compare models on 15 tests (3 per category)
    ├── 07_model_comparison_acp.py   # ACP-tracked version with model logging
    ├── 08_robust_comparison.py      # Progress-saving comparison for unstable connections
    ├── 08_robust_comparison_acp.py  # ACP-tracked version with resumability
    ├── 09_expanded_benchmark.py     # 25 tests across 8 categories
    ├── 10_skills_demo.py            # Agent Skills system demo
    └── 11_skill_creator_test.py     # Skill creation benchmark across models

Test Scripts

test.sh          # Bash: Run all 11 examples (Linux/macOS/Colab)
test-quick.sh    # Bash: Run 7 quick tests (skips benchmarks)
run.sh           # Bash: Interactive menu for single example
test-bitnet.sh   # Bash: Run BitNet benchmark tests
test.cmd         # Batch: Run all 11 examples (Windows)
test-quick.cmd   # Batch: Run 7 quick tests (Windows)
run.cmd          # Batch: Interactive menu for single example (Windows)
test-bitnet.cmd  # Batch: Run BitNet benchmark tests (Windows)

Core design decisions

Concern Approach
HTTP Client Zero external dependencies; uses Python stdlib urllib only
Backends Ollama (default) or BitNet (R03); switch via --backend flag
Tool calling Native Ollama tool-call protocol when supported; automatic ReAct text-parsing fallback for other models
Memory Sliding window; older turns are archived and optionally compressed via LLM summarization
Tools Decorator-based, auto-generates JSON schemas from Python type hints
Orchestration Router (LLM picks agent), Pipeline (chain), or Parallel (concurrent + merge)
Streaming First-class via generator interface
Error handling Automatic retry with exponential backoff for transient network/server errors
Security Path validation, command blocklist, SSRF protection (R03)
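
As a concrete picture of the tool row above, the sketch below shows how a decorator can build a JSON schema from Python type hints. It is a minimal illustration, not LocalClaw's actual tools.py API; the decorator name and registry shape are assumptions.

import inspect
from typing import Callable, Dict

# Hypothetical registry; the real structure in core/tools.py may differ.
TOOL_REGISTRY: Dict[str, dict] = {}

_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn: Callable) -> Callable:
    """Register fn and derive a JSON schema from its type hints."""
    sig = inspect.signature(fn)
    properties = {
        name: {"type": _JSON_TYPES.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    TOOL_REGISTRY[fn.__name__] = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties,
                       "required": list(properties)},
    }
    return fn

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression, {"__builtins__": {}}))  # demo only; real tools validate input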

Installation

From PyPI (Recommended)

pip install localclaw

# Or install from GitHub for the latest development version:
pip install git+https://github.com/VTSTech/LocalClaw.git

From Source

# Clone the repository
git clone https://github.com/VTSTech/LocalClaw.git
cd LocalClaw

# Install in development mode
pip install -e .

No Installation Required

LocalClaw uses only the Python stdlib, with no dependencies! You can also just copy the localclaw directory into your project:

# Just copy and use
cp -r localclaw /path/to/your/project/

Setup Ollama

# Make sure Ollama is running:
ollama serve

# Pull a model:
ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m

Usage After Installation

# Use the CLI command
localclaw chat --model llama3.1:8b

# Or use as a module
python -m localclaw chat --model llama3.1:8b

# Or in Python code
from localclaw import Agent
agent = Agent(model="llama3.1:8b")
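
A complete single turn in Python might then look like the sketch below. Note that run() is an assumed method name for illustration, since only the Agent constructor is shown above.

from localclaw import Agent

agent = Agent(model="llama3.1:8b")
# run() is an assumption for illustration; check the Agent class for the real entry point.
reply = agent.run("What is the capital of Japan?")
print(reply)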

BitNet Backend (R03)

LocalClaw supports Microsoft's BitNet for 1.58-bit ternary weight models, enabling highly efficient CPU inference.

Supported Models

Model Size HuggingFace Repo
BitNet-b1.58-2B-4T ~0.4 GB microsoft/BitNet-b1.58-2B-4T
Falcon3-1B-Instruct ~1 GB tiiuae/Falcon3-1B-Instruct-1.58bit
Falcon3-3B-Instruct ~3 GB tiiuae/Falcon3-3B-Instruct-1.58bit
Falcon3-7B-Instruct ~7 GB tiiuae/Falcon3-7B-Instruct-1.58bit
Falcon3-10B-Instruct ~10 GB tiiuae/Falcon3-10B-Instruct-1.58bit

Setup (One Command with huggingface-cli)

BitNet's setup_env.py handles everything: download, convert to GGUF, quantize, and compile kernels.

# Clone BitNet
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download, convert, and prepare a model (choose one):
python setup_env.py --hf-repo microsoft/BitNet-b1.58-2B-4T -q i2_s      # Recommended
python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s  # Smallest Falcon
python setup_env.py --hf-repo tiiuae/Falcon3-3B-Instruct-1.58bit -q i2_s  # Best balance
python setup_env.py --hf-repo tiiuae/Falcon3-7B-Instruct-1.58bit -q i2_s  # Most capable

This automatically:

  1. Downloads the model from HuggingFace (safetensors format)
  2. Converts to GGUF format
  3. Quantizes to i2_s (1.58-bit ternary)
  4. Compiles optimized CPU kernels

Manual Download (wget)

If you prefer not to use huggingface-cli, download directly with wget:

# Create model directory
mkdir -p models/Falcon3-1B-Instruct-1.58bit
cd models/Falcon3-1B-Instruct-1.58bit

# Download model files (~1.3GB for 1B, ~3.2GB for 3B, ~7.5GB for 7B)
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/model.safetensors
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/config.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer_config.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/special_tokens_map.json
wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/generation_config.json

# Or for BitNet-b1.58-2B-4T (~400MB):
mkdir -p models/BitNet-b1.58-2B-4T
cd models/BitNet-b1.58-2B-4T
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/model.safetensors
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/config.json
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer.json
wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer_config.json

Then run setup_env.py pointing to your downloaded model:

cd ../..  # Back to BitNet root
python setup_env.py --model-dir models/Falcon3-1B-Instruct-1.58bit -q i2_s

Model File Sizes

Model model.safetensors Total Download
Falcon3-1B-Instruct ~1.3 GB ~1.4 GB
Falcon3-3B-Instruct ~3.2 GB ~3.4 GB
Falcon3-7B-Instruct ~7.5 GB ~7.8 GB
BitNet-b1.58-2B-4T ~400 MB ~500 MB

Start the Server

# Start BitNet server (separate terminal)
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf

# Or for Falcon models:
./build/bin/llama-server -m models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf

Use with LocalClaw

# Set BitNet URL (default: http://localhost:8080)
export BITNET_BASE_URL=http://localhost:8080

# Chat with BitNet backend
localclaw chat --backend bitnet --force-react

# With tools
localclaw chat --backend bitnet --force-react --tools calculator,shell

Note: BitNet models require --force-react as they don't support native tool calling.
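
Behind the CLI, the BitNet backend talks to a llama-server HTTP endpoint. A stdlib-only request might look like the sketch below, assuming the standard llama.cpp /completion route; LocalClaw's bitnet_client.py may differ in detail.

import json
import urllib.request

def bitnet_complete(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST a prompt to a llama-server-style /completion endpoint (illustrative)."""
    payload = json.dumps({"prompt": prompt, "n_predict": 128}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.loads(resp.read())["content"]

# print(bitnet_complete("What is 17 * 23?"))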

Colab Quick Start

# Cell 1: Setup BitNet with Falcon3-1B (fastest option)
!git clone --recursive https://github.com/microsoft/BitNet.git
%cd BitNet
!pip install -r requirements.txt
!python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s

# Cell 2: Start server in background
import subprocess, time
server = subprocess.Popen(
    ['./build/bin/llama-server', '-m', 'models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf', '--port', '8080'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
time.sleep(5)  # Wait for server startup

# Cell 3: Clone and run LocalClaw
%cd /content
!git clone https://github.com/VTSTech/LocalClaw.git
%cd LocalClaw
!localclaw chat --backend bitnet --force-react

Model Comparison

Model Speed Quality Best For
BitNet-b1.58-2B-4T ⚡⚡⚡ Good Quick tasks, testing
Falcon3-1B-Instruct ⚡⚡⚡ Good Fastest inference
Falcon3-3B-Instruct ⚡⚡ Better Balanced performance
Falcon3-7B-Instruct ⚡ Best Complex reasoning

BitNet Benchmark Results: BitNet-b1.58-2B-4T achieved 87% on the LocalClaw benchmark; see the BitNet Benchmark Results section below.


Quick start

1. Single prompt

# Simple Q&A
localclaw run "What is the capital of Japan?"

# With streaming output
localclaw run "Tell me a joke." --stream

# Specify a model
localclaw run "Explain quantum computing" -m llama3.2:3b

2. Interactive chat

# Start interactive session
localclaw chat -m qwen2.5-coder:0.5b

# With tools enabled
localclaw chat -m llama3.1:8b --tools calculator,shell,read_file,write_file

# With skills loaded
localclaw chat -m llama3.2:3b --skills skill-creator --tools write_file,shell

# Fast mode (reduced context for speed)
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose

3. Using BitNet backend

# BitNet requires --force-react for tool support
localclaw chat --backend bitnet --force-react

# Run single prompt with BitNet
localclaw run "Calculate 17 * 23" --backend bitnet --tools calculator

4. With ACP tracking

# Enable ACP for activity monitoring
localclaw chat -m qwen2.5-coder:0.5b --acp --tools shell,read_file,write_file

# Single prompt with ACP
localclaw run "What is 2+2?" --acp

CLI Commands

Command Description
run "prompt" Run single prompt and exit
chat Interactive multi-turn conversation
models List available Ollama models
tools List built-in tools
skills List available Agent Skills

CLI Flags

Flag Description
-m, --model Model name (default: qwen2.5-coder:0.5b)
--tools Comma-separated tool list
--skills Comma-separated skill list
--backend ollama or bitnet
--force-react Force ReAct text parsing
--acp Enable ACP integration
-v, --verbose Show tool calls and timing
--debug Show detailed debug info
--fast Preset: reduced context for speed
--warmup Pre-load model before chat
--stream Stream output token-by-token
--temperature Sampling temperature (0.0-2.0)
--num-ctx Context window size
--num-predict Max output tokens

Interactive Commands (in chat)

Command Description
/help Show available commands
/status Show session status
/tools List active tools
/skills List active skills
/reset Clear conversation history
/undo Remove last exchange
/retry Retry last message
/a2a Process pending A2A messages
/export Export to markdown
exit End session

Built-in Tools

Tool Description
calculator Evaluate math expressions
python_repl Execute Python code
shell Run shell commands
read_file Read file contents
write_file Write content to file
list_directory List directory contents
http_get HTTP GET request
save_note Save a note to memory
get_note Retrieve saved notes

# List all tools
localclaw tools

# Use specific tools
localclaw chat --tools calculator,python_repl,shell

Built-in Skills

Skill Description
skill-creator Generate new Agent Skills from requests
datetime Date/time formatting and calculations
web_search Web search capabilities

# List all skills
localclaw skills

# Use skills in chat
localclaw chat --skills skill-creator --tools write_file

Supported models (tool-calling)

The following model families support native tool calling in Ollama and are auto-detected:

Meta Llama: llama3, llama3.1, llama3.2, llama3.3, llama3-groq-tool-use

Mistral AI: mistral, mixtral, mistral-nemo, mistral-small, mistral-large, codestral, ministral

Alibaba Qwen: qwen2, qwen2.5, qwen3, qwen35, qwen2.5-coder, qwen2-math

Cohere: command-r, command-r7b

DeepSeek: deepseek, deepseek-coder, deepseek-v2, deepseek-v3

Microsoft Phi: phi-3, phi3, phi-4

Google Gemma: functiongemma (designed for function calling)

Others: yi-, yi1.5, internlm2, internlm2.5, solar, glm4, chatglm, firefunction, hermes, nemotron, cogito, athene

All other models fall back to ReAct text-parsing automatically.
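
To make the fallback concrete, the agent only needs to pull an action out of plain model text. The sketch below assumes a common "Action / Action Input" ReAct layout; LocalClaw's exact parsing format may differ.

import json
import re

# Assumed ReAct layout: "Action: <tool>\nAction Input: {...json...}"
ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\w+)\s*\nAction Input:\s*(?P<args>\{.*?\})",
    re.DOTALL,
)

def parse_react(text: str):
    """Return (tool, args) if the text contains an action, else None (final answer)."""
    m = ACTION_RE.search(text)
    if not m:
        return None
    return m.group("tool"), json.loads(m.group("args"))

# parse_react('Thought: need math\nAction: calculator\nAction Input: {"expression": "17 * 23"}')
# -> ("calculator", {"expression": "17 * 23"})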


Tested Small Models (≤1.5B parameters)

The following models have been tested with a 15-test benchmark (3 tests per category: Math, Reasoning, Knowledge, Calc Tool, Code). Prompts are optimized for small model comprehension.

Rankings (Updated)

Rank Model Score Time Math Reason Know Calc Code
🥇 qwen2.5-coder:0.5b-instruct-q4_k_m 14/15 (93%) ~80s 3/3 2/3 2/3 3/3 3/3
🥈 BitNet-b1.58-2B-4T (BitNet) 13/15 (87%) ~394s 3/3 2/3 2/3 3/3 3/3
🥉 granite3.1-moe:1b 12/15 (80%) ~60s 3/3 2/3 3/3 1/3 3/3
4 llama3.2:1b 12/15 (80%) ~600s 3/3 1/3 2/3 3/3 3/3
5 gemma3:270m 10/15 (67%) ~75s 3/3 1/3 1/3 2/3 3/3
6 qwen3:0.6b ~9/12 ~130s 2/3 3/3 3/3 0/3 –
7 granite4:350m 8/15 (53%) ~97s 2/3 1/3 2/3 0/3 3/3
8 qwen2.5:0.5b 10/15 (67%) ~107s 1/3 3/3 3/3 0/3 3/3
9 qwen2-math:1.5b 12/15 (80%) ~611s 3/3 3/3 3/3 ❌ 3/3
10 tinyllama:latest 9/15 (60%) ~587s 2/3 2/3 3/3 0/3 2/3
11 smollm:135m 7/15 (47%) ~285s 0/3 2/3 2/3 0/3 3/3
12 functiongemma:270m 1/15 (7%) ~90s 0/3 0/3 0/3 0/3 1/3

Note: Scores vary between runs due to model non-determinism. The qwen2.5-coder:0.5b achieved 100% in some runs.

Model Details

Model Params Size Speed Tool Support Notes
qwen2.5-coder:0.5b 494M ~400MB ⚡ Fast ✅ Native 🏆 Best overall! Excellent tool usage
BitNet-b1.58-2B-4T 2B ~1.3GB ⚡ Medium ⚠️ ReAct 🥈 2nd place! CPU-efficient ternary weights
granite3.1-moe:1b 1B MoE ~1.4GB ⚡ Medium ✅ Native Strong knowledge, HTTP 500 on long context
llama3.2:1b 1.2B ~1.3GB 🐢 Slow ✅ Native 128k context! Thorough but slow
gemma3:270m 270M ~292MB ⚡⚡ Fastest ⚠️ ReAct JSON Uses JSON ReAct format, Math & Code champion
qwen3:0.6b 600M ~523MB ⚡ Medium ⚠️ Text Perfect reasoning but Calc returns empty
granite4:350m 350M ~708MB ⚡ Fast ❌ Refused Refuses calculator (safety filter)
qwen2.5:0.5b 494M ~398MB ⚡ Fast ⚠️ Text Reasoning & Knowledge champ, Calc fails
qwen2-math:1.5b 1.5B ~935MB 🐢 Slow ❌ No tools 4 perfect categories! No tool support
tinyllama:latest 1.1B ~638MB 🐢 Slow ⚠️ Text Older model, verbose, unstable
smollm:135m 135M ~92MB ⚡ Fast ❌ None Smallest; hallucinates math (7×8=42!)
functiongemma:270m 270M ~301MB ⚡ Fast ❌ Broken Worst performer; returns empty

Category Champions

Category Champion Score Notes
Math qwen2.5-coder:0.5b, granite3.1-moe:1b, BitNet-b1.58-2B 3/3 Also gemma3:270m
Reasoning qwen2.5:0.5b, qwen3:0.6b, qwen2-math 3/3 Multiple tied
Knowledge granite3.1-moe:1b, qwen2-math 3/3 Multiple tied at 3/3
Calc qwen2.5-coder:0.5b, llama3.2:1b, BitNet-b1.58-2B 3/3 100% tool usage with ReAct
Code Many models 3/3 Code generation is easy for small models!

Test Categories

Category Tests What it measures
Math Multiply, Add, Divide Basic arithmetic without tools
Reasoning Apples, Sequence, Logic Multi-step reasoning and deduction
Knowledge Japan, France, Brazil capitals World knowledge recall
Calc Multiply, Divide, Power Tool usage with calculator
Code is_even, reverse, max_num Python function generation

Recommendations

Use Case Recommended Model Why
General use qwen2.5-coder:0.5b-instruct-q4_k_m Best all-around, fast, great tool usage
Large context llama3.2:1b 128k context window - handles long conversations
Math tasks qwen2.5-coder:0.5b or qwen2-math:1.5b Perfect math scores
Reasoning tasks qwen2.5:0.5b or qwen3:0.6b Perfect reasoning
Tool usage qwen2.5-coder:0.5b Most reliable tool calling
Fastest inference gemma3:270m 270M params, fastest responses
No tools needed qwen2-math:1.5b 4/5 categories perfect (no Calc)
Smallest footprint smollm:135m 92MB - but expect hallucinations

โš ๏ธ Models to Avoid

Model Issue
functiongemma:270m Despite the name, terrible at function calling - returns empty or refuses
smollm:135m Hallucinates wrong math (7×8=42), only 7/15 score
granite4:350m Refuses calculator tools (safety filter)

Known Issues with Small Models

  1. Tool calling variations:
    • granite4:350m: Refuses calculator ("I'm sorry, but I can't assist with that")
    • functiongemma:270m: Asks for clarification instead of using tools
    • qwen2.5:0.5b, qwen3:0.6b: Returns empty responses on Calc tests
    • qwen2-math:1.5b: HTTP 400 - doesn't support tool calling at all
  2. Math hallucinations: smollm:135m says “7×8=42”, tinyllama says “7×8=45”
  3. Power operator confusion: gemma3:270m reads 2**10 as 2*10=20
  4. Reasoning failures: Some models answer "8" for sequence "2,4,6,8,?" (repeat last)
  5. Stability issues:
    • granite3.1-moe:1b: HTTP 500 crashes (server EOF)
    • tinyllama, qwen3:0.6b: HTTP 524 timeouts
  6. Empty responses: functiongemma:270m returns empty strings on most tests

Skills (Agent Skills Specification)

🦞 LocalClaw R03 supports the Agent Skills specification for reusable instruction bundles.

Skill Structure

skills/
└── my-skill/
    ├── SKILL.md          # Required: name, description, instructions
    ├── scripts/          # Optional: executable scripts
    ├── references/       # Optional: additional docs
    └── assets/           # Optional: templates, images

SKILL.md Format

---
name: calculator
description: Perform mathematical calculations. Use when the user needs to compute expressions.
---

# Calculator Skill

Instructions for the model on how to use this skill...

Using Skills

# Load skills via CLI
localclaw chat --skills skill-creator --tools write_file,shell

# Multiple skills
localclaw chat --skills datetime,web_search --tools calculator

Progressive Disclosure

Skills follow a three-level loading system (a level-1 sketch follows the list):

  1. Metadata (~100 tokens): name + description loaded at startup
  2. Instructions (<500 lines): Full SKILL.md body loaded when skill triggers
  3. Resources (as needed): Files in scripts/, references/, assets/ loaded on demand
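
For level 1, only the frontmatter of SKILL.md needs to be read at startup. The sketch below parses the simple key: value frontmatter shown earlier using the stdlib alone; it is illustrative, not the actual skills/loader.py.

from pathlib import Path

def read_skill_metadata(skill_dir: str) -> dict:
    """Parse only the frontmatter of SKILL.md (level-1 metadata loading)."""
    meta = {}
    in_frontmatter = False
    for line in Path(skill_dir, "SKILL.md").read_text(encoding="utf-8").splitlines():
        if line.strip() == "---":
            if in_frontmatter:      # closing fence: stop before loading the body
                break
            in_frontmatter = True
            continue
        if in_frontmatter and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

# read_skill_metadata("skills/calculator") -> {"name": "calculator", "description": "..."}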

Built-in Skills

Skill Description
skill-creator OpenClaw's platform-agnostic skill generator. Creates new skills from user requests.
datetime Date and time utilities for formatting, parsing, and calculations.
web_search Web search capabilities for retrieving information from the internet.

Orchestrator modes

Mode Behaviour
router A small routing LLM picks the best agent for each request
pipeline Agents run sequentially; each receives the previous agent's output
parallel All agents run concurrently; results are merged with attribution
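
The three modes reduce to simple control flow. The sketch below treats each agent as a plain text-in, text-out callable; it illustrates the routing shapes, not the actual orchestrator.py API.

from concurrent.futures import ThreadPoolExecutor

def run_pipeline(agents, prompt):
    """pipeline: each agent receives the previous agent's output."""
    out = prompt
    for agent in agents:
        out = agent(out)
    return out

def run_parallel(agents, prompt):
    """parallel: all agents run concurrently; results merged with attribution."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda agent: agent(prompt), agents)
    return "\n".join(f"[agent {i}] {r}" for i, r in enumerate(results))

def run_router(route, agents, prompt):
    """router: a small routing LLM (here, `route`) picks the best agent by name."""
    return agents[route(prompt)](prompt)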

Running the examples

# Make sure Ollama is serving and you have a model pulled
ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m

# Or use a remote Ollama instance by editing localclaw/core/ollama_client.py

# Quick test suite (recommended first run)
bash test-quick.sh      # Linux/macOS/Colab
test-quick.cmd          # Windows

# Full test suite (all 11 examples)
bash test.sh            # Linux/macOS/Colab
test.cmd                # Windows

# Interactive menu
bash run.sh             # Linux/macOS/Colab
run.cmd                 # Windows

# Run individual examples
python examples/01_basic_agent.py
python examples/02_tool_agent.py
python examples/03_orchestrator.py
python examples/04_comprehensive_test.py
python examples/05_tool_tests.py
python examples/06_interactive_chat.py
python examples/07_model_comparison.py
python examples/08_robust_comparison.py
python examples/09_expanded_benchmark.py
python examples/10_skills_demo.py
python examples/11_skill_creator_test.py

ACP Integration (Agent Control Panel)

🦞 LocalClaw R03 supports ACP (Agent Control Panel) for centralized activity tracking, token monitoring, and multi-agent coordination.

What is ACP?

ACP is a monitoring and observability protocol for AI agents. Unlike communication protocols (MCP, A2A), ACP sits alongside your agents and provides:

  • Activity Tracking: Real-time monitoring of all agent actions
  • Token Management: Context window usage estimation per agent
  • Multi-Agent Coordination: Track multiple agents in one session
  • STOP/Resume Control: Emergency stop capability
  • Session Persistence: State preserved across restarts

Enable ACP

# Run with ACP tracking
localclaw chat --acp --tools shell,read_file,write_file -m qwen2.5-coder:0.5b

# Run single prompt with ACP
localclaw run --acp "What is 2+2?"

Configuration

Set your ACP server URL via environment variables:

# Local ACP
export ACP_URL="http://localhost:8766"

# Remote ACP (cloudflare tunnel)
export ACP_URL="https://your-tunnel.trycloudflare.com"

# Credentials
export ACP_USER="admin"
export ACP_PASS="secret"

Or edit localclaw/config.py for persistent settings.

What Gets Logged

Activity Description
Bootstrap Session start, identity establishment
User messages All prompts sent to the model
Assistant messages All model responses
Tool calls Shell commands, file operations, etc.
Tool results Outcomes from tool execution

Per-Agent Token Tracking

When multiple agents connect to the same ACP session:

{
  "primary_agent": "Super Z",
  "agent_tokens": {
    "Super Z": 42000,
    "LocalClaw": 500
  },
  "other_agents_tokens": 500
}

  • First agent to connect becomes primary (owns main context window)
  • Other agents tracked separately in agent_tokens
  • Prevents context pollution between agents
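
A sketch of the accounting behind that JSON, under the stated rules (first agent to connect becomes primary; others are tracked separately). This is illustrative, not ACP's implementation.

def record_tokens(session: dict, agent: str, tokens: int) -> None:
    """Attribute token usage to an agent within one ACP session."""
    session.setdefault("primary_agent", agent)   # first agent to connect wins
    counts = session.setdefault("agent_tokens", {})
    counts[agent] = counts.get(agent, 0) + tokens
    session["other_agents_tokens"] = sum(
        n for name, n in counts.items() if name != session["primary_agent"]
    )

# session = {}
# record_tokens(session, "Super Z", 42000); record_tokens(session, "LocalClaw", 500)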

ACP Server

To run your own ACP server, see the ACP Specification:

# ACP is a single Python file
python VTSTech-GLMACP.py

# With cloudflare tunnel
GLMACP_TUNNEL=auto python VTSTech-GLMACP.py

Remote Ollama Configuration

To use a remote Ollama instance (e.g., via Cloudflare tunnel), set the environment variable:

# Local Ollama (default)
export OLLAMA_URL="http://localhost:11434"

# Remote Ollama (cloudflare tunnel)
export OLLAMA_URL="https://your-tunnel.trycloudflare.com"

Or edit localclaw/config.py for persistent settings.

Timeout Configuration

Configure via environment variables:

# Request timeout in seconds (default: 90s for Cloudflare tunnel compatibility)
export OLLAMA_TIMEOUT=90

# Max retry attempts for transient errors (default: 3)
export OLLAMA_MAX_RETRIES=3

# Initial retry delay in seconds (default: 5s, doubles each retry)
export OLLAMA_RETRY_DELAY=5

Automatic Retry

LocalClaw automatically retries on transient errors with exponential backoff:

Error Code Description Retry Behavior
HTTP 524 Cloudflare tunnel timeout Retries up to 3 times
HTTP 502/503/504 Server temporarily unavailable Retries up to 3 times
HTTP 500 Server error (model loading, memory pressure) Retries up to 3 times
Timeout Socket or connection timeout Retries up to 3 times
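
The retry loop can be approximated with the stdlib alone. The sketch below mirrors the documented defaults (90 s timeout, 3 retries, 5 s delay doubling per attempt) but is an illustration rather than the shipped client.

import time
import urllib.error
import urllib.request

RETRIABLE = {500, 502, 503, 504, 524}

def get_with_retry(url: str, timeout: float = 90, retries: int = 3, delay: float = 5) -> bytes:
    """GET with exponential backoff on the transient errors listed above."""
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRIABLE or attempt == retries:
                raise
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries:
                raise
        time.sleep(delay * (2 ** attempt))   # 5s, 10s, 20s, ...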

Performance Optimization

CLI Options for Speed

# Fast mode - reduces context and output for quicker responses
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose

# Fine-tuned control
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Warm up model before chat (useful for remote Ollama with cold starts)
localclaw chat -m qwen2.5-coder:0.5b --warmup --fast

Option Description Speed Impact
--fast Preset: num_ctx=2048, num_predict=256 🚀 Significant
--num-ctx N Reduce context window (default varies by model) 🚀 Significant
--num-predict N Limit max output tokens ⚡ Moderate
--warmup Pre-load model before first chat ⚡ Faster first response

Ollama Model Options

Control model behavior via CLI flags:

# Lower temperature = more deterministic
localclaw chat -m qwen2.5-coder:0.5b --temperature 0.1

# Smaller context = faster
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Combined for optimal speed
localclaw chat -m qwen2.5-coder:0.5b --fast --temperature 0.3

Remote Ollama Tips

When using a remote Ollama via Cloudflare tunnel:

  1. Use --fast flag - Reduces inference time significantly
  2. Use smaller models - qwen2.5-coder:0.5b is fastest
  3. Warm up the model - First request is slowest due to model loading
  4. Increase timeout if needed: export OLLAMA_TIMEOUT=120

# Recommended for remote Ollama
localclaw chat -m qwen2.5-coder:0.5b-instruct-q4_k_m \
    --fast --warmup --verbose \
    --tools python_repl

Why Inference is Slow

Factor Impact Solution
Model size Larger models = slower Use smaller quantized models
Context window More context = slower Use --num-ctx 2048 or smaller
Output length More tokens = slower Use --num-predict 128
Remote connection Network latency Use local Ollama if possible
Cold start First load is slowest Use --warmup flag
GPU unavailable CPU inference is slow Ensure GPU is configured

Recent Improvements

R03: BitNet Backend

🦞 LocalClaw R03 adds BitNet backend support for running Microsoft's 1.58-bit quantized models:

  • New backend: Switch between Ollama and BitNet via --backend flag
  • CPU-only inference: BitNet models run efficiently without a GPU
  • Setup helper: bitnet_setup.py handles cloning and compilation
  • Note: BitNet requires ReAct fallback (no native tool support)

R03: Enhanced Security

Built-in tools now have comprehensive security:

  • Path validation: Restrict file access to allowed directories
  • Command blocklist: Block dangerous commands (rm, sudo, chmod, etc.)
  • Pattern detection: Detect dangerous shell patterns (pipes to bash, command substitution)
  • SSRF protection: Block private IPs and cloud metadata endpoints in http_get
  • Configurable modes: strict, permissive, or disabled

# Set security mode
export LOCALCLAW_SECURITY_MODE=strict
export LOCALCLAW_ALLOWED_PATHS=/home/user/projects:/tmp
export LOCALCLAW_BLOCKED_COMMANDS=rm,sudo,dd
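
As an illustration of the SSRF guard, a stdlib-only check can resolve the host and refuse private or link-local addresses. This is a sketch of the idea, not the shipped http_get validation.

import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs resolving to private, loopback, or link-local (metadata) addresses."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    # 169.254.0.0/16 is link-local and covers cloud metadata IPs like 169.254.169.254.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

# is_safe_url("http://169.254.169.254/latest/meta-data/") -> False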

Zero Dependencies

🦞 LocalClaw R03 continues to use only the Python stdlib; no pip install required! The HTTP client uses urllib instead of httpx.

Automatic Error Recovery

  • HTTP 524/502/503/504/500 retry: Transient server errors are automatically retried with exponential backoff
  • Timeout retry: Socket timeouts are retried automatically
  • Configurable via environment variables: OLLAMA_TIMEOUT, OLLAMA_MAX_RETRIES, OLLAMA_RETRY_DELAY

Small Model Support

🦞 LocalClaw R03 handles quirks of small models (≤1.5B parameters):

  • Fuzzy tool name matching: Hallucinated tool names like calculate_expression are automatically mapped to calculator (see the sketch after this list)
  • Argument auto-fixing: Common wrong argument patterns are corrected (e.g., {"base": 2, "exponent": 10} → {"expression": "2 ** 10"})
  • JSON response cleaning: When models output tool schemas instead of text answers, LocalClaw falls back to tool results
  • Unicode normalization: Accented characters are normalized for comparison (e.g., "Brasília" matches "brasilia")
  • ReAct text parsing: Models without native tool support automatically fall back to text-based ReAct format
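
The fuzzy matching can be done with difflib from the stdlib, as in the sketch below; the registered tool list and cutoff are illustrative, not LocalClaw's actual values.

import difflib

KNOWN_TOOLS = ["calculator", "shell", "read_file", "write_file", "http_get"]

def resolve_tool_name(requested: str) -> str | None:
    """Map a hallucinated tool name to the closest registered tool, if any."""
    matches = difflib.get_close_matches(requested, KNOWN_TOOLS, n=1, cutoff=0.5)
    return matches[0] if matches else None

# resolve_tool_name("calculate_expression") -> "calculator"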

Optimized Test Prompts

Key insights for small model prompt engineering:

  1. State the fact first: "The capital of Japan is Tokyo. What is the capital of Japan?"
  2. Show the answer format: "Answer: Tokyo" at the end
  3. Give calculation steps: "10 minus 3 equals 7. Then 7 minus 2 equals 5."
  4. Be explicit with tools: "Use calculator tool. Expression: 2 ** 10. Result: 1024"
  5. Guide code output: "Start with: def is_even(n):"

New Examples

Example Description
07_model_comparison.py Benchmark 15 tests across models with category breakdown
08_robust_comparison.py Progress-saving comparison for unstable connections
09_expanded_benchmark.py 25 tests across 8 categories including tool chaining
10_skills_demo.py Demonstrate Agent Skills system with skill-creator
11_skill_creator_test.py Benchmark skill creation across multiple small models

Test Categories (15 tests)

Category Tests Description
Math Multiply, Add, Divide Basic arithmetic (no tools)
Reasoning Apples, Sequence, Logic Multi-step reasoning
Knowledge Japan, France, Brazil World knowledge
Calc Multiply, Divide, Power Calculator tool usage
Code is_even, reverse, max_num Python code generation

BitNet Benchmark Results

LocalClaw R03 has been tested with Microsoft BitNet-b1.58-2B-4T, a 2B parameter model with 1.58-bit ternary weights, designed for efficient CPU inference.

Test Results Summary

Test Suite Score Time Notes
Model Comparison (15 tests) 13/15 (87%) 394s 5 categories
Robust Comparison (22 tests) 19/22 (86%) ~6min Incremental save
Comprehensive Test (7 tests) 6/7 (86%) ~90s Basic + Reasoning + Code

Category Breakdown (Model Comparison - 15 tests)

Category Score Pass Rate
Math 3/3 100% ✅
Code 3/3 100% ✅
Calc (with tools) 3/3 100% ✅
Reasoning 2/3 67%
Knowledge 2/3 67%
Total 13/15 87%

Failed Tests

Test Expected Got Category
Apples (reasoning) 5 7 Reasoning
Brazil capital Brasília São Paulo Knowledge

Performance Notes

Metric Value
Avg response time 5-10s (simple), 100s+ (tool use)
Tool calling ReAct fallback (no native support)
Context window Default (model dependent)
Inference CPU-efficient ternary weights

BitNet vs Ollama Small Models

Rank Model Score Params Backend
🥇 qwen2.5-coder:0.5b-instruct-q4_k_m 14/15 (93%) 494M Ollama
🥈 BitNet-b1.58-2B-4T 13/15 (87%) 2B BitNet
🥉 granite3.1-moe:1b 12/15 (80%) 1B MoE Ollama
4 llama3.2:1b 12/15 (80%) 1.2B Ollama

Note: BitNet uses 1.58-bit ternary weights, making it highly efficient for CPU inference despite having 2B parameters.

BitNet Setup for Benchmarking

# 1. Clone and compile BitNet
python localclaw/bitnet_setup.py

# 2. Start the BitNet server
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf

# 3. Run benchmark
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison.py

# 4. Run with ACP tracking
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison_acp.py

Observations

  1. Excellent for CPU-only systems: ternary weights enable fast inference without a GPU
  2. Solid tool usage: ReAct fallback handles calculator tools reliably
  3. Strong code generation: 100% pass rate on function-writing tasks
  4. Multi-step reasoning challenges: the "apples" test requires tracking state
  5. Knowledge gaps: São Paulo is commonly mistaken for Brazil's capital

About

🦞 LocalClaw R03 is written and maintained by VTSTech.


Testing Status: LocalClaw has been tested with both Ollama (11 small models) and BitNet (BitNet-b1.58-2B-4T) backends. BitNet achieved 87% on the benchmark, making it the 2nd best performer overall. See Tested Small Models and BitNet Benchmark Results sections for details.
