Run Claude Code locally on the Bonsai 8B 1-bit MLX model.
Project description
bonsai-claude
Run Claude Code locally on Bonsai 8B 1-bit — PrismML's 1-bit quantized Qwen3-8B — via Apple MLX. No Anthropic API key; no tokens leave your Mac.
Install
uv tool install bonsai-claude
Then:
bonsai-claude
(First run auto-downloads the 55 MB PrismML-fork MLX wheel + the Bonsai model weights from HuggingFace.)
Run ephemerally without installing:
uvx bonsai-claude
Requirements
- Apple Silicon Mac (M1 or newer)
- macOS 26+ (the prebuilt fork wheel is tagged
macosx_26_0_arm64) uvon PATH — install:curl -LsSf https://astral.sh/uv/install.sh | shclaudeCLI on PATH
Python 3.12 is managed by uv automatically.
How it works
Claude Code speaks the Anthropic API shape (POST /v1/messages). MLX's server only speaks the OpenAI shape. So ANTHROPIC_BASE_URL can't point directly at it — a translator sits between.
claude CLI ──POST /v1/messages──▶ anthropic_shim :11434 ──POST /v1/chat/completions──▶ mlx_lm.server :8080 ──▶ Bonsai
(Anthropic shape) (direct adapter) (OpenAI shape)
The adapter is ported from ollama/anthropic/anthropic.go (MIT — attribution in NOTICE). It handles request/response translation and the streaming state machine — including the input_json_delta events for tool_calls that LiteLLM's chat→anthropic adapter fails to emit.
Usage
bonsai-claude # interactive: pick context + --bare, then launch
bonsai-claude --non-interactive # skip prompts, use saved prefs or defaults
bonsai-claude --smoke # headless HTTP round-trip test, then exit
bonsai-claude --panes # also open iTerm2 windows: log tail + macmon
bonsai-claude <claude args passed through>
Per-project preferences (max_kv_size, --bare choice) are saved at ~/.mlx_claude/prefs.json keyed by CWD.
Why Bonsai + 1-bit?
Bonsai is an 8B-parameter model in ~1 GB of weights — a ~8× memory reduction vs fp16. It fits in system RAM on M1 Macs that normally can't serve 8B models. The PrismML fork of mlx adds the 1-bit quant kernels needed to run it; the wheel is pinned and auto-fetched.
Prefill rate: ~100-150 tok/s on M-series chips (1-bit saves memory bandwidth but not FLOPs, so prefill is compute-bound). Generation: faster. --bare strips Claude Code's default context to keep turn-1 fast.
Caveats
- Tool-call quality: Bonsai scores ~65.7 on the Berkeley Function Calling Leaderboard. Good enough for most Claude Code flows but weaker than frontier models on complex tool orchestration.
- Large-context slowness: turn-1 with full context can take minutes on 1-bit quant. Use
--bare(the TUI's default) to shrink Claude Code's system prompt 10-20×. - Prefix KV cache is in-memory only: restart the stack, the cache resets. Turn 2+ within a session reuses automatically.
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bonsai_claude-0.1.0.tar.gz.
File metadata
- Download URL: bonsai_claude-0.1.0.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.16 {"installer":{"name":"uv","version":"0.9.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9c66a1d0661b3aaeb135bd9ff068ca9ee5fc11215ff7bbbff248ba3816aa035e
|
|
| MD5 |
df54fef815a18d2692187bc7a30d5e6f
|
|
| BLAKE2b-256 |
730c5761a7e5bf50c3b0a9db0372892239c50c4b3d8472f975a56180555a653e
|
File details
Details for the file bonsai_claude-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bonsai_claude-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.16 {"installer":{"name":"uv","version":"0.9.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
17698b64f5db83e91985c51c82aa2c5a07c388a00c4bb5daabc2455cf2ff3493
|
|
| MD5 |
0b294e0ebadb55ca67d832647306e503
|
|
| BLAKE2b-256 |
cebce9d0dadbc8a44daa3dc377248df683f900d31211cb5e4acb3e5bc644aec6
|