Skip to main content

Run Claude Code locally on the Bonsai 8B 1-bit MLX model.

Project description

bonsai-claude

PyPI Python License Downloads

Run Claude Code locally on Bonsai 8B 1-bitPrismML's 1-bit quantized Qwen3-8B — via Apple MLX. No Anthropic API key; no tokens leave your Mac.

Install

uv tool install bonsai-claude

Then:

bonsai-claude

(First run auto-downloads the 55 MB PrismML-fork MLX wheel + the Bonsai model weights from HuggingFace.)

Run ephemerally without installing:

uvx bonsai-claude

Requirements

  • Apple Silicon Mac (M1 or newer)
  • macOS 26+ (the prebuilt fork wheel is tagged macosx_26_0_arm64)
  • uv on PATH — install: curl -LsSf https://astral.sh/uv/install.sh | sh
  • claude CLI on PATH

Python 3.12 is managed by uv automatically.

How it works

Claude Code speaks the Anthropic API shape (POST /v1/messages). MLX's server only speaks the OpenAI shape. So ANTHROPIC_BASE_URL can't point directly at it — a translator sits between.

claude CLI ──POST /v1/messages──▶ anthropic_shim :11434 ──POST /v1/chat/completions──▶ mlx_lm.server :8080 ──▶ Bonsai
            (Anthropic shape)       (direct adapter)         (OpenAI shape)

The adapter is ported from ollama/anthropic/anthropic.go (MIT — attribution in NOTICE). It handles request/response translation and the streaming state machine — including the input_json_delta events for tool_calls that LiteLLM's chat→anthropic adapter fails to emit.

Usage

bonsai-claude                         # interactive: pick context + --bare, then launch
bonsai-claude --non-interactive       # skip prompts, use saved prefs or defaults
bonsai-claude --smoke                 # headless HTTP round-trip test, then exit
bonsai-claude --panes                 # also open iTerm2 windows: log tail + macmon
bonsai-claude <claude args passed through>

Per-project preferences (max_kv_size, --bare choice) are saved at ~/.mlx_claude/prefs.json keyed by CWD.

Why Bonsai + 1-bit?

Bonsai is an 8B-parameter model in ~1 GB of weights — a ~8× memory reduction vs fp16. It fits in system RAM on M1 Macs that normally can't serve 8B models. The PrismML fork of mlx adds the 1-bit quant kernels needed to run it; the wheel is pinned and auto-fetched.

Prefill rate: ~100-150 tok/s on M-series chips (1-bit saves memory bandwidth but not FLOPs, so prefill is compute-bound). Generation: faster. --bare strips Claude Code's default context to keep turn-1 fast.

Caveats

  • Tool-call quality: Bonsai scores ~65.7 on the Berkeley Function Calling Leaderboard. Good enough for most Claude Code flows but weaker than frontier models on complex tool orchestration.
  • Large-context slowness: turn-1 with full context can take minutes on 1-bit quant. Use --bare (the TUI's default) to shrink Claude Code's system prompt 10-20×.
  • Prefix KV cache is in-memory only: restart the stack, the cache resets. Turn 2+ within a session reuses automatically.

License

MIT. See LICENSE and NOTICE for attributions.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bonsai_claude-0.1.0.tar.gz (13.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bonsai_claude-0.1.0-py3-none-any.whl (16.0 kB view details)

Uploaded Python 3

File details

Details for the file bonsai_claude-0.1.0.tar.gz.

File metadata

  • Download URL: bonsai_claude-0.1.0.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.16 {"installer":{"name":"uv","version":"0.9.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for bonsai_claude-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9c66a1d0661b3aaeb135bd9ff068ca9ee5fc11215ff7bbbff248ba3816aa035e
MD5 df54fef815a18d2692187bc7a30d5e6f
BLAKE2b-256 730c5761a7e5bf50c3b0a9db0372892239c50c4b3d8472f975a56180555a653e

See more details on using hashes here.

File details

Details for the file bonsai_claude-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bonsai_claude-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.16 {"installer":{"name":"uv","version":"0.9.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for bonsai_claude-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 17698b64f5db83e91985c51c82aa2c5a07c388a00c4bb5daabc2455cf2ff3493
MD5 0b294e0ebadb55ca67d832647306e503
BLAKE2b-256 cebce9d0dadbc8a44daa3dc377248df683f900d31211cb5e4acb3e5bc644aec6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page