tether
Local Anthropic-compatible LLM server for Claude Code,
backed by mlx-lm / mlx-vlm on Apple Silicon.
One command launches the proxy and opens Claude Code in the same
terminal — inspired by ollama launch claude.
Requirements
- macOS on Apple Silicon
- Python 3.12+
- claude on PATH
Install
From PyPI (the distribution is named tetherd because the name tether
was already taken, but it still installs the tether command):
pipx install tetherd # recommended — isolated venv, `tether` on PATH
# or
pip install tetherd
From GitHub (tracks main):
pipx install "git+https://github.com/ryank1m/tether.git"
For development (editable checkout):
git clone https://github.com/ryank1m/tether.git
cd tether
python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev]"
All forms expose the tether console script and pull in mlx-vlm
and litellm as dependencies.
Getting a model
tether never downloads weights itself. Place a model directory
anywhere under ~/.tether/models/ (any folder containing a
config.json is auto-discovered) and pass --model <name> or pick
it interactively:
# Option A: download from HuggingFace (needs network)
huggingface-cli download mlx-community/gemma-4-e2b-it-4bit \
  --local-dir ~/.tether/models/gemma-4-e2b-it-4bit
# Option B: copy an already-downloaded model directory
# into ~/.tether/models/ by any means you like
# (scp, rsync, USB drive, sneakernet — tether only
# cares that config.json is present).
Usage
Launch Claude Code against a local MLX model — this is the default path, no extra terminals or env vars needed:
tether --model mlx-community/gemma-3-4b-it-4bit
Under the hood tether loads the model, brings up a LiteLLM
Anthropic-compatible proxy on 127.0.0.1:8080, waits for it to be
ready, and then spawns claude as a child process with
ANTHROPIC_BASE_URL, ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL,
CLAUDE_CODE_SUBAGENT_MODEL, and friends wired up automatically.
Ctrl-C exits Claude Code and cleans up the proxy.
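
For reference, the wiring is roughly what you would do by hand with two
terminals (an illustrative sketch; the mlx-local alias matches the curl
example further down, and tether also sets the OPUS/HAIKU defaults and
CLAUDE_CODE_SUBAGENT_MODEL the same way):

# terminal 1: proxy only
tether --serve-only --model mlx-community/gemma-3-4b-it-4bit
# terminal 2: point claude at the proxy manually
ANTHROPIC_BASE_URL=http://127.0.0.1:8080 \
ANTHROPIC_DEFAULT_SONNET_MODEL=mlx-local \
claude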
Drop models into ~/.tether/models/ and pick one interactively:
tether
# ↑/↓ or j/k · Enter to select · q/Esc to cancel
Local models (no HuggingFace involvement)
Passing an HF repo id (mlx-community/gemma-4-e2b-it-4bit) causes
mlx_vlm.load to revalidate the cached files against HF on every
start. To take HuggingFace out of the loop entirely, download once
and pass the directory directly:
huggingface-cli download mlx-community/gemma-4-e2b-it-4bit \
  --local-dir ~/.tether/models/gemma-4-e2b-it-4bit
tether --model ~/.tether/models/gemma-4-e2b-it-4bit
The TUI picker auto-discovers any subdirectory of ~/.tether/models/
that contains a config.json, so a model you download there shows up
automatically when you run tether with no --model.
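
The discovery rule is small enough to state as code; a minimal sketch
in Python (illustrative, not the actual tether source):

from pathlib import Path

def discover_models(root: Path = Path.home() / ".tether" / "models") -> list[Path]:
    # A model is any directory under root that contains a config.json.
    if not root.is_dir():
        return []
    return sorted(p.parent for p in root.rglob("config.json"))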
Forward arguments to claude with a trailing --:
tether --model mlx-community/gemma-3-4b-it-4bit -- --resume
List local models and exit:
tether --list
Config directory
tether reads every *.toml file in ~/.tether/config/ at
startup. On first run it creates
~/.tether/config/config.toml with sensible defaults (including
a 64k-token context cap) and a commented template for every field.
It will never overwrite that file again — edit it freely.
~/.tether/config/
├── config.toml # auto-created, always loaded first
├── 10-m4pro.toml # optional drop-in (per-machine tweaks)
└── 20-work.toml # optional drop-in (project-specific)
All files are merged on top of config.toml in alphabetical
order by filename. The NN-name.toml numeric-prefix convention
(same idea as systemd drop-ins) is the recommended way to control
ordering. Dotfiles, non-.toml files, and subdirectories are
ignored.
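
The merge itself fits in a few lines of Python (a sketch of the
documented behaviour; it assumes a shallow top-level merge, and
tether's own nested-table semantics may differ):

import tomllib
from pathlib import Path

def load_config(confdir: Path = Path.home() / ".tether" / "config") -> dict:
    base = confdir / "config.toml"
    merged = tomllib.loads(base.read_text()) if base.is_file() else {}
    # Drop-ins: every other *.toml, dotfiles excluded, alphabetical order.
    for p in sorted(confdir.glob("*.toml")):
        if p == base or p.name.startswith("."):
            continue
        merged.update(tomllib.loads(p.read_text()))  # later files win
    return merged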
Precedence (highest → lowest):
- CLI flags (--max-context 4096, --memory-cap 18GB, …)
- Alphabetically-latest drop-in
- Earlier drop-ins
- config.toml
- Built-in defaults
Every option can also be set from the command line for one-off
overrides — see tether --help.
Example drop-in for a 24 GB M4 Pro running the 26B MoE variant:
# ~/.tether/config/10-m4pro.toml
[model]
default = "gemma-4-26b-a4b-it-4bit"
memory_cap = "18GiB"
max_context = 32000
Flags
| flag | default | purpose |
|---|---|---|
| --model PATH_OR_REPO | — | explicit model; skips the picker |
| --host | 127.0.0.1 | bind host for the proxy |
| --port | 8080 | bind port for the proxy |
| --models-dir | ~/.tether/models | override discovery dir |
| --list | — | list local models and exit |
| --serve-only / --no-claude | — | run only the proxy, do not launch claude |
| --claude-path | which claude | explicit path to the claude binary |
| --memory-cap | from config | hard MLX memory ceiling, e.g. 18GB, 20GiB |
| --max-context | from config (64k) | truncate history so the prompt stays under N tokens |
| --log-level | warning | uvicorn log level |
| -- … | — | everything after -- is forwarded to claude |
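
For example, pinning a local model directory while tightening the
runtime limits for one session:

tether --model ~/.tether/models/gemma-4-e2b-it-4bit --memory-cap 18GB --max-context 32000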
Advanced: standalone proxy
If you want to point a non–Claude Code client at the proxy, or run it
under systemd / launchd, use --serve-only:
tether --serve-only --model mlx-community/gemma-3-4b-it-4bit
Then point any Anthropic-Messages-compatible client at
http://127.0.0.1:8080. Quick curl check:
curl -s -X POST http://127.0.0.1:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{"model":"mlx-local","max_tokens":50,
       "messages":[{"role":"user","content":"Say pineapple."}]}'
The proxy registers a wildcard route, so any model name in the request body routes to the loaded MLX model.
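
The same check from Python, using the official anthropic SDK
(pip install anthropic; the API key only has to be non-empty, since
the proxy does no auth):

import anthropic

# Point the official SDK at the local proxy.
client = anthropic.Anthropic(base_url="http://127.0.0.1:8080", api_key="local")
msg = client.messages.create(
    model="mlx-local",  # any name works; the wildcard route maps it to the loaded model
    max_tokens=50,
    messages=[{"role": "user", "content": "Say pineapple."}],
)
print(msg.content[0].text)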
Known limitations (v0.1)
- Token usage counts are zero. mlx_vlm.generate is not yet wired up to return prompt/completion token counts through to the response.
- Single session. One shared prompt cache; two concurrent Claude Code sessions with different system prompts will thrash the cache. Fine for a single-user local server.
- No auth / TLS. Bind is 127.0.0.1 only.
- Apple Silicon + macOS only.
Architecture
See plan/plan-option-b.md (proxy + custom provider),
plan/plan-unified-launch.md (single-terminal launch), and
plan/plan-tool-use.md (Gemma 4 tool-call wiring). Request flow:
tether
├── uvicorn (daemon thread) ──► LiteLLM ──► MLXProvider ──► mlx_lm
└── subprocess: claude (foreground, owns TTY)
│
└── POST /v1/messages → proxy thread
Everything lives in one process tree. The model is loaded once at startup (eagerly, so load failures surface before Claude Code launches) and reused across every subsequent call.
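
In code, the launch sequence is roughly the following (an illustrative
sketch, not the actual source; app stands in for the LiteLLM ASGI
application):

import os
import socket
import subprocess
import threading
import time

import uvicorn

# `app` is a stand-in for the proxy's ASGI application.
server = uvicorn.Server(uvicorn.Config(app, host="127.0.0.1", port=8080, log_level="warning"))
threading.Thread(target=server.run, daemon=True).start()

# Wait until the proxy accepts connections, then hand the TTY to claude.
while True:
    try:
        socket.create_connection(("127.0.0.1", 8080), timeout=0.2).close()
        break
    except OSError:
        time.sleep(0.1)

env = {**os.environ, "ANTHROPIC_BASE_URL": "http://127.0.0.1:8080"}
subprocess.run(["claude"], env=env)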