
CLI for running LLMs on Apple Silicon via MLX


ppmlx

Run LLMs on your Mac. OpenAI-compatible API powered by Apple Silicon.


Install

uv tool install ppmlx

Requires macOS on Apple Silicon (M1+) and Python 3.11+

Privacy note: ppmlx never sends prompts, responses, file contents, paths, or tokens anywhere. Optional anonymous usage analytics can be disabled with ppmlx config --no-analytics.

Get Started

ppmlx pull qwen3.5:9b      # download a model
ppmlx run qwen3.5:9b       # chat in the terminal
ppmlx serve                 # start API server on :6767

curl | sh (one-liner)

curl -fsSL https://raw.githubusercontent.com/the-focus-company/ppmlx/main/scripts/install.sh | sh

From source

git clone https://github.com/the-focus-company/ppmlx
cd ppmlx
uv tool install .

Homebrew

Homebrew tap coming soon. For now, use uv tool install ppmlx.


Quick Start

# 1. Download a model
ppmlx pull llama3

# 2. Interactive chat REPL
ppmlx run llama3

# 3. Start OpenAI-compatible API server on :6767
ppmlx serve

Benchmarks

Measured on a MacBook Pro M4 Pro (48 GB unified memory, macOS 15.x). Each scenario was run 3 times with temperature=0 and max_tokens=8192; values below are averages.

GLM-4.7-Flash (4-bit, ~5 GB)

Scenario                              Metric  ppmlx     Ollama    Delta
Simple (short prompt, short answer)   tok/s   63.1      40.5      +56%
                                      TTFT    374 ms    832 ms    -55%
Complex (short prompt, long answer)   tok/s   55.6      38.8      +43%
                                      TTFT    496 ms    412 ms    +20%
Long context (~4K-token prompt)       tok/s   42.1      27.5      +53%
                                      TTFT    6,792 ms  8,401 ms  -19%

Qwen 3.5 9B (4-bit, ~6 GB)

Scenario       Metric  ppmlx     Ollama     Delta
Simple         tok/s   48.2      22.7       +112%
               TTFT    537 ms    324 ms     +66%
Complex        tok/s   47.2      23.0       +106%
               TTFT    567 ms    455 ms     +25%
Long context   tok/s   43.2      23.7       +82%
               TTFT    9,212 ms  11,461 ms  -20%

tok/s = tokens per second (higher is better). TTFT = time to first token (lower is better). Delta is relative to Ollama.

Methodology. Streaming chat completions over the OpenAI-compatible API; TTFT measured from request start to first SSE content chunk. See scripts/bench_common.sh and the per-model scripts in scripts/ for the full, reproducible setup.
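As a sanity check, the Delta column can be reproduced from the raw numbers in the tables. This is a small illustrative sketch (not part of the benchmark scripts) computing the relative difference of ppmlx versus Ollama:

```python
# Sketch: reproduce the Delta column from the raw benchmark numbers.
# Positive means ppmlx's value is higher, which is better for tok/s
# and worse for TTFT.

def delta_pct(ppmlx: float, ollama: float) -> int:
    """Relative difference of ppmlx vs Ollama, rounded to a whole percent."""
    return round((ppmlx / ollama - 1) * 100)

# GLM-4.7-Flash, Simple scenario from the table above
print(delta_pct(63.1, 40.5))  # tok/s delta: +56
print(delta_pct(374, 832))    # TTFT delta: -55
```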

That's it. Any OpenAI-compatible tool works out of the box:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:6767/v1", api_key="local")
response = client.chat.completions.create(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
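Streaming works the same way: with stream=True the server sends Server-Sent Events, and each data line carries a JSON chunk whose choices[0].delta.content holds the next piece of text (this is what the TTFT benchmark above times to first content chunk). A sketch of parsing such a stream, using illustrative sample lines rather than output captured from ppmlx:

```python
import json

# Illustrative SSE data lines in the OpenAI-compatible streaming format;
# a real stream would come from the /v1/chat/completions endpoint.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

def collect_content(lines):
    """Concatenate the delta.content fields from SSE data lines."""
    out = []
    for line in lines:
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            out.append(delta["content"])
    return "".join(out)

print(collect_content(sample_lines))  # Hello!
```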

Commands

Command Description Key Options
ppmlx launch Interactive launcher (pick action + model) -m model, --host, --port, --flush
ppmlx serve Start API server on :6767 -m model, --embed-model, -i, --no-cors
ppmlx run <model> Interactive chat REPL -s system, -t temp, --max-tokens
ppmlx pull [model] Download model (multiselect if no arg) --token
ppmlx list Show downloaded models -a all (incl. registry), --path
ppmlx rm <model> Remove a model -f skip confirmation
ppmlx ps Show loaded models & memory
ppmlx quantize <model> Convert & quantize HF model to MLX -b bits, --group-size, -o output
ppmlx config View/set configuration --hf-token

Connect Your Tools

Point any OpenAI-compatible client at http://localhost:6767/v1 with any API key:

  • Cursor — Settings > AI > OpenAI-compatible
  • Continue — in config.json, set provider to openai and apiBase to the URL above
  • LangChain / LlamaIndex — set base_url and api_key="local"

Config

Optional. ~/.ppmlx/config.toml:

[server]
host = "127.0.0.1"
port = 6767

[defaults]
temperature = 0.7
max_tokens = 2048

[analytics]
enabled = true
provider = "posthog"
respect_do_not_track = true

Anonymous Usage Analytics

ppmlx includes privacy-preserving, anonymous product analytics, disabled by default. On first interactive run, the beta onboarding asks whether you want to enable it.

What is sent:

  • command and API event names such as serve_started, model_pulled, api_chat_completions
  • app version, Python minor version, OS family, CPU architecture
  • a random anonymous install id, used only to count returning beta installs
  • coarse booleans/counters such as stream=true, tools=true, batch_size=4

What is never sent:

  • prompts, responses, tool arguments, file contents, file paths
  • HuggingFace tokens, API keys, repo IDs, request bodies

When events are sent:

  • when a CLI command starts
  • when OpenAI-compatible API endpoints are hit

Why:

  • understand which workflows matter most during beta
  • prioritize compatibility work across commands and API surfaces
  • measure adoption without collecting user content

Opt out:

ppmlx config --no-analytics

or:

[analytics]
enabled = false

By default, opted-in beta analytics are sent to the maintainer-operated PostHog project. To use your own PostHog sink instead, configure:

export PPMLX_ANALYTICS_HOST="https://analytics.example.com"
export PPMLX_ANALYTICS_PROJECT_API_KEY="your-posthog-project-api-key"

If you prefer, you can also set the same values in ~/.ppmlx/config.toml.
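One plausible way these settings compose — environment variable over config-file value over built-in default — can be sketched as follows. Whether ppmlx resolves them in exactly this order, and what its built-in default host is, are assumptions for illustration:

```python
import os

# Hypothetical default; the real maintainer-operated PostHog host is
# not documented here.
DEFAULT_ANALYTICS_HOST = "https://example-posthog-host.invalid"

def resolve_analytics_host(config: dict) -> str:
    """Env var wins over config file, which wins over the default
    (assumed precedence, not confirmed ppmlx behavior)."""
    return (
        os.environ.get("PPMLX_ANALYTICS_HOST")
        or config.get("analytics", {}).get("host")
        or DEFAULT_ANALYTICS_HOST
    )

os.environ["PPMLX_ANALYTICS_HOST"] = "https://analytics.example.com"
print(resolve_analytics_host({}))  # https://analytics.example.com
```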

API Documentation

When the server is running, interactive API docs are available at:

Requirements

  • macOS on Apple Silicon (M1 or later)
  • Python 3.11+
  • At least 8 GB unified memory (16 GB+ recommended for larger models)

ppmlx vs Ollama

ppmlx Ollama
Runtime MLX (Apple-native) llama.cpp (cross-platform)
Platform macOS Apple Silicon only macOS, Linux, Windows
GPU backend Metal (unified memory) Metal / CUDA / ROCm
API OpenAI-compatible Ollama + OpenAI-compatible
Language Python Go + C++
Quantization MLX format GGUF format

Choose ppmlx if you want maximum Apple Silicon performance with a pure-Python, MLX-native stack. Choose Ollama if you need cross-platform support or GGUF models.

License

MIT
