
ppmlx

CLI for running LLMs on Apple Silicon via MLX — OpenAI-compatible API on port 6767.



Install

Requires: macOS on Apple Silicon (M1/M2/M3/M4), Python 3.11+

uv (recommended)

uv tool install ppmlx

pipx

pipx install ppmlx

curl | sh (one-liner)

curl -fsSL https://raw.githubusercontent.com/PingCompany/ppmlx/main/scripts/install.sh | sh

From source

git clone https://github.com/PingCompany/ppmlx
cd ppmlx
uv tool install .

Homebrew

Homebrew tap coming soon. For now, use uv tool install ppmlx.


Quick Start

# 1. Download a model
ppmlx pull llama3

# 2. Interactive chat REPL
ppmlx run llama3

# 3. Start OpenAI-compatible API server on :6767
ppmlx serve
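
Before wiring up a client, you can sanity-check that the server is up. The Architecture section below lists a /health endpoint; here is a minimal check using only the Python standard library, assuming the endpoint returns a 2xx status when the server is ready:

import urllib.request

# Query the /health endpoint listed in the Architecture section.
# Assumption: a ready server answers with a 2xx status code.
with urllib.request.urlopen("http://localhost:6767/health", timeout=5) as resp:
    print("server status:", resp.status)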

OpenAI SDK Example

from openai import OpenAI

client = OpenAI(base_url="http://localhost:6767/v1", api_key="local")

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
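
For a single complete response instead of a stream, omit stream=True and read the message content directly; this mirrors the "stream": false curl example below:

# Non-streaming variant, reusing the client from above.
response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)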

curl Example

# List available models
curl http://localhost:6767/v1/models

# Chat completion
curl http://localhost:6767/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "What is Apple Silicon?"}],
    "stream": false
  }'

# Embeddings
curl http://localhost:6767/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed", "input": "Hello world"}'

Model Aliases

Llama Family

Alias         HuggingFace Repo
llama3        mlx-community/Meta-Llama-3-8B-Instruct-4bit
llama3-70b    mlx-community/Meta-Llama-3-70B-Instruct-4bit
llama3.2      mlx-community/Llama-3.2-3B-Instruct-4bit
llama3.1      mlx-community/Meta-Llama-3.1-8B-Instruct-4bit

Mistral / Mixtral Family

Alias           HuggingFace Repo
mistral         mlx-community/Mistral-7B-Instruct-v0.3-4bit
mixtral         mlx-community/Mixtral-8x7B-Instruct-v0.1-4bit
mistral-nemo    mlx-community/Mistral-Nemo-Instruct-2407-4bit

Qwen Family

Alias          HuggingFace Repo
qwen2.5        mlx-community/Qwen2.5-7B-Instruct-4bit
qwen2.5-14b    mlx-community/Qwen2.5-14B-Instruct-4bit
qwen2.5-72b    mlx-community/Qwen2.5-72B-Instruct-4bit

Phi / Gemma Family

Alias         HuggingFace Repo
phi4          mlx-community/phi-4-4bit
phi3.5        mlx-community/Phi-3.5-mini-instruct-4bit
gemma2        mlx-community/gemma-2-9b-it-4bit
gemma2-27b    mlx-community/gemma-2-27b-it-4bit

Code Models

Alias             HuggingFace Repo
codellama         mlx-community/CodeLlama-13b-Instruct-hf-4bit
deepseek-coder    mlx-community/deepseek-coder-6.7b-instruct-4bit

Embedding Models

Alias          HuggingFace Repo
nomic-embed    mlx-community/nomic-embed-text-v1.5
bge-small      mlx-community/bge-small-en-v1.5

RAM Requirements

Model Size      Min RAM    Recommended RAM    Notes
1-3B params     4 GB       8 GB               Llama 3.2 3B, Phi 3.5 mini
7-8B params     8 GB       16 GB              Llama 3 8B, Mistral 7B
13-14B params   16 GB      24 GB              CodeLlama 13B, Qwen 2.5 14B
27-34B params   24 GB      36 GB              Gemma 2 27B
70-72B params   48 GB      64 GB              Llama 3 70B, Qwen 2.5 72B

All values are for 4-bit quantized models. Unquantized models require 2-4x more RAM.
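
These numbers follow from simple arithmetic: at 4-bit quantization each parameter occupies half a byte, plus headroom for the KV cache, activations, and the OS. A back-of-envelope sketch (the 25% overhead factor is an illustrative assumption, not ppmlx's estimator):

# Rough RAM estimate for a quantized model.
# Assumption: weights = params * bits / 8, plus ~25% overhead for
# KV cache, activations, and runtime (illustrative, not ppmlx's formula).
def estimate_ram_gb(params_billion: float, bits: int = 4, overhead: float = 1.25) -> float:
    weight_bytes = params_billion * 1e9 * bits / 8
    return weight_bytes * overhead / 1e9

print(round(estimate_ram_gb(8), 1))   # Llama 3 8B  -> ~5.0 GB, within the 8 GB minimum
print(round(estimate_ram_gb(70), 1))  # Llama 3 70B -> ~43.8 GB, within the 48 GB minimum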


CLI Commands

Command                   Description
ppmlx pull <model>        Download a model from HuggingFace Hub
ppmlx run <model>         Start interactive chat REPL
ppmlx serve               Start OpenAI-compatible API server on :6767
ppmlx list                List locally downloaded models
ppmlx rm <model>          Remove a downloaded model
ppmlx alias <n> <repo>    Add a custom model alias
ppmlx aliases             Show all model aliases (built-in + custom)
ppmlx ps                  Show currently loaded models and memory usage
ppmlx quantize            Convert and quantize a model to MLX format
ppmlx create              Create a custom model from a Modelfile
ppmlx logs                Query the request log database
ppmlx info <model>        Show detailed model information
ppmlx estimate <m>        Estimate RAM requirements before downloading
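
ppmlx logs reads the SQLite database configured in the [logging] section below. You can also open that database directly with Python's sqlite3 module; since the schema is not documented here, the sketch below just lists the tables rather than guessing column names:

import sqlite3
from pathlib import Path

# Open the request log DB (db_path from the [logging] config section).
db = sqlite3.connect(Path("~/.ppmlx/ppmlx.db").expanduser())

# The schema is not documented here, so inspect it before querying.
for (sql,) in db.execute("SELECT sql FROM sqlite_master WHERE type = 'table'"):
    print(sql)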

Modelfile Example

Create a Modelfile to define a custom model with a system prompt:

FROM llama3

SYSTEM """
You are a helpful coding assistant. You write clean, well-documented code
and explain your reasoning step by step.
"""

PARAMETER temperature 0.2
PARAMETER max_tokens 4096
PARAMETER top_p 0.9

Then build it:

ppmlx create coding-assistant -f Modelfile
ppmlx run coding-assistant
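
Internally, modelfile.py turns these directives into a base model alias, a system prompt, and default generation parameters. A minimal parser sketch that illustrates the format; this is not ppmlx's actual implementation, and it only covers the directives shown above:

# Illustrative Modelfile parser -- covers FROM, SYSTEM """...""", PARAMETER.
def parse_modelfile(text: str) -> dict:
    spec = {"from": None, "system": "", "parameters": {}}
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("FROM "):
            spec["from"] = line[5:].strip()
        elif line.startswith("SYSTEM"):
            # Collect lines until the closing triple quote.
            block = []
            for inner in lines:
                if inner.strip() == '"""':
                    break
                block.append(inner)
            spec["system"] = "\n".join(block).strip()
        elif line.startswith("PARAMETER "):
            _, key, value = line.split(maxsplit=2)
            spec["parameters"][key] = float(value)
    return spec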

Configuration

ppmlx reads configuration from ~/.ppmlx/config.toml. All values are optional.

[server]
host = "127.0.0.1"      # Bind address (default: 127.0.0.1)
port = 6767             # Port (default: 6767)
cors = true             # Enable CORS (default: true)
cors_origins = ["*"]    # Allowed CORS origins

[models]
dir = "~/.ppmlx/models"         # Model storage directory
default_alias = "llama3"        # Default model for bare requests

[generation]
temperature = 0.7               # Default sampling temperature
max_tokens = 2048               # Default max output tokens
top_p = 0.9                     # Default top-p
repetition_penalty = 1.1        # Default repetition penalty

[logging]
db_path = "~/.ppmlx/ppmlx.db"   # SQLite log database
log_requests = true             # Log all requests
log_level = "info"              # Server log level
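
Because the file is plain TOML, Python 3.11+ can read it with the standard-library tomllib. An illustrative loader using the documented defaults (not ppmlx's actual config.py):

import tomllib
from pathlib import Path

# Illustrative loader; falls back to the defaults documented above.
def load_server_config() -> dict:
    path = Path("~/.ppmlx/config.toml").expanduser()
    data = tomllib.loads(path.read_text()) if path.exists() else {}
    server = data.get("server", {})
    return {
        "host": server.get("host", "127.0.0.1"),
        "port": server.get("port", 6767),
        "cors": server.get("cors", True),
    }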

Architecture

┌──────────────────────────────────────────────────────┐
│                      ppmlx CLI                       │
│                    (typer + rich)                    │
│   pull / run / serve / list / rm / quantize / ...    │
└──────────────────────────┬───────────────────────────┘
                           │
                ┌──────────▼──────────┐
                │   FastAPI Server    │
                │     port :6767      │
                │                     │
                │ /v1/chat/completions│
                │ /v1/completions     │
                │ /v1/embeddings      │
                │ /v1/models          │
                │ /health /metrics    │
                └──────┬──────┬───────┘
                       │      │
             ┌─────────▼──┐ ┌─▼──────────────┐
             │ LLM Engine │ │  Embed Engine  │
             │  (mlx-lm)  │ │(mlx-embeddings)│
             └─────────┬──┘ └─┬──────────────┘
                       │      │
             ┌─────────▼──────▼───────────────┐
             │       MLX / Metal GPU          │
             │  Apple Silicon Unified Memory  │
             └───────────────┬────────────────┘
                             │
             ┌───────────────▼────────────────┐
             │      SQLite Request Log        │
             │      ~/.ppmlx/ppmlx.db         │
             └────────────────────────────────┘
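
The LLM engine box wraps mlx-lm, which exposes high-level load() and generate() helpers. Below is a stripped-down sketch of how the server might delegate a chat request to the engine; ppmlx's real internals may differ, and mlx-lm's generate() signature has shifted between releases, so treat this as a shape rather than a contract:

from fastapi import FastAPI
from mlx_lm import load, generate  # mlx-lm's high-level helpers

app = FastAPI()
# load() returns the model and a tokenizer wrapping the HF tokenizer.
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

@app.post("/v1/chat/completions")
def chat(body: dict):
    # Render messages with the model's chat template, then generate.
    prompt = tokenizer.apply_chat_template(
        body["messages"], add_generation_prompt=True, tokenize=False
    )
    text = generate(model, tokenizer, prompt=prompt, max_tokens=512)
    return {"choices": [{"message": {"role": "assistant", "content": text}}]}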

Uninstall

uv

uv tool uninstall ppmlx

pipx

pipx uninstall ppmlx

Manual cleanup (all methods)

# Remove downloaded models and config
rm -rf ~/.ppmlx

Contributing

  1. Fork the repository on GitHub.
  2. Create a feature branch: git checkout -b feat/my-feature
  3. Install dev dependencies: uv sync --python 3.11
  4. Run tests: uv run pytest tests/ -v
  5. Submit a pull request.

Development Setup

git clone https://github.com/PingCompany/ppmlx
cd ppmlx
uv sync --python 3.11
uv run ppmlx --version
uv run pytest tests/ -v

Project Structure

ppmlx/
  __init__.py       # version
  cli.py            # Typer CLI entry point
  server.py         # FastAPI application
  engine.py         # MLX LLM inference engine
  engine_embed.py   # MLX embedding engine
  engine_vlm.py     # MLX vision-language engine
  models.py         # model registry, aliases, download
  config.py         # configuration loading
  db.py             # SQLite request logging
  memory.py         # RAM estimation utilities
  modelfile.py      # Modelfile parser
  quantize.py       # MLX quantization helpers
tests/
  conftest.py       # MLX stubs for CI
  test_cli.py       # CLI tests
scripts/
  install.sh        # One-liner installer
homebrew/
  Formula/ppmlx.rb  # Homebrew formula
.github/workflows/
  tests.yml         # CI tests
  release.yml       # PyPI release
  homebrew-update.yml

License

MIT — see LICENSE.

