Access Apple's on-device Foundation Models via CLI and OpenAI-compatible API

These details have not been verified by PyPI

If you find this useful, please ⭐ the repo! Also check out Vesta AI Explorer! — my full-featured native macOS AI app.

Install

	Stable (v0.9.12)	Nightly (afm-next)
Homebrew	`brew install scouzi1966/afm/afm`	`brew install scouzi1966/afm/afm-next`
pip	`pip install macafm`	`pip install --extra-index-url https://maclocal-ai.pages.dev/afm/wheels/simple/ macafm-next`
Release notes	v0.9.12	v0.9.13-next

🔨 Build from source — one command

Clone and build everything (submodules, patches, WebUI, release binary) with a single script. It checks your toolchain, auto-installs what it can (Node via Homebrew), and tells you what to install manually (Xcode Command Line Tools) — no AI agent or project knowledge required. The script initializes submodules for you, so a plain git clone is all you need.

git clone https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api
./build.sh

That's it. The afm binary lands in .build/release/afm. Add --install to also install it to /usr/local/bin (on your PATH by default; uses sudo if needed):

./build.sh --install

Run ./build.sh --help for all options (--debug, --skip-webui, --yes for non-interactive/CI).

[!TIP] Switching between stable and nightly:

brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable
This assumes you previously ran brew install scouzi1966/afm/afm.

Install a previous version

Older stable releases are kept as pinned formulae in the Homebrew tap and as version-pinned wheels on PyPI. Useful for reproducing an issue against a specific build or rolling back without waiting for a new release.

Homebrew (pinned stable formulae): afm@<version> — available for 0.9.0, 0.9.1, 0.9.3–0.9.10.

brew install scouzi1966/afm/afm@0.9.10      # install v0.9.10
brew uninstall afm                          # if current afm is already installed
brew link afm@0.9.10                        # expose `afm` on PATH
afm --version                               # → v0.9.10

Homebrew (pinned nightly formulae): afm-next@<full-version> — e.g. afm-next@0.9.11-next.9c3225e.20260418. Lists of available pinned nightlies are at github.com/scouzi1966/homebrew-afm.

brew install scouzi1966/afm/afm-next@0.9.11-next.9c3225e.20260418

pip (version-pinned wheels): any published release.

pip install macafm==0.9.10                  # previous stable
pip install --extra-index-url https://maclocal-ai.pages.dev/afm/wheels/simple/ \
  macafm-next==0.9.13.dev20260621           # pinned nightly

[!NOTE]

31 Mar, 2026. AFM was pinned to an older version of https://github.com/huggingface/swift-huggingface. I have now pinned it to the latest version, which uses hub for the model cache. The older version downloaded models to the ~/Documents/Huggingface folder, which caused some pain with iCloud sync. Models are now stored under ~/.cache, which is outside iCloud's scope. The TL;DR is that models will be downloaded again. You can manually delete the older models in ~/Documents/Huggingface to reclaim disk space (spring cleaning!). Please report any issues.

Attention M-series Mac AI enthusiasts! You don't need to be a Swift developer to explore. Vibe coding really allows anyone to participate in this project. A lot of the hype is real! It does work.

Fork this repo first, then clone your fork to submit PRs:
git clone https://github.com/<your-username>/maclocal-api.git
cd maclocal-api
claude
/build-afm
To just experiment locally:
git clone https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api
claude
/build-afm
/build-afm is an AI skill that runs the first build so that you can start coding.

Start vibe coding! I will add support for skills with more coding agents in the future.

afm — Run Any MLX LLM on Your Mac, 100% Local

afm has been tested extensively with Qwen3.5-35B-A3B, using an experimental technique in which Claude and Codex act as judges for evaluation scoring. The test results are in the report linked below.

afm-next Nightly Test Report — Qwen3.5-35B-A3B Focus

Run open-source MLX models or Apple's on-device Foundation Model through an OpenAI-compatible API. Built entirely in Swift for maximum Metal GPU performance. No Python runtime, no cloud, no API keys.

What's new in afm-next

[!IMPORTANT] The nightly build is the future stable release. It includes everything in v0.9.12 plus:

⚡ Speculative decoding — up to +52% faster decode, quality-preserving. Two model-specific options:

--mtp for Qwen3.6-27B (self-speculative MTP head) → ~+52%

--eagle3 <drafter> for dense Gemma4-31B (EAGLE3 drafter) → ~+30%

Both work for streaming and non-streaming and preserve greedy-decode quality (bit-exact on short generations; may differ token-for-token on longer ones). See ⚡ Speculative Decoding below.

Faster long context — backported adaptive-block SDPA (~+10% decode @16k), eager <think>-tag streaming (reasoning TTFT ~610ms→~346ms), and Metal-kernel prewarm for a faster cold first token.

Swift 6 language mode migration.

[!TIP] 🙏 Huge thanks to @jesserobbins — first-time contributor, landed two substantial features in this cycle (Vision OCR + Speech transcription). Both PRs brought afm's Apple-native capabilities from the CLI into first-class HTTP APIs. Contributions of this size and quality from a new contributor are rare and appreciated.

Quick Start

# Run any MLX model with WebUI
afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

# Or any smaller model
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w

# Chat from the terminal (auto-downloads from Hugging Face)
afm mlx -m Qwen3-0.6B-4bit -s "Explain quantum computing"

# Interactive model picker (lists your downloaded models)
MACAFM_MLX_MODEL_CACHE=/path/to/models afm mlx -w

# Apple's on-device Foundation Model with WebUI
afm -w

⚡ Speculative Decoding

afm can decode up to +52% faster while preserving greedy-decode quality — output is bit-exact to normal greedy decoding on short generations and stays greedy-quality on longer ones (it may differ token-for-token there). There are two options, one per model family. Each needs a specific checkpoint/drafter (a plain 4-bit conversion won't work):

Running…	Flag	Speedup	Get the model (Hugging Face)
Qwen3.6-27B	`--mtp`	~+52%	`Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed` — ships the `mtp.safetensors` head¹
Gemma4-31B (dense)	`--eagle3 <drafter-dir>`	~+30%	verifier `mlx-community/gemma-4-31b-it-4bit` + drafter `RedHatAI/gemma-4-31B-it-speculator.eagle3`

¹ The plain mlx-community/Qwen3.6-27B-4bit conversion strips the MTP head, so --mtp silently no-ops there — you must use the checkpoint above.

[!NOTE] Both fast paths engage only for greedy (temperature: 0), text-only requests (streaming or non-streaming). Anything with tools / response_format / logprobs / stop, or --concurrent N≥2, silently falls back to normal autoregressive decode — output is always correct either way.
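Because the fallback is silent, it can help to check on the client side whether a request is even a candidate for the fast path. The sketch below encodes the rules from the note above. `fast_path_eligible` is a hypothetical helper, not part of afm, and it assumes that an omitted `temperature` counts as non-greedy.

```python
# Request fields that, per the note above, force a fallback to normal decoding.
FALLBACK_FIELDS = ("tools", "response_format", "logprobs", "stop")


def fast_path_eligible(payload: dict, concurrent: int = 1) -> bool:
    """Client-side guess at whether a request can use speculative decoding.

    Hypothetical helper: it mirrors the documented rules, not afm's own code.
    """
    # Assumption: an omitted temperature is treated as non-greedy.
    if payload.get("temperature") != 0:
        return False
    # A server started with --concurrent N>=2 always falls back.
    if concurrent >= 2:
        return False
    # Any of these fields being set disables the fast path.
    if any(payload.get(field) for field in FALLBACK_FIELDS):
        return False
    # Text-only: every message content must be a plain string.
    return all(isinstance(m.get("content"), str) for m in payload.get("messages", []))


request = {
    "model": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed",
    "messages": [{"role": "user", "content": "Explain how a CPU cache works."}],
    "temperature": 0,
    "max_tokens": 200,
}
print(fast_path_eligible(request))
```

A check like this only tells you that nothing in the request rules the fast path out; whether it actually engaged is decided by the server.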

1. Qwen3.6-27B — MTP (`--mtp`)

Self-speculative decoding using Qwen3.6's in-model MTP head — no separate draft model needed.

# afm auto-downloads the model from Hugging Face on first run
afm mlx -m Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed --mtp --port 9999

# Then call it like any OpenAI endpoint (greedy → MTP fast path engages)
curl -s http://127.0.0.1:9999/v1/chat/completions -H 'Content-Type: application/json' -d '{
  "model": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed",
  "messages": [{"role":"user","content":"Explain how a CPU cache works in 4 sentences."}],
  "temperature": 0, "max_tokens": 200
}'

2. Gemma4-31B dense — EAGLE3 (`--eagle3 <drafter-dir>`)

Speculative decoding for the dense Gemma4-31B verifier using an EAGLE3 drafter. Pass the drafter as a local directory (download it first):

# 1) download the EAGLE3 drafter from Hugging Face
huggingface-cli download RedHatAI/gemma-4-31B-it-speculator.eagle3 \
  --local-dir ~/models/gemma-4-31B-eagle3

# 2) run the dense verifier with the drafter
afm mlx -m mlx-community/gemma-4-31b-it-4bit \
  --eagle3 ~/models/gemma-4-31B-eagle3 --port 9999

Tuning: drafts-per-round defaults to 2 (the sweet spot); override with AFM_EAGLE3_BLOCK=3. The MoE Gemma4 (26B-A4B) is not accelerated by spec-decode — --eagle3 only helps the dense 31B.
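If you launch afm from a script, the drafter path and the `AFM_EAGLE3_BLOCK` override can be assembled as below. This is a minimal sketch that uses only the flags and the environment variable shown above; `eagle3_launch` is a hypothetical helper name.

```python
import os


def eagle3_launch(model, drafter_dir, port=9999, block=None):
    """Return (argv, env) for starting afm with an EAGLE3 drafter.

    Hypothetical helper: it only assembles the command shown above.
    """
    argv = [
        "afm", "mlx", "-m", model,
        # No shell is involved when launching from Python, so expand "~" here.
        "--eagle3", os.path.expanduser(drafter_dir),
        "--port", str(port),
    ]
    env = dict(os.environ)
    if block is not None:
        # Override the default of 2 drafts per round.
        env["AFM_EAGLE3_BLOCK"] = str(block)
    return argv, env


argv, env = eagle3_launch(
    "mlx-community/gemma-4-31b-it-4bit", "~/models/gemma-4-31B-eagle3", block=3
)
print(" ".join(argv))
```

Pass the result to `subprocess.Popen(argv, env=env)` to start the server.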

📖 Full usage, tuning, debugging/profiling flags, and benchmarks: docs/decode-optimizations.md

Why AFM for agents

afm is built for agentic clients — OpenCode, OpenClaw, Cline, Continue.dev, Aider, Cursor, Hermes — that drive multi-turn tool-using LLM loops against a local OpenAI-compatible endpoint. The capabilities below are already in the box:

Capability	What it gets you	Where it lives
7+ tool-call formats, auto-detected	json, lfm2, xmlFunction (Qwen3-Coder), glm4, gemma, kimiK2, minimaxM2 picked from `model_type` in `config.json` — no per-model tuning	`MLXModelService.swift:inferToolCallFormat`
`afm_adaptive_xml` parser	JSON-in-XML fallback, type coercion, nullable schema flatten, fuzzy tool-name match — survives the malformed XML real models emit	`Models/ToolCallStreamingRuntime.swift`
`tool_choice`: auto / none / required / named function	Standard OpenAI semantics; named-function forcing routed end-to-end	`Models/OpenAIRequest.swift:ToolChoice`
Streaming tool-call deltas	Token-level start/end tag detection; content outside tool calls streams normally	`Controllers/MLXChatCompletionsController.swift`
`<think>` + harmony channel reasoning extraction	Routes Qwen/DeepSeek `<think>…</think>` and gpt-oss `<|channel|>analysis…` into `reasoning_content` so the WebUI/agent can show it separately	`Controllers/MLXChatCompletionsController.swift:extractThinkTags / extractHarmonyChannels`
Strict `json_schema` + xgrammar EBNF	Guaranteed-valid JSON via token-level grammar enforcement when `--enable-grammar-constraints` is on	`Models/XGrammarService.swift`
`--guided-json` server default	One CLI flag pins a schema across every chat request that omits its own `response_format` (Foundation + MLX backends)	`Sources/MacLocalAPI/main.swift`
Deterministic `seed`, `logprobs`, `top_logprobs`	All sampling controls (temperature, top_p, top_k, min_p, repetition_penalty, presence_penalty, seed, logprobs+top_logprobs up to 20) plumbed end-to-end	`Models/OpenAIRequest.swift` + `Scripts/patches/Evaluate.swift`
Radix-tree prefix KV cache	`--enable-prefix-caching` reuses KV across turns — agent loops with stable system prompts get prefill for free	`Models/RadixTreeCache.swift`
4/8-bit KV quantization	`--kv-bits 4|8` cuts memory ~2-4× on long-context turns
Concurrent batch decode	`--concurrent N` runs N requests through one model with fair queueing; vLLM-style metrics expose queue depth	`Models/BatchScheduler.swift`
vLLM-namespaced Prometheus `/metrics`	`afm:max_concurrent_slots`, `afm:num_requests_running`, `afm:num_requests_waiting`, plus per-request token/timing histograms	`Controllers/MetricsController.swift`
`Retry-After: 2` on 503	Tells well-behaved agents (LangChain, OpenAI SDK) when to retry — no thundering herd	`Controllers/MLXChatCompletionsController.swift`
Multi-backend gateway mode	`--gateway` discovers Ollama / LM Studio / Jan on the same machine and proxies them under one OpenAI surface, normalizing `reasoning` → `reasoning_content`	`Models/BackendDiscoveryService.swift` + `BackendProxyService.swift`
`X-Request-ID` / `OpenAI-Request-ID` echo	Inbound IDs are honored; otherwise minted as `req_<uuid12>`. Echoed on every response and inside `error.request_id` for retry correlation	`Server.swift:RequestIDMiddleware`
`stream_options.include_usage` honored	Suppress the final usage chunk when the client doesn't want it (matches OpenAI strict mode)	`Models/OpenAIRequest.swift:StreamOptions`
`parallel_tool_calls: false` honored	Truncate to a single tool call per turn for agents that want serial execution	`Controllers/MLXChatCompletionsController.swift:finalizeAssistantTurn`
Speech (transcribe + TTS) and Vision OCR	`/v1/audio/transcriptions`, `/v1/audio/speech`, `/v1/vision/ocr`; agents can hand off audio/image inputs without a separate service	`Controllers/SpeechAPIController.swift`, `VisionAPIController.swift`
On-device embeddings for RAG	`/v1/embeddings` from Apple's NaturalLanguage model — OpenAI-compatible vectors for retrieval/semantic search. Runs as a dedicated `afm embed` server (:9998), separate from the chat endpoint	`Controllers/EmbeddingsController.swift`
Per-client config generators	`afm mlx -m <model> --openclaw-config` prints a paste-ready provider config; cookbook recipes in `docs/clients/` cover OpenCode, OpenClaw, Cline, Continue.dev, Aider, Cursor, Hermes	`Sources/MacLocalAPI/main.swift:printOpenClawConfig`

See docs/clients/ for one-page recipes per agent.
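Two rows in the table above, `Retry-After` on 503 and the request-ID echo, are easy to use from a hand-rolled client. The sketch below shows one way to do it. Both helper names are hypothetical, and generating the ID on the client in the same `req_` + 12 hex characters shape is an assumption; afm honors any inbound ID.

```python
import uuid


def new_request_headers(request_id=None):
    """Headers for a chat request, carrying an ID the server will echo back."""
    # Assumption: mimic the req_<uuid12> shape; any string should be accepted.
    rid = request_id or f"req_{uuid.uuid4().hex[:12]}"
    return {"Content-Type": "application/json", "X-Request-ID": rid}


def retry_delay(status, headers, default=2.0):
    """Seconds to wait before retrying, or None when no retry is warranted."""
    if status != 503:
        return None
    # Note: a plain dict lookup is case-sensitive; http.client headers are not.
    value = headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        # Header missing or not a number of seconds: use the default.
        return default


print(retry_delay(503, {"Retry-After": "2"}))
```

Log the `X-Request-ID` you sent next to any `error.request_id` you get back, so a retry can be matched to the original attempt.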

Use with OpenCode

OpenCode is a terminal-based AI coding assistant. Connect it to afm for a fully local coding experience: no cloud and no API keys. No internet connection is required either, other than to download the model the first time.

1. Configure OpenCode (~/.config/opencode/opencode.json):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "macafm (local)",
      "options": {
        "baseURL": "http://localhost:9999/v1"
      },
      "models": {
        "mlx-community/Qwen3-Coder-Next-4bit": {
          "name": "mlx-community/Qwen3-Coder-Next-4bit"
        }
      }
    }
  }
}

2. Start afm with a coding model:

afm mlx -m mlx-community/Qwen3-Coder-Next-4bit -t 1.0 --top-p 0.95 --max-tokens 8192

3. Launch OpenCode and type /connect. Scroll to the very bottom of the provider list; macafm (local) will likely be the last entry. Select it and, when prompted for an API key, enter any value (e.g. x). afm does not yet implement API-key authentication, so the key is ignored. All inference runs locally on your Mac's GPU.
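If you switch models often, the config from step 1 can be generated rather than edited by hand. The following sketch reproduces that JSON for an arbitrary model ID and base URL. `opencode_config` is a hypothetical helper, and the structure is copied from the example above rather than from OpenCode's schema documentation.

```python
import json


def opencode_config(model, base_url="http://localhost:9999/v1"):
    """Build the OpenCode provider config from step 1 for a given model."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            # Provider key and display name as used in the example above.
            "ollama": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "macafm (local)",
                "options": {"baseURL": base_url},
                "models": {model: {"name": model}},
            }
        },
    }


print(json.dumps(opencode_config("mlx-community/Qwen3-Coder-Next-4bit"), indent=2))
```

Redirect the output to ~/.config/opencode/opencode.json, and keep the model ID identical to the one passed to `afm mlx -m`.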

28+ MLX Models Tested

28 models tested and verified including Qwen3, Gemma 3/3n, GLM-4/5, DeepSeek V3, LFM2, SmolLM3, Llama 3.2, MiniMax M2.5, Nemotron, and more. See test reports.

Related Projects

Vesta AI Explorer — full-featured native macOS AI chat app
AFMTrainer — LoRA fine-tuning wrapper for Apple's toolkit (Mac M-series & Linux CUDA)
Apple Foundation Model Adapters — Apple's adapter training toolkit

🌟 Features

🔗 OpenAI API Compatible - Works with existing OpenAI client libraries and applications
🧠 MLX Local Models - Run any Hugging Face MLX model locally (Qwen, Gemma, Llama, DeepSeek, GLM, and 28+ tested models)
🌐 API Gateway - Auto-discovers and proxies Ollama, LM Studio, Jan, and other local backends into a single API
⚡ LoRA adapter support - Loads LoRA adapters (.fmadapter files) fine-tuned with Apple's adapter training toolkit
📱 Apple Foundation Models - Uses Apple's on-device 3B parameter language model
👁️ Vision OCR - Extract text from images and PDFs using Apple Vision via CLI and HTTP (afm vision, /v1/vision/ocr)
🔢 Embeddings - OpenAI-compatible embeddings from Apple's NaturalLanguage model, on-device, via a dedicated server (afm embed, /v1/embeddings)
🖥️ Built-in WebUI - Chat interface with model selection (afm -w)
🔒 Privacy-First - All processing happens locally on your device
⚡ Fast & Lightweight - No network calls, no API keys required
🛠️ Easy Integration - Drop-in replacement for OpenAI API endpoints
📊 Token Usage Tracking - Provides accurate token consumption metrics

📋 Requirements

macOS 26 (Tahoe) or later
Apple Silicon Mac (M1/M2/M3/M4 series)
Apple Intelligence enabled in System Settings
Xcode 26 (for building from source)

🚀 Quick Start

Installation

Option 1: Homebrew (Recommended)

# Add the tap
brew tap scouzi1966/afm

# Install AFM
brew install afm

# Verify installation
afm --version

Option 2: pip (PyPI)

# Install from PyPI
pip install macafm

# Verify installation
afm --version

Option 3: Build from Source

# Clone the repository (build.sh initializes submodules for you)
git clone https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api

# Build everything from scratch (checks/installs deps + patches + webui + release build)
./build.sh

# Or skip webui if you don't have Node.js
./build.sh --skip-webui

# Or use make (patches + release build, no webui)
make

# Run
./.build/release/afm --version

Running

# API server only (Apple Foundation Model on port 9999)
afm

# API server with WebUI chat interface
afm -w

# WebUI + API gateway (auto-discovers Ollama, LM Studio, Jan, etc.)
afm -w -g

# Custom port with verbose logging
afm -p 8080 -v

# Show help
afm -h

MLX Local Models

Run open-source models locally on Apple Silicon using MLX:

# Run a model with single prompt
afm mlx -m mlx-community/Qwen2.5-0.5B-Instruct-4bit -s "Explain gravity"

# Start MLX model with WebUI
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w

# Interactive model picker (lists downloaded models)
afm mlx -w

# MLX model as API server
afm mlx -m mlx-community/Llama-3.2-1B-Instruct-4bit -p 8080

# Pipe mode
cat essay.txt | afm mlx -m mlx-community/Qwen3-0.6B-4bit -i "Summarize this"

# MLX help
afm mlx --help

Models are downloaded from Hugging Face on first use and cached locally. Any model from the mlx-community collection is supported.

📡 API Endpoints

Chat Completions

POST /v1/chat/completions

Compatible with OpenAI's chat completions API.

curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

List Models

GET /v1/models

Returns available Foundation Models.

curl http://localhost:9999/v1/models

Vision OCR

POST /v1/vision/ocr

Runs Apple Vision OCR against local files, uploads, base64 payloads, data: URLs, and OpenAI-style image inputs.

curl -X POST http://localhost:9999/v1/vision/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/tmp/invoice.pdf",
    "recognition_level": "accurate",
    "languages": ["en-US"],
    "max_pages": 10
  }'

The endpoint returns structured JSON with per-document text, per-page text, text blocks, detected tables, document hints, and a top-level combined_text field. See docs/vision-ocr-api.md for request formats, options, and response details.

Embeddings

POST /v1/embeddings

Serves OpenAI-compatible embeddings backed by Apple's NaturalLanguage contextual model, fully on-device. Started with afm embed (default port 9998), separate from the chat server.

afm embed                       # start the embeddings server on port 9998

curl -X POST http://localhost:9998/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-nl-contextual-en",
    "input": "The quick brown fox"
  }'

Accepts a string, an array of strings, or pre-tokenized ids; supports float/base64 output and Matryoshka-style dimensions truncation. See docs/embeddings-api.md for models, request fields, response shape, and error semantics.
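For retrieval or semantic search, the returned vectors are usually compared with cosine similarity. The sketch below uses only the standard library. It assumes the response follows the OpenAI shape (a `data` list of objects with `index` and `embedding`), which docs/embeddings-api.md should confirm; `embed` and `cosine` are hypothetical helper names.

```python
import json
import math
import urllib.request


def embed(texts, base_url="http://localhost:9998/v1", model="apple-nl-contextual-en"):
    """POST texts to /v1/embeddings and return one vector per input."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumption: OpenAI-style response; sort by index to match input order.
    return [d["embedding"] for d in sorted(payload["data"], key=lambda d: d["index"])]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# With `afm embed` running:
#   query, doc = embed(["fast fox", "The quick brown fox"])
#   print(cosine(query, doc))
```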

Health Check

GET /health

Server health status endpoint.

curl http://localhost:9999/health
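When afm is started from a script, polling this endpoint is a simple way to wait until the server is ready before sending requests. A minimal sketch follows. It assumes a healthy server answers with HTTP 200, which is not stated above, and `wait_until_healthy` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url="http://localhost:9999", timeout=30.0, interval=0.5):
    """Poll GET /health until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                # Assumption: HTTP 200 means the server is ready.
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not listening yet, or an error status: try again.
            pass
        time.sleep(interval)
    return False
```

Call `wait_until_healthy()` right after launching `afm`, and treat a `False` result as a failed start.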

💻 Usage Examples

Python with OpenAI Library

from openai import OpenAI

# Point to your local MacLocalAPI server
client = OpenAI(
    api_key="not-needed-for-local",
    base_url="http://localhost:9999/v1"
)

response = client.chat.completions.create(
    model="foundation",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)

print(response.choices[0].message.content)

Vision OCR from OpenAI-Compatible Clients

The OCR endpoint also accepts OpenAI-style multimodal payloads. This is useful when your client already sends messages[].content[] parts with image_url.

curl -X POST http://localhost:9999/v1/vision/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract the invoice text"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:application/pdf;base64,..."
          }
        }
      ]
    }],
    "recognition_level": "accurate",
    "languages": ["en-US"]
  }'
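The `data:` URL in the request above is the file's bytes, base64-encoded, behind a MIME-type prefix. The sketch below builds the same payload from raw bytes. `ocr_payload` is a hypothetical helper, and its field names are copied from the curl example rather than from docs/vision-ocr-api.md.

```python
import base64


def ocr_payload(data, mime="application/pdf", languages=("en-US",),
                recognition_level="accurate"):
    """Build an OpenAI-style OCR request body from raw file bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    url = f"data:{mime};base64,{encoded}"
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the invoice text"},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
        "recognition_level": recognition_level,
        "languages": list(languages),
    }


# Usage: read a file and POST json.dumps(payload) to /v1/vision/ocr.
#   with open("/tmp/invoice.pdf", "rb") as fh:
#       payload = ocr_payload(fh.read())
```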

Foundation chat requests can also auto-run Apple Vision OCR before prompting the model when:

the request includes image content
the request includes the built-in apple_vision_ocr tool
tool_choice is auto, required, omitted, or explicitly selects that tool

JavaScript/Node.js

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'not-needed-for-local',
  baseURL: 'http://localhost:9999/v1',
});

const completion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a haiku about programming' }],
  model: 'foundation',
});

console.log(completion.choices[0].message.content);

curl Examples

# Basic chat completion
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

# With temperature control
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role": "user", "content": "Be creative!"}],
    "temperature": 0.8
  }'

Single Prompt & Pipe Examples

# Single prompt mode
afm -s "Explain quantum computing"

# Piped input from other commands
echo "What is the meaning of life?" | afm
cat file.txt | afm
git log --oneline | head -5 | afm

# Custom instructions with pipe
echo "Review this code" | afm -i "You are a senior software engineer"

🏗️ Architecture

MacLocalAPI/
├── Package.swift                    # Swift Package Manager config
├── Sources/MacLocalAPI/
│   ├── main.swift                   # CLI entry point & ArgumentParser
│   ├── Server.swift                 # Vapor web server configuration
│   ├── Controllers/
│   │   └── ChatCompletionsController.swift  # OpenAI API endpoints
│   └── Models/
│       ├── FoundationModelService.swift     # Apple Foundation Models wrapper
│       ├── OpenAIRequest.swift              # Request data models
│       └── OpenAIResponse.swift             # Response data models
└── README.md

🔧 Configuration

Command Line Options

OVERVIEW: macOS server that exposes Apple's Foundation Models through
OpenAI-compatible API

Use -w to enable the WebUI, -g to enable API gateway mode (auto-discovers and
proxies to Ollama, LM Studio, Jan, and other local LLM backends).

USAGE: afm <options>
       afm mlx [<options>]      Run local MLX models from Hugging Face
       afm vision <image>       OCR text extraction from images/PDFs

OPTIONS:
  -s, --single-prompt <single-prompt>
                          Run a single prompt without starting the server
  -i, --instructions <instructions>
                          Custom instructions for the AI assistant (default:
                          You are a helpful assistant)
  -v, --verbose           Enable verbose logging
  --no-streaming          Disable streaming responses (streaming is enabled by
                          default)
  -a, --adapter <adapter> Path to a .fmadapter file for LoRA adapter fine-tuning
  -p, --port <port>       Port to run the server on (default: 9999)
  -H, --hostname <hostname>
                          Hostname to bind server to (default: 127.0.0.1)
  -t, --temperature <temperature>
                          Temperature for response generation (0.0-1.0)
  -r, --randomness <randomness>
                          Sampling mode: 'greedy', 'random',
                          'random:top-p=<0.0-1.0>', 'random:top-k=<int>', with
                          optional ':seed=<int>'
  -P, --permissive-guardrails
                          Permissive guardrails for unsafe or inappropriate
                          responses
  -w, --webui             Enable webui and open in default browser
  -g, --gateway           Enable API gateway mode: discover and proxy to local
                          LLM backends (Ollama, LM Studio, Jan, etc.)
  --prewarm <prewarm>     Pre-warm the model on server startup for faster first
                          response (y/n, default: y)
  --version               Show the version.
  -h, --help              Show help information.

Note: afm also accepts piped input from other commands, equivalent to using -s
with the piped content as the prompt.
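The -r/--randomness value is a small colon-separated string, so it is easy to get wrong when assembled by hand. The sketch below builds it from keyword arguments. `randomness_arg` is a hypothetical helper, and it assumes that top-p, top-k and seed only make sense in the random modes.

```python
def randomness_arg(mode="greedy", top_p=None, top_k=None, seed=None):
    """Build a value for afm's -r/--randomness option."""
    if mode not in ("greedy", "random"):
        raise ValueError("mode must be 'greedy' or 'random'")
    if top_p is not None and top_k is not None:
        raise ValueError("choose top_p or top_k, not both")
    if mode == "greedy":
        # Assumption: sampling options and seeds apply to random modes only.
        if top_p is not None or top_k is not None or seed is not None:
            raise ValueError("greedy takes no sampling options")
        return "greedy"
    parts = ["random"]
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be within 0.0-1.0")
        parts.append(f"top-p={top_p}")
    if top_k is not None:
        parts.append(f"top-k={int(top_k)}")
    if seed is not None:
        parts.append(f"seed={int(seed)}")
    return ":".join(parts)


print(randomness_arg("random", top_p=0.9, seed=42))  # random:top-p=0.9:seed=42
```

The result can be passed straight to the CLI, for example `afm -r "random:top-p=0.9:seed=42" -s "Explain quantum computing"`.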

Environment Variables

The server respects standard logging environment variables:

LOG_LEVEL - Set logging level (trace, debug, info, notice, warning, error, critical)

⚠️ Limitations & Notes

Model Scope: Apple Foundation Model is a 3B parameter model (optimized for on-device performance)
macOS 26+ Only: Requires the latest macOS with Foundation Models framework
Apple Intelligence Required: Must be enabled in System Settings
Token Estimation: Uses word-based approximation for token counting (Foundation model only; proxied backends report real counts)

🔍 Troubleshooting

"Foundation Models framework is not available"

Ensure you're running macOS 26 or later
Enable Apple Intelligence in System Settings → Apple Intelligence & Siri
Verify you're on an Apple Silicon Mac
Restart the application after enabling Apple Intelligence

Server Won't Start

Check if the port is already in use: lsof -i :9999
Try a different port: afm -p 8080
Enable verbose logging: afm -v

Build Issues

Ensure you have Xcode 26 installed
Install the Xcode Command Line Tools if they are missing: xcode-select --install
Clean and rebuild: swift package clean && swift build -c release

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

# Clone the repo (build.sh initializes submodules for you)
git clone https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api

# Full build from scratch (submodules + patches + webui + release)
./build.sh

# Or for debug builds during development
./build.sh --debug --skip-webui

# Run with verbose logging
./.build/debug/afm -w -g -v

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Apple for the Foundation Models framework
The Vapor Swift web framework team
OpenAI for the API specification standard
The Swift community for excellent tooling

📞 Support

If you encounter any issues or have questions:

Check the Troubleshooting section
Search existing GitHub Issues
Create a new issue with detailed information about your problem

🗺️ Roadmap

Streaming response support
MLX local model support (28+ models tested)
Multiple model support (API gateway mode)
Web UI for testing (llama.cpp WebUI integration)
Vision OCR subcommand
Function/tool calling (OpenAI-compatible, multiple formats)
Performance optimizations
BFCL integration for automated tool calling validation
Docker containerization (when supported)

Made with ❤️ for the Apple Silicon community

Bringing the power of local AI to your fingertips.

Release history

Version	Released	Notes
0.9.13	Jun 22, 2026	This version
0.9.12	May 8, 2026	
0.9.11	Apr 21, 2026	
0.9.10	Apr 8, 2026	
0.9.9	Mar 31, 2026	
0.9.8	Mar 29, 2026	
0.9.7	Mar 17, 2026	
0.9.6.1	Mar 5, 2026	
0.9.6	Mar 5, 2026	
0.9.5	Mar 3, 2026	Yanked: packaging error
0.9.4	Feb 20, 2026	
0.9.3	Jan 29, 2026	
0.9.2	Jan 29, 2026	Yanked: version not working
0.9.1	Jan 28, 2026	

Download files

Source Distribution

macafm-0.9.13.tar.gz (24.6 MB)

Uploaded Jun 22, 2026 Source

Built Distribution

macafm-0.9.13-py3-none-any.whl (24.7 MB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file macafm-0.9.13.tar.gz.

File metadata

Download URL: macafm-0.9.13.tar.gz
Upload date: Jun 22, 2026
Size: 24.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.14

File hashes

Hashes for macafm-0.9.13.tar.gz
Algorithm	Hash digest
SHA256	`4c1f78ebb52a27fa10be11d34c6f8c3b7069676901700a815b112f55d0163535`
MD5	`bb878df6bf9151a6a7851d8ac42f4942`
BLAKE2b-256	`56233fb3a4840fe5f17731dd757654c6311dc10f00927dbe30dde67851478c97`

File details

Details for the file macafm-0.9.13-py3-none-any.whl.

File metadata

Download URL: macafm-0.9.13-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 24.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.14

File hashes

Hashes for macafm-0.9.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`09599c8a6e5f7642bc698b09f28d7ddd02784cd115d1e87734d8f98aa038c2a5`
MD5	`d9641aaba215a8480261ea71a7bf4ef0`
BLAKE2b-256	`6c7453a4719709324932b83d0ea1191b2e08f6049c04a01bc8bb2e82063f4da2`
