
llm-coreml

A plugin for the https://llm.datasette.io/ CLI tool that runs CoreML .mlpackage LLM models locally on macOS.

Point it at a model (and its corresponding HuggingFace tokenizer), then prompt it like any other llm model.

Requirements

  • macOS (CoreML is Apple-only)
  • Python 3.11–3.13 (coremltools does not yet ship native extensions for 3.14)

If llm was installed with Python 3.14, reinstall it targeting 3.13:

uv tool install llm --python 3.13 --reinstall

Installation

llm install llm-coreml

Or for development:

git clone https://github.com/anentropic/llm-coreml.git
cd llm-coreml
llm install -e .

Quick start

Register a model with a name and a path to the .mlpackage.

The --tokenizer argument is the HuggingFace model ID to load the tokenizer from. It should match the HF model your .mlpackage was converted from:

llm coreml add my-llama /path/to/llama.mlpackage \
    --tokenizer meta-llama/Llama-3.2-1B-Instruct

Prompt it:

llm -m coreml/my-llama "Explain quantum computing in one sentence"

Check that it shows up in llm models:

llm models | grep coreml

Usage

Prompting

# Basic prompt
llm -m coreml/my-llama "What is Rust?"

# With a system prompt
llm -m coreml/my-llama "Hello" -s "You are a pirate"

# Continue a conversation
llm -m coreml/my-llama "What is Rust?"
llm -c "Compare it to Go"

Model options

Pass options with -o:

llm -m coreml/my-llama "Write a haiku" \
    -o temperature 0.7 \
    -o top_p 0.9 \
    -o max_tokens 50

Python API

import llm

model = llm.get_model("coreml/my-llama")
response = model.prompt("What is the capital of France?")
print(response.text())

CLI reference

llm coreml add

llm coreml add <name> <path> --tokenizer <hf_id> [--compute-units <units>]

Register a CoreML model.

Argument          Description
name              Model name, used as coreml/<name>
path              Path to the .mlpackage directory (resolved to absolute)
--tokenizer       HuggingFace tokenizer model ID (required)
--compute-units   Compute units: all, cpu_only, cpu_and_gpu, cpu_and_ne (default: all)

llm coreml list

llm coreml list

Lists registered models with their paths, tokenizer IDs, and compute units.

llm coreml remove

llm coreml remove <name>

Removes a registered model. Exits with code 1 if the model doesn't exist.

Model options reference

Option       Type   Default  Description
max_tokens   int    200      Maximum tokens to generate
temperature  float  0.0      Sampling temperature; 0 = greedy (deterministic)
top_p        float  1.0      Top-p (nucleus) sampling threshold
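For intuition, the way these options interact can be sketched in plain Python. This is an illustration of the standard decoding scheme (greedy argmax at temperature 0, otherwise temperature-scaled softmax truncated to a top-p nucleus), not the plugin's actual sampling code:

```python
import math
import random

def sample_token(logits, temperature=0.0, top_p=1.0):
    """Pick the next token ID from raw logits.

    temperature == 0 short-circuits to greedy argmax (deterministic);
    otherwise logits are scaled, softmaxed, truncated to the smallest
    set of top tokens whose cumulative probability reaches top_p,
    and sampled from that set.
    """
    if temperature == 0.0:
        # Greedy: deterministic argmax over raw logits.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus truncation: keep the most probable tokens until their
    # cumulative mass first reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return random.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With the default temperature of 0.0, output is fully deterministic regardless of top_p.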

How it works

Format auto-detection

The plugin reads the CoreML model spec at load time and checks the input names:

  • inputIds (camelCase) = Apple format, uses float16 causal masks
  • input_ids (snake_case) = HuggingFace format, uses int32 attention masks

No config file needed.
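The rule amounts to a single check on the declared input names. A minimal sketch of the detection logic (not the plugin's actual code; the error case is illustrative):

```python
def detect_format(input_names):
    """Decide the mask convention from a model spec's declared inputs.

    Mirrors the detection rule above: the real plugin reads these names
    from the compiled CoreML model spec at load time.
    """
    if "inputIds" in input_names:
        return "apple"        # float16 causal masks
    if "input_ids" in input_names:
        return "huggingface"  # int32 attention masks
    raise ValueError(f"Unrecognised model inputs: {sorted(input_names)}")
```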

Stateful KV-cache

If the model spec declares stateDescriptions, the plugin uses stateful inference with a KV cache. Otherwise it falls back to stateless inference, which reprocesses the full sequence at every step (slower, but works with older model conversions).
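The gap is roughly linear vs quadratic in sequence length. A back-of-envelope count of token positions run through the model (illustrative only; real latency also depends on compute units and model size):

```python
def tokens_processed(prompt_len, generated, stateful):
    """Rough count of token positions the model evaluates while
    generating `generated` tokens from a `prompt_len`-token prompt."""
    if stateful:
        # KV-cache: the prompt is prefilled once, then each decode
        # step processes only the single new token.
        return prompt_len + max(generated - 1, 0)
    # Stateless: step i re-runs the entire sequence accumulated so far.
    return sum(prompt_len + i for i in range(generated))
```

For a 100-token prompt and 50 generated tokens, the stateful path evaluates 149 positions versus 6225 for the stateless fallback.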

Tokenization

The plugin uses transformers.AutoTokenizer for tokenization. The tokenizer is downloaded and cached the first time you use a model.

For chat/instruct models (e.g. Llama-3-Instruct), the tokenizer's chat_template is used to format the conversation with system prompts and multi-turn history.

For completion models without a chat template (e.g. GPT-2), the prompt text is tokenized directly. System prompts and conversation history are not supported for these models.
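For chat models, the conversation is assembled into the role/content message list that a tokenizer's apply_chat_template consumes. A sketch of that conventional structure (the helper name and signature are illustrative, not the plugin's API):

```python
def build_prompt_messages(prompt, system=None, history=None):
    """Assemble the messages list a HuggingFace chat_template expects.

    `history` is a list of (user, assistant) turn pairs; the new user
    prompt always goes last.
    """
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    for user_turn, assistant_turn in history or []:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": prompt})
    return messages
```

A completion model without a chat_template skips this step entirely and tokenizes the raw prompt text.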

Getting CoreML models

You can get .mlpackage LLM models by:

  • Converting HuggingFace models with coremltools
  • Using Apple's ml-explore tools
  • Downloading pre-converted models from HuggingFace (search for "coreml" tagged models)

Development

uv sync --dev

Quality gates

uv run basedpyright      # Type checking (strict)
uv run ruff check        # Linting
uv run ruff format       # Formatting
uv run pytest            # Tests

Or all at once:

prek run --all-files

License

MIT
