
llm-coreml

A plugin for the https://llm.datasette.io/ CLI tool that runs CoreML .mlpackage LLM models locally on macOS.

Point it at a model (and its corresponding HuggingFace tokenizer), then prompt it like any other llm model.

Requirements

  • macOS (CoreML is Apple-only)
  • Python 3.11+

Installation

llm install llm-coreml

Or for development:

git clone https://github.com/anentropic/llm-coreml.git
cd llm-coreml
llm install -e .

Quick start

Register a model with a name and a path to the .mlpackage.

The --tokenizer argument is the HuggingFace model ID to load the tokenizer from. It should match the HF model your .mlpackage was derived from:

llm coreml add my-llama /path/to/llama.mlpackage \
    --tokenizer meta-llama/Llama-3.2-1B-Instruct

Prompt it:

llm -m coreml/my-llama "Explain quantum computing in one sentence"

Check that it shows up in the model list:

llm models | grep coreml

Usage

Prompting

# Basic prompt
llm -m coreml/my-llama "What is Rust?"

# With a system prompt
llm -m coreml/my-llama "Hello" -s "You are a pirate"

# Continue a conversation
llm -m coreml/my-llama "What is Rust?"
llm -c "Compare it to Go"

Model options

Pass options with -o:

llm -m coreml/my-llama "Write a haiku" \
    -o temperature 0.7 \
    -o top_p 0.9 \
    -o max_tokens 50

Python API

import llm

model = llm.get_model("coreml/my-llama")
response = model.prompt("What is the capital of France?")
print(response.text())

CLI reference

llm coreml add

llm coreml add <name> <path> --tokenizer <hf_id> [--compute-units <units>]

Register a CoreML model.

Argument          Description
name              Model name, used as coreml/<name>
path              Path to the .mlpackage directory (resolved to absolute)
--tokenizer       HuggingFace tokenizer model ID (required)
--compute-units   Compute units: all, cpu_only, cpu_and_gpu, cpu_and_ne (default: all)

llm coreml list

llm coreml list

Lists registered models with their paths, tokenizer IDs, and compute units.

llm coreml remove

llm coreml remove <name>

Removes a registered model. Exits with code 1 if the model doesn't exist.

Model options reference

Option        Type    Default   Description
max_tokens    int     200       Maximum number of tokens to generate
temperature   float   0.0       Sampling temperature; 0 = greedy (deterministic)
top_p         float   1.0       Top-p (nucleus) sampling threshold
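For reference, the way these options interact can be sketched in plain Python. This is an illustrative sampler, not the plugin's actual implementation:

```python
import math
import random

def sample_token(logits, temperature=0.0, top_p=1.0, rng=random):
    """Pick a token index from raw logits (illustrative).

    temperature == 0 -> greedy argmax (deterministic).
    Otherwise: softmax with temperature, then keep the smallest set of
    tokens whose cumulative probability reaches top_p, and sample from it.
    """
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filter: highest-probability tokens up to top_p mass.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample within the kept set, renormalizing its mass.
    kept_mass = sum(probs[i] for i in kept)
    r = rng.random() * kept_mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With the default temperature of 0.0 the output is fully deterministic, which is why repeated runs of the same prompt produce identical text unless you raise the temperature.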

How it works

Format auto-detection

The plugin reads the CoreML model spec at load time and checks the input names:

  • inputIds (camelCase) = Apple format, uses float16 causal masks
  • input_ids (snake_case) = HuggingFace format, uses int32 attention masks

No config file needed.
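The detection rule amounts to a small classifier over input names. A hypothetical helper (the plugin reads the names from the compiled model spec rather than taking a list):

```python
def detect_format(input_names):
    """Classify a CoreML LLM by its input tensor names (illustrative).

    Mirrors the rule above: camelCase inputs indicate an Apple-converted
    model, snake_case inputs indicate a HuggingFace conversion.
    """
    if "inputIds" in input_names:
        return "apple"        # expects float16 causal masks
    if "input_ids" in input_names:
        return "huggingface"  # expects int32 attention masks
    raise ValueError(f"unrecognized input names: {input_names}")
```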

Stateful KV-cache

If the model spec declares stateDescriptions, the plugin uses stateful inference with KV-cache. Otherwise it falls back to stateless inference, which reprocesses the full sequence each step (slower, but works with older models).
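To see why the stateless fallback is slower, count the token positions each mode pushes through the model. A back-of-the-envelope sketch, not plugin code:

```python
def tokens_processed(prompt_len, new_tokens, stateful):
    """Total token positions run through the model during generation.

    Stateful (KV-cache): the prompt is processed once, then one new
    token per step. Stateless: every step re-feeds the whole sequence
    generated so far.
    """
    if stateful:
        return prompt_len + new_tokens
    return sum(prompt_len + i for i in range(new_tokens))
```

For a 100-token prompt and 50 generated tokens, the stateful path touches 150 positions while the stateless path touches 6,225 — roughly 40× the work, and the gap grows quadratically with output length.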

Tokenization

The plugin uses transformers.AutoTokenizer with apply_chat_template() to handle chat formatting. The tokenizer is downloaded and cached the first time you use a model.
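Under the hood this amounts to assembling the standard chat messages list that apply_chat_template() consumes. A hedged sketch of that assembly (the plugin's exact logic may differ):

```python
def build_messages(prompt, system=None, history=()):
    """Build the messages list passed to apply_chat_template().

    Illustrative helper: `history` is assumed to be (user, assistant)
    text pairs from earlier turns of the conversation.
    """
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt})
    return messages
```

The tokenizer's own chat template then renders this list into the exact prompt markup the underlying model was trained on, which is why the tokenizer ID must match the source model.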

Getting CoreML models

You can get .mlpackage LLM models by:

  • Converting HuggingFace models with coremltools
  • Using Apple's ml-explore tools
  • Downloading pre-converted models from HuggingFace (search for "coreml" tagged models)

Development

uv sync --dev

Quality gates

uv run basedpyright      # Type checking (strict)
uv run ruff check        # Linting
uv run ruff format       # Formatting
uv run pytest            # Tests

Or all at once:

prek run --all-files

License

MIT
