
llm-coreml

A plugin for the https://llm.datasette.io/ CLI tool that runs CoreML .mlpackage LLM models locally on macOS.

Point it at a model (and its corresponding HuggingFace tokenizer), then prompt it like any other llm model.

Requirements

  • macOS (CoreML is Apple-only)
  • Python 3.11+

Installation

llm install llm-coreml

Or for development:

git clone https://github.com/anentropic/llm-coreml.git
cd llm-coreml
llm install -e .

Quick start

Register a model with a name and a path to the .mlpackage.

The --tokenizer argument is the HuggingFace model name to load the tokenizer from. This should match the HF model your .mlpackage was derived from:

llm coreml add my-llama /path/to/llama.mlpackage \
    --tokenizer meta-llama/Llama-3.2-1B-Instruct

Prompt it:

llm -m coreml/my-llama "Explain quantum computing in one sentence"

Check that it shows up in llm models:

llm models | grep coreml

Usage

Prompting

# Basic prompt
llm -m coreml/my-llama "What is Rust?"

# With a system prompt
llm -m coreml/my-llama "Hello" -s "You are a pirate"

# Continue a conversation
llm -m coreml/my-llama "What is Rust?"
llm -c "Compare it to Go"

Model options

Pass options with -o:

llm -m coreml/my-llama "Write a haiku" \
    -o temperature 0.7 \
    -o top_p 0.9 \
    -o max_tokens 50

Python API

import llm

model = llm.get_model("coreml/my-llama")
response = model.prompt("What is the capital of France?")
print(response.text())

CLI reference

llm coreml add

llm coreml add <name> <path> --tokenizer <hf_id> [--compute-units <units>]

Register a CoreML model.

Argument          Description
name              Model name, used as coreml/<name>
path              Path to the .mlpackage directory (resolved to an absolute path)
--tokenizer       HuggingFace tokenizer model ID (required)
--compute-units   Compute units: all, cpu_only, cpu_and_gpu, cpu_and_ne (default: all)

llm coreml list

llm coreml list

Lists registered models with their paths, tokenizer IDs, and compute units.

llm coreml remove

llm coreml remove <name>

Removes a registered model. Exits with code 1 if the model doesn't exist.
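The three commands above amount to simple CRUD over a small name-to-config mapping. As a rough sketch of what such a registry could look like (hypothetical: the file name, location, and schema here are illustrative, not the plugin's actual storage):

```python
import json
from pathlib import Path

REGISTRY = Path("coreml_models.json")  # hypothetical location and file name


def add_model(name, path, tokenizer, compute_units="all"):
    """Register a model, mirroring `llm coreml add`."""
    models = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    models[name] = {
        # the path is resolved to absolute, as the CLI reference notes
        "path": str(Path(path).resolve()),
        "tokenizer": tokenizer,
        "compute_units": compute_units,
    }
    REGISTRY.write_text(json.dumps(models, indent=2))


def remove_model(name):
    """Unregister a model, mirroring `llm coreml remove`."""
    models = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    if name not in models:
        raise SystemExit(1)  # exit code 1 when the model doesn't exist
    del models[name]
    REGISTRY.write_text(json.dumps(models, indent=2))
```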

Model options reference

Option        Type    Default  Description
max_tokens    int     200      Maximum number of tokens to generate
temperature   float   0.0      Sampling temperature; 0 = greedy (deterministic)
top_p         float   1.0      Top-p (nucleus) sampling threshold
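To make the options concrete, here is how temperature and top-p typically interact in a sampler (an illustrative sketch, not the plugin's actual decoding code):

```python
import math
import random


def sample(logits, temperature=0.0, top_p=1.0, rng=random):
    """Pick a token index from raw logits using temperature + top-p sampling."""
    # temperature 0 short-circuits to greedy (deterministic) decoding
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # softmax with temperature scaling
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # renormalise over the kept set and draw from it
    kept_probs = [probs[i] for i in kept]
    r = rng.random() * sum(kept_probs)
    for i, p in zip(kept, kept_probs):
        r -= p
        if r <= 0:
            return i
    return kept[-1]
```

With temperature 0 the output is always the argmax; a small top_p prunes the candidate set down towards the single most likely token.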

How it works

Format auto-detection

The plugin reads the CoreML model spec at load time and checks the input names:

  • inputIds (camelCase) = Apple format, uses float16 causal masks
  • input_ids (snake_case) = HuggingFace format, uses int32 attention masks

No config file needed.
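The detection rule boils down to a check like this (a simplified sketch of the logic described above, not the plugin's actual source):

```python
def detect_format(input_names: list[str]) -> str:
    """Classify a CoreML LLM export by its declared input feature names."""
    if "inputIds" in input_names:
        return "apple"        # camelCase: Apple format, float16 causal masks
    if "input_ids" in input_names:
        return "huggingface"  # snake_case: HF format, int32 attention masks
    raise ValueError(f"Unrecognised input names: {input_names}")
```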

Stateful KV-cache

If the model spec declares stateDescriptions, the plugin uses stateful inference with KV-cache. Otherwise it falls back to stateless inference, which reprocesses the full sequence each step (slower, but works with older models).
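The difference can be illustrated with a toy decoder (the "model" below is a trivial stand-in, not a real CoreML forward pass): the stateless loop re-reads the whole sequence every step, while the stateful loop carries forward a compact running state, analogous to the KV-cache.

```python
def fake_model(tokens):
    # stand-in for a forward pass over the full sequence:
    # "next token" = sum of all inputs mod 100
    return sum(tokens) % 100


def generate_stateless(prompt, steps):
    """Reprocess the FULL sequence each step: O(n) work per new token."""
    seq = list(prompt)
    for _ in range(steps):
        seq.append(fake_model(seq))
    return seq


def generate_stateful(prompt, steps):
    """Carry a running state (the KV-cache analogue): O(1) work per token."""
    state = sum(prompt) % 100  # prompt processed once, result cached
    seq = list(prompt)
    for _ in range(steps):
        nxt = state % 100
        seq.append(nxt)
        state = (state + nxt) % 100  # only the new token updates the state
    return seq
```

Both loops produce identical output; the stateful one just avoids redoing work already captured in the cached state, which is exactly the saving a KV-cache provides.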

Tokenization

The plugin uses transformers.AutoTokenizer with apply_chat_template() to handle chat formatting. The tokenizer is downloaded and cached the first time you use a model.
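Applying a chat template turns a list of role/content messages into the single prompt string the model expects. A simplified illustration of the idea, mimicking the shape of a Llama-3-style template (real templates are Jinja2 strings shipped with each tokenizer and vary by model family):

```python
def apply_chat_template_demo(messages, add_generation_prompt=True):
    """Render chat messages into one Llama-3-style prompt string (demo only)."""
    out = "<|begin_of_text|>"
    for msg in messages:
        out += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # leave the prompt open at an assistant turn so the model continues it
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out
```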

Getting CoreML models

You can get .mlpackage LLM models by:

  • Converting HuggingFace models with coremltools
  • Using Apple's ml-explore tools
  • Downloading pre-converted models from HuggingFace (search for "coreml" tagged models)

Development

uv sync --dev

Quality gates

uv run basedpyright      # Type checking (strict)
uv run ruff check        # Linting
uv run ruff format       # Formatting
uv run pytest            # Tests

Or all at once:

prek run --all-files

License

MIT
