
llm-coreml

A plugin for the https://llm.datasette.io/ CLI tool that runs CoreML .mlpackage LLM models locally on macOS.

Point it at a model (and its corresponding HuggingFace tokenizer), then prompt it like any other llm model.

Requirements

  • macOS (CoreML is Apple-only)
  • Python 3.11+

Installation

llm install llm-coreml

Or for development:

git clone https://github.com/anentropic/llm-coreml.git
cd llm-coreml
llm install -e .

Quick start

Register a model with a name and a path to the .mlpackage.

The --tokenizer argument is the HuggingFace model name to load the tokenizer from. This should match the HF model your .mlpackage was derived from:

llm coreml add my-llama /path/to/llama.mlpackage \
    --tokenizer meta-llama/Llama-3.2-1B-Instruct

Prompt it:

llm -m coreml/my-llama "Explain quantum computing in one sentence"

Check that it shows up in llm models:

llm models | grep coreml

Usage

Prompting

# Basic prompt
llm -m coreml/my-llama "What is Rust?"

# With a system prompt
llm -m coreml/my-llama "Hello" -s "You are a pirate"

# Continue a conversation
llm -m coreml/my-llama "What is Rust?"
llm -c "Compare it to Go"

Model options

Pass options with -o:

llm -m coreml/my-llama "Write a haiku" \
    -o temperature 0.7 \
    -o top_p 0.9 \
    -o max_tokens 50

Python API

import llm

model = llm.get_model("coreml/my-llama")
response = model.prompt("What is the capital of France?")
print(response.text())

CLI reference

llm coreml add

llm coreml add <name> <path> --tokenizer <hf_id> [--compute-units <units>]

Register a CoreML model.

Argument          Description
name              Model name, used as coreml/<name>
path              Path to the .mlpackage directory (resolved to an absolute path)
--tokenizer       HuggingFace tokenizer model ID (required)
--compute-units   Compute units: all, cpu_only, cpu_and_gpu, cpu_and_ne (default: all)

llm coreml list

llm coreml list

Lists registered models with their paths, tokenizer IDs, and compute units.

llm coreml remove

llm coreml remove <name>

Removes a registered model. Exits with code 1 if the model doesn't exist.
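The three commands above amount to simple CRUD over a small name-to-config mapping. As a rough sketch of what such a registry could look like (hypothetical: the file name, location, and schema here are illustrative, not the plugin's actual storage):

```python
import json
from pathlib import Path

REGISTRY = Path("coreml_models.json")  # hypothetical location and file name


def add_model(name, path, tokenizer, compute_units="all"):
    """Register a model, mirroring `llm coreml add`."""
    models = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    models[name] = {
        # the path is resolved to absolute, as the CLI reference notes
        "path": str(Path(path).resolve()),
        "tokenizer": tokenizer,
        "compute_units": compute_units,
    }
    REGISTRY.write_text(json.dumps(models, indent=2))


def remove_model(name):
    """Unregister a model, mirroring `llm coreml remove`."""
    models = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    if name not in models:
        raise SystemExit(1)  # exit code 1 when the model doesn't exist
    del models[name]
    REGISTRY.write_text(json.dumps(models, indent=2))
```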

Model options reference

Option        Type    Default  Description
max_tokens    int     200      Maximum number of tokens to generate
temperature   float   0.0      Sampling temperature; 0 = greedy (deterministic)
top_p         float   1.0      Top-p (nucleus) sampling threshold
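To make the options concrete, here is how temperature and top-p typically interact in a sampler (an illustrative sketch, not the plugin's actual decoding code):

```python
import math
import random


def sample(logits, temperature=0.0, top_p=1.0, rng=random):
    """Pick a token index from raw logits using temperature + top-p sampling."""
    # temperature 0 short-circuits to greedy (deterministic) decoding
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # softmax with temperature scaling
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # renormalise over the kept set and draw from it
    kept_probs = [probs[i] for i in kept]
    r = rng.random() * sum(kept_probs)
    for i, p in zip(kept, kept_probs):
        r -= p
        if r <= 0:
            return i
    return kept[-1]
```

With temperature 0 the output is always the argmax; a small top_p prunes the candidate set down towards the single most likely token.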

How it works

Format auto-detection

The plugin reads the CoreML model spec at load time and checks the input names:

  • inputIds (camelCase) = Apple format, uses float16 causal masks
  • input_ids (snake_case) = HuggingFace format, uses int32 attention masks

No config file needed.
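The detection rule boils down to a check like this (a simplified sketch of the logic described above, not the plugin's actual source):

```python
def detect_format(input_names: list[str]) -> str:
    """Classify a CoreML LLM export by its declared input feature names."""
    if "inputIds" in input_names:
        return "apple"        # camelCase: Apple format, float16 causal masks
    if "input_ids" in input_names:
        return "huggingface"  # snake_case: HF format, int32 attention masks
    raise ValueError(f"Unrecognised input names: {input_names}")
```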

Stateful KV-cache

If the model spec declares stateDescriptions, the plugin uses stateful inference with KV-cache. Otherwise it falls back to stateless inference, which reprocesses the full sequence each step (slower, but works with older models).
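The difference can be illustrated with a toy decoder (the "model" below is a trivial stand-in, not a real CoreML forward pass): the stateless loop re-reads the whole sequence every step, while the stateful loop carries forward a compact running state, analogous to the KV-cache.

```python
def fake_model(tokens):
    # stand-in for a forward pass over the full sequence:
    # "next token" = sum of all inputs mod 100
    return sum(tokens) % 100


def generate_stateless(prompt, steps):
    """Reprocess the FULL sequence each step: O(n) work per new token."""
    seq = list(prompt)
    for _ in range(steps):
        seq.append(fake_model(seq))
    return seq


def generate_stateful(prompt, steps):
    """Carry a running state (the KV-cache analogue): O(1) work per token."""
    state = sum(prompt) % 100  # prompt processed once, result cached
    seq = list(prompt)
    for _ in range(steps):
        nxt = state % 100
        seq.append(nxt)
        state = (state + nxt) % 100  # only the new token updates the state
    return seq
```

Both loops produce identical output; the stateful one just avoids redoing work already captured in the cached state, which is exactly the saving a KV-cache provides.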

Tokenization

The plugin uses transformers.AutoTokenizer with apply_chat_template() to handle chat formatting. The tokenizer is downloaded and cached the first time you use a model.
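Applying a chat template turns a list of role/content messages into the single prompt string the model expects. A simplified illustration of the idea, mimicking the shape of a Llama-3-style template (real templates are Jinja2 strings shipped with each tokenizer and vary by model family):

```python
def apply_chat_template_demo(messages, add_generation_prompt=True):
    """Render chat messages into one Llama-3-style prompt string (demo only)."""
    out = "<|begin_of_text|>"
    for msg in messages:
        out += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # leave the prompt open at an assistant turn so the model continues it
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out
```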

Getting CoreML models

You can get .mlpackage LLM models by:

  • Converting HuggingFace models with coremltools
  • Using Apple's ml-explore tools
  • Downloading pre-converted models from HuggingFace (search for "coreml" tagged models)

Development

uv sync --dev

Quality gates

uv run basedpyright      # Type checking (strict)
uv run ruff check        # Linting
uv run ruff format       # Formatting
uv run pytest            # Tests

Or all at once:

prek run --all-files

License

MIT
