Direct Apple Neural Engine (ANE) backend. A CoreML-free Python frontend that compiles an operator graph into a single fused e5rt program and dispatches it to the ANE.

Maintainers

sbryngelson

Project description

ANEForge

Train and run neural networks directly on the Apple Neural Engine, from Python, with no CoreML.

[Figure: A small transformer trains from scratch and generates text live on the Apple Neural Engine.]

A transformer training from scratch on the engine (forward, backward, and Adam), then completing a prompt. Reproduce with python examples/demo.py.

Apple exposes the Neural Engine only through CoreML, for inference only. CoreML decides whether your model lands on the engine or quietly falls back to the CPU or GPU, and it gives you no way to train there. ANEForge skips it: it compiles a tensor graph into one ANE program and dispatches that program through the same private aned stack CoreML, MPSGraph, and Espresso use internally. From there:

Training runs on the engine. The forward pass, the backward pass, and the Adam update all compile to ANE programs. A CNN trains from scratch on CIFAR-10 to 71% accuracy, on a chip Apple ships for inference only.
Hardware layers CoreML can't reach. af.sdpa drives the engine's fused-attention layer directly, the one Apple's compiler decomposes and never emits; 18 other native layers (argmax, topk, sort, geometry) come the same way.
The engine, never a fallback. A pretrained ResNet-18 runs end-to-end in 0.33 ms, matching the reference to cosine 1.0000, at a fraction of the GPU's energy (table below).
Cross-compilation for chips you don't own. Lower and gate a graph for any of 28 ANE targets (M1-M5) from one machine, and estimate its latency without running it.

import aneforge as af

# W (conv weights), image, and tokens are values you supply; they are not defined here.
x   = af.input((1, 3, 32, 32))             # a lazy graph input
y   = af.conv(x, W, pad=1).relu().mean((2, 3))
net = af.compile(y, compress="int8")       # graph -> one fused ANE program
out = net(image)                           # callable; runs on ANE silicon

# ...or load a pretrained model
enc = af.load(".../all-MiniLM-L6-v2")      # MiniLM sentence encoder
vec = enc(tokens)                          # on-device, cosine 1.0000 vs reference

A graph is built from 58 fused operators and 19 native bridge operators. It is lowered into a single program that is reused across calls, with a dispatch floor near 70 µs.

Status: research project on Apple Silicon / macOS, verified on M5 Pro and M1 Max. Relies on private framework symbols that may change without notice. Not affiliated with Apple.

Install

Requirements: an Apple Silicon Mac, macOS 14+, the Xcode command-line tools, and Python 3.10+.

pip install aneforge

The e5rt dispatch shim links Apple frameworks, so it compiles from source on your Mac the first time you dispatch to the ANE (or ahead of time with python -m aneforge.build). Optional extras: pip install "aneforge[models]" for the pretrained loaders (torch / torchvision / transformers).

For the examples, tests, and benchmarks, work from a checkout:

git clone https://github.com/sbryngelson/ANEForge.git
cd ANEForge
pip install -e ".[dev]"
PYTHONPATH=. python3 tests/op_smoketest.py    # compile + run each op on the ANE

Then browse examples/, starting with examples/quickstart.py.

To run an existing ONNX model on the ANE, examples/onnx_import.py imports a .onnx classifier via af.load_onnx and validates it against onnxruntime (cosine 1.0000); examples/onnx_finetune.py imports one as a frozen feature extractor and trains a new head on it entirely on the ANE (transfer learning).

For LLMs, examples/llm_chat.py is an interactive chat that streams a reply token-by-token on the ANE (resident KV-cache decode, ~75 tok/s on Qwen3-0.6B), and examples/llm_prefill.py loads a Llama/Qwen-class model via af.load_llm and benchmarks prefill/decode (matching Hugging Face logits).

For retrieval, examples/rag_embeddings.py is a LangChain Embeddings drop-in backed by the on-ANE encoder (4-5x faster than the GPU, cosine 1.0000).

How it compares

	On the ANE	No CoreML	Trains on it
CoreML / coremltools	scheduler chooses	--	no
MLX, PyTorch (MPS)	no (GPU)	yes	on the GPU
ANEForge	yes (direct)	yes	yes

CoreML is the only public door to the engine, and it alone decides whether a model actually runs there. ANEForge compiles to the engine directly, from an ordinary user process, with no entitlement and without disabling System Integrity Protection (SIP).

Measured

Single input, fp16, on an M5 Pro. The GPU baseline is PyTorch on Metal (MPS) at fp16; energy is whole-package, read with powermetrics.

Pretrained model	ANE	GPU (fp16)	ANE energy	GPU energy
ResNet-18	0.33 ms	2.03 ms	2.2 mJ	35 mJ
MiniLM encoder	0.53 ms	1.92 ms	2.4 mJ	21 mJ
ViT-B/16	18.3 ms	15.9 ms	75 mJ	612 mJ

The engine is faster on the convolutional and encoder workloads and 8-16x more energy-efficient on all three, even on ViT-B/16, where the GPU edges it in latency. Reproduce with bench/device_compare_wattcomplete.py and bench/real_models_fp16.py; the full per-workload device map (16 classes, measured on M1 / M2 / M5) is in bench/results/.
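The ratios quoted above follow directly from the table. As a quick check, this small sketch recomputes them; the numbers are copied from the table, and the dictionary layout and function name are only illustrative.

```python
# Latency (ms) and whole-package energy (mJ), copied from the table above.
measured = {
    "ResNet-18":      {"ane_ms": 0.33, "gpu_ms": 2.03, "ane_mj": 2.2,  "gpu_mj": 35.0},
    "MiniLM encoder": {"ane_ms": 0.53, "gpu_ms": 1.92, "ane_mj": 2.4,  "gpu_mj": 21.0},
    "ViT-B/16":       {"ane_ms": 18.3, "gpu_ms": 15.9, "ane_mj": 75.0, "gpu_mj": 612.0},
}

def ratios(row):
    """Return (GPU/ANE latency ratio, GPU/ANE energy ratio); values > 1 favor the ANE."""
    return row["gpu_ms"] / row["ane_ms"], row["gpu_mj"] / row["ane_mj"]

summary = {name: ratios(row) for name, row in measured.items()}
for name, (speed, energy) in summary.items():
    print(f"{name:15s} latency ratio {speed:.2f}x   energy ratio {energy:.1f}x")
```

A latency ratio below 1 (as for ViT-B/16) means the GPU was faster on that workload, while the energy ratio stays above 8 on all three rows.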

A fluid simulation on the Neural Engine

[Figure: A passive dye shaped as the word ANEForge, stirred into glowing filaments by a fluid simulation on the Apple Neural Engine.]

A passive dye is painted as the word ANEForge, and a 2-D incompressible Navier-Stokes flow (pseudo-spectral) stirs it into thin glowing filaments. Every Fourier transform in the 2,200-step loop runs on the ANE, and the whole simulation costs about 9 J at the measured 1.48 W rail. Reproduce with python examples/fluid_vorticity.py.
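For readers unfamiliar with the pseudo-spectral method, here is a minimal NumPy sketch of one way to step the 2-D vorticity equation. It runs on the CPU with np.fft and is not the code in examples/fluid_vorticity.py; the function names, the midpoint time integrator, the grid size, and the absence of dealiasing are all assumptions made to keep the example short.

```python
import numpy as np

def wavenumbers(n):
    """Integer wavenumbers for an n x n grid on a 2*pi-periodic box."""
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    return kx, ky, kx**2 + ky**2

def rhs(w_hat, kx, ky, k2, nu):
    """Time derivative of the vorticity spectrum: -(u . grad w) + nu * lap(w)."""
    # Stream function from lap(psi) = -w; the mean (k = 0) mode is set to zero.
    psi_hat = np.where(k2 == 0, 0.0, w_hat / np.where(k2 == 0, 1.0, k2))
    # Differentiate in Fourier space, then transform back to real space.
    u = np.fft.ifft2(1j * ky * psi_hat).real    # u =  d(psi)/dy
    v = np.fft.ifft2(-1j * kx * psi_hat).real   # v = -d(psi)/dx
    w_x = np.fft.ifft2(1j * kx * w_hat).real
    w_y = np.fft.ifft2(1j * ky * w_hat).real
    # Form the nonlinear product in real space (no dealiasing in this sketch).
    advection_hat = np.fft.fft2(u * w_x + v * w_y)
    return -advection_hat - nu * k2 * w_hat

def step(w_hat, dt, kx, ky, k2, nu):
    """One explicit midpoint (RK2) step."""
    half = w_hat + 0.5 * dt * rhs(w_hat, kx, ky, k2, nu)
    return w_hat + dt * rhs(half, kx, ky, k2, nu)

def simulate(w0, steps, dt, nu):
    """Advance a real-space vorticity field w0 by `steps` time steps."""
    kx, ky, k2 = wavenumbers(w0.shape[0])
    w_hat = np.fft.fft2(w0)
    for _ in range(steps):
        w_hat = step(w_hat, dt, kx, ky, k2, nu)
    return np.fft.ifft2(w_hat).real

# Taylor-Green vortex: advection cancels exactly, so w decays as exp(-2 * nu * t).
n = 32
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
w0 = 2.0 * np.cos(X) * np.cos(Y)
w1 = simulate(w0, steps=100, dt=0.01, nu=0.1)
print("max deviation from the analytic decay:", np.abs(w1 - w0 * np.exp(-0.2)).max())
```

Each step costs a handful of forward and inverse 2-D transforms, which is why the transforms dominate a loop like this one.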

Reaction-diffusion on the Neural Engine

[Figure: A Gray-Scott reaction-diffusion system grown from the word ANEForge into a branching labyrinth on the Apple Neural Engine.]

The Gray-Scott equations grow Turing patterns from two diffusing, reacting chemicals (the mechanism behind seashell and animal-coat markings). The word ANEForge is seeded and blooms into a branching labyrinth. The whole update is one program that re-dispatches every step: a 3x3 Laplacian as a native ANE conv, the reaction terms as elementwise ops, the periodic boundary wrapped in-graph from the field's own edges. It is the real-space companion to the fluid demo above, which takes its derivatives spectrally (FFTs); this one uses a stencil (a conv). Reproduce with python examples/reaction_diffusion.py.
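The update described above can be sketched in plain NumPy. This is a CPU illustration, not the graph in examples/reaction_diffusion.py: the 5-point stencil (a 3x3 kernel with zero corners), the explicit Euler step, and the parameter values are assumptions chosen for the example.

```python
import numpy as np

def laplacian_periodic(a):
    """5-point Laplacian, written as a 3x3 stencil over a periodically padded field."""
    p = np.pad(a, 1, mode="wrap")   # borders are copied from the field's opposite edges
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * a

def gray_scott_step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.060, dt=1.0):
    """One explicit Euler step of the Gray-Scott equations (parameter values assumed)."""
    uvv = u * v * v                  # reaction term, purely elementwise
    u_next = u + dt * (du * laplacian_periodic(u) - uvv + feed * (1.0 - u))
    v_next = v + dt * (dv * laplacian_periodic(v) + uvv - (feed + kill) * v)
    return u_next, v_next

# Seed a small square of the second chemical in an otherwise uniform field.
u = np.ones((64, 64))
v = np.zeros((64, 64))
u[28:36, 28:36], v[28:36, 28:36] = 0.5, 0.25
for _ in range(200):
    u, v = gray_scott_step(u, v)
print("v range after 200 steps:", float(v.min()), float(v.max()))
```

Padding with mode="wrap" plays the role of the in-graph periodic boundary: the stencil then needs no special cases at the edges.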

A neural network that grows, trained on the Neural Engine

[Figure: A neural cellular automaton, trained on the Apple Neural Engine, grows a lizard from a single seed pixel.]

A cellular-automaton update rule (a small CNN, shared across every cell) is trained so that a single live seed pixel grows into a target image, the way morphogenesis builds a body from one cell. The forward pass through the rollout and the backward pass both run on the engine, gradient-checkpointed so the rollout's depth does not bound the compile (the optimizer runs host-side over the streamed gradients). So the rule is learned on the engine, not just run there, then dispatched step by step to grow the image, again on the engine. Reproduce with python examples/train_neural_ca.py.
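Gradient checkpointing through a rollout can be illustrated on a toy problem. The sketch below replaces the CNN update rule with a hypothetical one-line step, x -> tanh(W x), and compares a backward pass that stores every state against one that stores only every few states and recomputes the rest. It is a NumPy illustration of the idea, not the implementation in examples/train_neural_ca.py.

```python
import numpy as np

def step(x, W):
    """Toy update rule standing in for the shared CNN: x -> tanh(W x)."""
    return np.tanh(W @ x)

def backprop_segment(xs, g_x, g_W, W):
    """Backpropagate through the stored states xs[0..m]; returns the gradient at xs[0]."""
    for i in reversed(range(len(xs) - 1)):
        g_z = g_x * (1.0 - xs[i + 1] ** 2)   # derivative of tanh
        g_W += np.outer(g_z, xs[i])          # accumulate into g_W in place
        g_x = W.T @ g_z
    return g_x

def grad_full(x0, W, target, steps):
    """Reference: store the whole rollout, then backpropagate once."""
    xs = [x0]
    for _ in range(steps):
        xs.append(step(xs[-1], W))
    g_W = np.zeros_like(W)
    backprop_segment(xs, xs[-1] - target, g_W, W)
    return g_W

def grad_checkpointed(x0, W, target, steps, every):
    """Store only every `every`-th state; recompute each segment during the backward pass."""
    checkpoints = {0: x0}
    x = x0
    for t in range(1, steps + 1):
        x = step(x, W)
        if t % every == 0 and t < steps:
            checkpoints[t] = x
    g_x = x - target                         # gradient of 0.5 * ||x_T - target||^2
    g_W = np.zeros_like(W)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, steps)
        xs = [checkpoints[start]]
        for _ in range(end - start):         # recompute the states inside this segment
            xs.append(step(xs[-1], W))
        g_x = backprop_segment(xs, g_x, g_W, W)
    return g_W

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(4, 4))
x0, target = rng.normal(size=4), rng.normal(size=4)
g_ref = grad_full(x0, W, target, steps=10)
g_ckpt = grad_checkpointed(x0, W, target, steps=10, every=4)
print("max gradient difference:", np.abs(g_ref - g_ckpt).max())
```

The checkpointed version never holds more than one segment of states at a time, so peak memory scales with the segment length rather than the rollout depth, at the cost of roughly one extra forward pass.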

What it does

Graph -> compile -> run. 58 fused operators (conv/pool, matmul/bmm/einsum, activations, reductions, norms, softmax, attention, shape/geometry) lower into one program with int8/int4/fp16 weights, plus a bridge route for 19 native ops the public toolchain never emits.
Streaming weight compression. int8, int4-LUT, or sparse weights streamed from the engine's dequant path (~4x smaller for int4), accuracy-gated.
On-device uint8 image input, dequantized in-graph, so raw camera or video bytes feed the model directly.
Resident state. KV-cache and optimizer state kept on the engine across steps via buffer aliasing (share_buffer).
Accuracy-preserving optimizer. af.tune measures equivalent lowerings on the engine and returns the lossless pick.
Linear algebra and spectral methods. aneforge.linalg and aneforge.fft as static-dataflow graphs.
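To make the "~4x smaller for int4" figure concrete, here is a NumPy sketch of lookup-table quantization: each weight becomes a 4-bit index into a 16-entry table, and two indices share a byte. The on-disk layout ANEForge uses is not described here, so the palette construction (a few 1-D k-means refinements), the packing order, and the function names are all assumptions.

```python
import numpy as np

def lut4_compress(w, iters=10):
    """Quantize weights to a 16-entry lookup table plus packed 4-bit indices."""
    flat = w.astype(np.float32).ravel()
    lut = np.quantile(flat, np.linspace(0.0, 1.0, 16))   # initial palette
    for _ in range(iters):                               # a few Lloyd (1-D k-means) passes
        idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
        for k in range(16):
            mask = idx == k
            if mask.any():
                lut[k] = flat[mask].mean()
    idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1).astype(np.uint8)
    if idx.size % 2:                                     # pad to an even count
        idx = np.append(idx, np.uint8(0))
    packed = (idx[0::2] << 4) | idx[1::2]                # two indices per byte
    return lut.astype(np.float16), packed

def lut4_decompress(lut, packed, shape):
    """Unpack the 4-bit indices and look each one up in the table."""
    idx = np.empty(packed.size * 2, dtype=np.uint8)
    idx[0::2] = packed >> 4
    idx[1::2] = packed & 0x0F
    return lut[idx[: int(np.prod(shape))]].reshape(shape)

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float16)
lut, packed = lut4_compress(w)
w_restored = lut4_decompress(lut, packed, w.shape)
ratio = w.nbytes / (packed.nbytes + lut.nbytes)
rel_err = np.linalg.norm(w_restored.astype(np.float32) - w.astype(np.float32)) / np.linalg.norm(w.astype(np.float32))
print(f"compression vs fp16: {ratio:.2f}x, relative error: {rel_err:.3f}")
```

The ratio approaches 4x against fp16 as the tensor grows, because the fixed 32-byte table is amortized over more weights. An accuracy gate, as mentioned in the list above, would compare a metric like rel_err (or end-to-end model accuracy) against a threshold before accepting the compressed weights.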

What runs

Pretrained models, each fused into one ANE program:

Model	Task	Fidelity vs reference
ResNet-18	ImageNet classification	cosine 1.0000
ViT-B/16	vision transformer encoder	cosine 1.0000
all-MiniLM-L6-v2	sentence embedding	cosine 1.0000
ESPCN	super-resolution	runs end to end
Stable Diffusion 1.5	U-Net + VAE (per component)	U-Net 1.5%, VAE 4.4% rel.
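The fidelity column can be read as follows. This sketch shows one common way to compute a cosine similarity and a relative error between an output and its reference; the exact metrics used for the table are not spelled out here, so treating "rel." as a relative L2 error is an assumption, and the fp16 round-trip only stands in for a real model output.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two tensors, flattened and computed in float64."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relative_error(out, ref):
    """Relative L2 error ||out - ref|| / ||ref|| (assumed meaning of 'rel.')."""
    out = np.asarray(out, dtype=np.float64).ravel()
    ref = np.asarray(ref, dtype=np.float64).ravel()
    return float(np.linalg.norm(out - ref) / np.linalg.norm(ref))

# Stand-in data: a float32 "reference" and the same values rounded to fp16.
reference = np.random.default_rng(0).normal(size=1000).astype(np.float32)
fp16_output = reference.astype(np.float16)
print(f"cosine {cosine(fp16_output, reference):.4f}, rel. error {relative_error(fp16_output, reference):.2e}")
```

A cosine of 1.0000 to four decimals still allows small elementwise differences, which is why a relative error is the more informative number for generative components such as the U-Net and VAE.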

Trained from scratch on the engine: an MLP, a CNN (CIFAR-10 to 71%), a transformer block, a LLaMA-style block, and a character language model. Operator coverage is tracked op by op across M1 to M5 in the op catalog, an exhaustive table of native MIL ops by device; capabilities has the dtype matrix and the known limits.

Language models

Decoder LLMs run on the ANE from Hugging Face weights or GGUF, with prefill plus resident-KV-cache decode, auto-segmented past the ~2 GB single-program ceiling:

Model	What runs	Measured
Qwen3-0.6B / 8B	dense decode, matches HF logits	~75 / ~7.5 tok/s decode
Qwen3-8B + 0.6B draft	speculative decoding, exact	2.28x (7.4 -> 16.8 tok/s)
Qwen1.5-MoE-A2.7B	sparse MoE, full model on pure ANE	coherent text, ~2 tok/s (int8)
Qwen3.5-27B hybrid	48 DeltaNet + 16 attn on pure ANE	coherent int8 (fp16-bound vs llama.cpp)

Speculative verify is near-free on the ANE: verifying K tokens costs about the same as verifying one (verify(K) ~ verify(1)), because decode is latency-bound. MoE decode at 30B scale is weight-bandwidth-bound. Full writeup in the LLMs guide.
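Why speculative decoding can be exact is easiest to see on a toy. In the sketch below, two hypothetical next-token functions stand in for the large target model and the small draft model; the draft proposes k tokens, the target checks them in one verify pass, and only tokens the target itself would have produced are kept. This is the greedy variant, written from the general technique and not from ANEForge's implementation.

```python
def target_next(tokens):
    """Toy stand-in for the large model: a deterministic next-token rule."""
    return (sum(tokens) * 31 + len(tokens)) % 50

def draft_next(tokens):
    """Toy stand-in for the draft model: agrees with the target except at every 4th position."""
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 50

def greedy_decode(prompt, n_new):
    """Baseline: one target pass per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens

def speculative_decode(prompt, n_new, k=4):
    """Greedy speculative decoding; returns (tokens, number of target verify passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(tokens) < goal:
        # 1. The draft proposes k tokens autoregressively (cheap).
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target scores all k + 1 positions; counted as a single verify pass.
        verify_passes += 1
        verified = [target_next(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Keep the longest agreeing prefix, then one token from the target itself.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens.extend(proposal[:n_ok] + [verified[n_ok]])
    return tokens[:goal], verify_passes

prompt = [1, 2, 3]
spec_tokens, passes = speculative_decode(prompt, n_new=40)
print("matches greedy decode:", spec_tokens == greedy_decode(prompt, 40))
print("target passes:", passes, "instead of 40")
```

Every pass emits at least one token chosen by the target, so the output is identical to plain greedy decoding; the speedup comes entirely from how many draft tokens survive each verify pass, provided a k-token verify costs about as much as a single-token one.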

Verify

The correctness corpus compiles and runs every op and kernel on the ANE, and serves as a reproducibility test:

PYTHONPATH=. python3 tests/run_corpus.py
PYTHONPATH=. python3 -m pytest tests/ -q

Documentation

The manual is hosted at aneforge.readthedocs.io. The API is documented in the module docstrings and demonstrated in examples/.

The reverse engineering that ANEForge builds on (the program-container format, the e5rt dispatch path, and the engine internals down to the firmware) is collected in the ANE guide at ane-guide.readthedocs.io (arXiv:2606.22283).

Contributing

CONTRIBUTING.md has the bug-report checklist (include your chip and macOS version), the development setup, and where to start. Report security issues privately per the SECURITY.md guidelines.

License

MIT. The Apple Neural Engine is proprietary hardware, and the framework symbols this project calls are private, undocumented, and may change at any time. Nothing here is endorsed by, or constitutes an API contract from, Apple.

Release history

Version	Released
0.2.0 (this version)	Jun 28, 2026
0.1.4	Jun 25, 2026
0.1.3	Jun 22, 2026
0.1.2	Jun 22, 2026
0.1.1	Jun 14, 2026
0.1.0	Jun 12, 2026

Download files

Source Distribution

aneforge-0.2.0.tar.gz (30.4 MB)

Uploaded Jun 28, 2026 Source

Built Distribution

aneforge-0.2.0-py3-none-any.whl (275.0 kB)

Uploaded Jun 28, 2026 Python 3

File details

Details for the file aneforge-0.2.0.tar.gz.

File metadata

Download URL: aneforge-0.2.0.tar.gz
Upload date: Jun 28, 2026
Size: 30.4 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aneforge-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a45d4ab30b4b91705ce09dc29f24219b5fc0e24b4cf58e2de9a024415f5dbece`
MD5	`c778fae13ff6c228158564616901d13a`
BLAKE2b-256	`adab55341d19143dc65ce9f491e89b84377cf379a0c3dcbc150ddf465f568362`

Provenance

The following attestation bundles were made for aneforge-0.2.0.tar.gz:

Publisher: release.yml on sbryngelson/ANEForge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aneforge-0.2.0.tar.gz
- Subject digest: a45d4ab30b4b91705ce09dc29f24219b5fc0e24b4cf58e2de9a024415f5dbece
- Sigstore transparency entry: 1997002684
- Sigstore integration time: Jun 28, 2026
Source repository:
- Permalink: sbryngelson/ANEForge@c0440cc49f6caf55705e904a37bfedcb3e04cfd7
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/sbryngelson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c0440cc49f6caf55705e904a37bfedcb3e04cfd7
- Trigger Event: push

File details

Details for the file aneforge-0.2.0-py3-none-any.whl.

File metadata

Download URL: aneforge-0.2.0-py3-none-any.whl
Upload date: Jun 28, 2026
Size: 275.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aneforge-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`39d84e6daeac2101f850700fb40859e7c846ea0124faed5615ee4b9d33ccbce5`
MD5	`e4e7598a9bcfffe387dc7b9b2cdac69a`
BLAKE2b-256	`4718c1f3bc21f63f4dd86d85901f42ae7739fd969b4b56874a6f02b95dc23102`

Provenance

The following attestation bundles were made for aneforge-0.2.0-py3-none-any.whl:

Publisher: release.yml on sbryngelson/ANEForge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aneforge-0.2.0-py3-none-any.whl
- Subject digest: 39d84e6daeac2101f850700fb40859e7c846ea0124faed5615ee4b9d33ccbce5
- Sigstore transparency entry: 1997002795
- Sigstore integration time: Jun 28, 2026
Source repository:
- Permalink: sbryngelson/ANEForge@c0440cc49f6caf55705e904a37bfedcb3e04cfd7
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/sbryngelson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c0440cc49f6caf55705e904a37bfedcb3e04cfd7
- Trigger Event: push
