Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

High-performance Metal-accelerated linear algebra and training operations for PyTorch on Apple Silicon.

Overview

metalcore provides optimized custom Metal kernels for PyTorch on macOS, bypassing generic MPS fallbacks for significantly faster computation.

Installation

pip install metalcore

Key Features

Linear Algebra

SVD: Jacobi algorithm, 25x faster for LLM weight matrices
QR: Blocked Householder, 20x faster batched
Eigh: Symmetric eigendecomposition, 3.5x faster
Cholesky: MAGMA-style, 33x faster batched
Solve: LU-based, 10x faster batched (fp16/bf16 supported)
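To make the "Jacobi algorithm" entry concrete, here is a minimal pure-Python sketch of one-sided (Hestenes) Jacobi for singular values. It is an illustration only: the source does not show metalcore's Metal kernel, so the variant, the convergence test, and the function name jacobi_singular_values are assumptions rather than the library's implementation.

```python
import math


def jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a dense matrix (list of rows) via one-sided Jacobi.

    Hypothetical reference helper, not part of metalcore.
    """
    m, n = len(a), len(a[0])
    # Work on a column-major copy so each rotation touches two whole columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs of columns that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Rotate the two columns so that they become orthogonal.
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break
    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

Each pair of columns is independent within a rotation, which is one reason Jacobi-style methods are often chosen for GPU implementations; whether metalcore exploits this in the same way is not stated in the source.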

Training Ops

RMSNorm (MetalRMSNorm): ~1.5x faster than PyTorch
AdamW (MetalAdamW): 2.4x faster optimizer
SiLU (metal_silu): 1.1x faster
EmbeddingBag: 6x faster (avoids CPU fallback)
LayerNorm, Softmax: Fused implementations
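For readers who want to check what these ops compute, the following pure-Python sketch spells out the standard scalar definitions of RMSNorm, SiLU, and a single AdamW step. It is a reference for the math only, not metalcore's kernels; the function names are hypothetical, the eps value in rms_norm is an assumed default, and the AdamW hyperparameter defaults mirror those of torch.optim.AdamW.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i (eps default is an assumption)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter p with gradient g at step t >= 1."""
    # Decoupled weight decay is applied directly to the parameter.
    p = p * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

A fused optimizer typically performs all of the adamw_step arithmetic for many parameters in one kernel launch; the source does not describe how MetalAdamW is organized internally.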

RoPE

apply_rotary_pos_emb: Metal-accelerated rotary embeddings (3.4x faster)
RotaryEmbedding: Drop-in HuggingFace replacement module
patch_transformers_rope: Auto-patches Llama/Mistral/Qwen models
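As a reference for what a rotary embedding does to a single head vector, here is a small pure-Python sketch. It assumes the "rotate half" pairing used by HuggingFace Llama-style models (element i is paired with element i + dim/2) and a base of 10000; rope_rotate is a hypothetical helper, not a metalcore function.

```python
import math


def rope_rotate(x, position, base=10000.0):
    """Apply rotary position embedding to one vector of even length."""
    dim = len(x)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Each pair (i, i + half) rotates at its own frequency.
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out
```

Because every pair is rotated by an angle proportional to the position, the dot product of a rotated query and a rotated key depends only on the difference between their positions, which is the property attention layers rely on.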

INT4 Quantization

Hybrid approach (recommended): Int4Linear.from_float(linear, dequant_on_load=True)
- Weights are stored as INT4 (7x disk compression) and dequantized to FP16 at load time, giving a 0.6 ms matmul
GGML block_q4_0 (llama.cpp compatible): quantize_ggml_q4_0, matmul_ggml_q4_0
- Ported from llama.cpp using simdgroup_multiply_accumulate
- 4-15x slower than an FP16 matmul, but 36x faster than a naive implementation
- Enables larger models: 7B → 3.5 GB, 70B → 35 GB
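The block_q4_0 layout can be illustrated in a few lines of pure Python. The sketch below follows the GGML scheme as commonly described for llama.cpp: blocks of 32 weights, one fp16 scale per block, and 4-bit codes packed two per byte. It is a simplified illustration, not metalcore's quantize_ggml_q4_0, and all function names here are hypothetical.

```python
import struct

QK4_0 = 32          # weights per block
BLOCK_BYTES = 18    # 2-byte fp16 scale + 16 bytes of packed 4-bit codes


def quantize_q4_0_block(values):
    """Pack 32 floats into one 18-byte block."""
    assert len(values) == QK4_0
    # The element with the largest magnitude (sign kept) maps to code 0.
    vmax = max(values, key=abs)
    d = vmax / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    codes = [min(15, int(v * inv_d + 8.5)) for v in values]
    # Element i goes in the low nibble, element i + 16 in the high nibble.
    packed = bytes(codes[i] | (codes[i + 16] << 4) for i in range(16))
    return struct.pack("<e", d) + packed


def dequantize_q4_0_block(block):
    """Recover 32 approximate floats from one 18-byte block."""
    (d,) = struct.unpack("<e", block[:2])
    nibbles = block[2:]
    low = [((b & 0x0F) - 8) * d for b in nibbles]
    high = [((b >> 4) - 8) * d for b in nibbles]
    return low + high


def q4_0_gigabytes(n_weights):
    """Storage estimate in GB (10^9 bytes) when every weight is in a Q4_0 block."""
    return n_weights / QK4_0 * BLOCK_BYTES / 1e9
```

Note that the per-block scale brings the cost to 4.5 bits per weight, so q4_0_gigabytes(7e9) comes out slightly above the 3.5 GB quoted above, which corresponds to exactly 4 bits per weight.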

PyTorch Integration

import metalcore

# Automatically accelerate F.silu, F.embedding_bag, torch.linalg.svd/qr
# Also replaces torch.optim.AdamW -> MetalAdamW, torch.nn.RMSNorm -> MetalRMSNorm
metalcore.enable_pytorch_overrides()

# Works seamlessly with HuggingFace models
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("...", device_map="mps")

# Optional: Also patch RMSNorm and RoPE modules
metalcore.patch_transformers_rmsnorm(model)
metalcore.patch_transformers_rope(model)

Quick Start

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Linear Solve (fp16/bf16 supported)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

y = metal_gelu(x)

Performance Highlights

Operation	Speedup
RMSNorm	~1.5x
EmbeddingBag	6x (vs CPU fallback)
AdamW	2.4x
RoPE	3.4x
SiLU	1.1x
QR Batched	up to 20x
SVD (large)	up to 12x
Fused MLP Bwd	5-6x (vs Autograd)
Fused Attn Bwd	Parity with FP16

Requirements

macOS 12.0+ with Apple Silicon (M1/M2/M3/M4); note that the prebuilt 0.1.16 wheels are tagged macosx_15_0 (macOS 15.0+)
Python 3.9 - 3.14
PyTorch 2.0+

Note: M3/M4 chips are recommended for best bf16 performance. The library gracefully falls back to FP32 on older hardware.
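A quick way to check a machine against the requirements above before installing is a small standard-library script. This is a sketch with hypothetical helper names; it only inspects the OS, CPU architecture, and Python version, and it does not detect the chip generation or the installed PyTorch version. A Python interpreter running under Rosetta reports x86_64 and would be rejected.

```python
import platform
import sys


def parse_version(text):
    """Turn a string such as '15.1' into (15, 1); missing parts become 0."""
    parts = [int(part) for part in text.split(".") if part.isdigit()]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def meets_requirements(system, machine, macos_version, python_version):
    """Check the stated requirements: Apple Silicon, macOS 12.0+, Python 3.9 to 3.14."""
    return (
        system == "Darwin"
        and machine == "arm64"
        and parse_version(macos_version) >= (12, 0)
        and (3, 9) <= tuple(python_version[:2]) <= (3, 14)
    )


def current_host_supported():
    """Apply the check to the interpreter that is running this script."""
    return meets_requirements(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
        sys.version_info,
    )


print("supported host:", current_host_supported())
```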

Author

Kris Bailey

License

MIT

Release history

Version	Release date
0.1.16 (this version)	Jan 14, 2026
0.1.15	Jan 13, 2026
0.1.14	Jan 13, 2026
0.1.13	Jan 8, 2026
0.1.11	Jan 7, 2026
0.1.9	Jan 7, 2026
0.1.8	Jan 6, 2026
0.1.7	Jan 6, 2026
0.1.6	Jan 6, 2026
0.1.5	Jan 5, 2026
0.1.3	Jan 5, 2026

Download files


Source Distribution

File	Size	Type	Uploaded
metalcore-0.1.16.tar.gz	2.3 MB	Source	Jan 14, 2026

Built Distributions


File	Size	Interpreter	Platform	Uploaded
metalcore-0.1.16-cp314-cp314-macosx_15_0_arm64.whl	1.2 MB	CPython 3.14	macOS 15.0+ ARM64	Jan 14, 2026
metalcore-0.1.16-cp313-cp313-macosx_15_0_arm64.whl	1.2 MB	CPython 3.13	macOS 15.0+ ARM64	Jan 14, 2026
metalcore-0.1.16-cp312-cp312-macosx_15_0_arm64.whl	1.2 MB	CPython 3.12	macOS 15.0+ ARM64	Jan 14, 2026
metalcore-0.1.16-cp311-cp311-macosx_15_0_arm64.whl	1.2 MB	CPython 3.11	macOS 15.0+ ARM64	Jan 14, 2026
metalcore-0.1.16-cp310-cp310-macosx_15_0_arm64.whl	1.2 MB	CPython 3.10	macOS 15.0+ ARM64	Jan 14, 2026
metalcore-0.1.16-cp39-cp39-macosx_15_0_arm64.whl	1.2 MB	CPython 3.9	macOS 15.0+ ARM64	Jan 14, 2026

File details

Details for the file metalcore-0.1.16.tar.gz.

File metadata

Download URL: metalcore-0.1.16.tar.gz
Upload date: Jan 14, 2026
Size: 2.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16.tar.gz
Algorithm	Hash digest
SHA256	`86ee276ffc2d47872e7f157f58b95f5e5e70cf1f2b69a7d610692fe629caecab`
MD5	`f5068de692a3101a4dbf7fefc5588990`
BLAKE2b-256	`c18bf26150e20f208c658829bbcd92001bdcb0cfa5cfb55a561e9dc924d9ce26`
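The digests above can be checked against a downloaded file with Python's hashlib. The sketch below hard-codes the SHA256 value listed for metalcore-0.1.16.tar.gz; the helper names and the local file path in the commented usage line are assumptions.

```python
import hashlib

# SHA256 listed above for metalcore-0.1.16.tar.gz
EXPECTED_SHA256 = "86ee276ffc2d47872e7f157f58b95f5e5e70cf1f2b69a7d610692fe629caecab"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """True when the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected_hex.lower()


# Usage (assumes the sdist was downloaded to the current directory):
# print(matches_expected("metalcore-0.1.16.tar.gz"))
```

pip can perform the same check during installation when a requirements file pins the package with a --hash=sha256:... option.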


File details

Details for the file metalcore-0.1.16-cp314-cp314-macosx_15_0_arm64.whl.

File metadata

Download URL: metalcore-0.1.16-cp314-cp314-macosx_15_0_arm64.whl
Upload date: Jan 14, 2026
Size: 1.2 MB
Tags: CPython 3.14, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16-cp314-cp314-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`412ae0da642d83d6733bd5273e5c3a7b498d81421520cc3d144018265200d3a9`
MD5	`967873fdba7da42a5274337523bcb84a`
BLAKE2b-256	`4c444876b7ef59c7e467c43f76255088bd9175b8c2fc09c253285370104b68bc`


File details

Details for the file metalcore-0.1.16-cp313-cp313-macosx_15_0_arm64.whl.

File metadata

Download URL: metalcore-0.1.16-cp313-cp313-macosx_15_0_arm64.whl
Upload date: Jan 14, 2026
Size: 1.2 MB
Tags: CPython 3.13, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16-cp313-cp313-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`d1aab917c3a1fcc3d8961d2c8683a57595fd9da1fa00e8708fecd199625226e2`
MD5	`0bd74951550fa22997adc3d872c2b85f`
BLAKE2b-256	`ba255e2ea5a0f78d9ba717e9cee0cd5dd9a6c14252582e7b6cc026eb2463ef78`


File details

Details for the file metalcore-0.1.16-cp312-cp312-macosx_15_0_arm64.whl.

File metadata

Download URL: metalcore-0.1.16-cp312-cp312-macosx_15_0_arm64.whl
Upload date: Jan 14, 2026
Size: 1.2 MB
Tags: CPython 3.12, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16-cp312-cp312-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`6fff1c94f2c5ea9ec72410893fec2bdc5a907384911c1ed22bf16b29348c1d08`
MD5	`9615aca3876a4bb3d1a960740c54ea42`
BLAKE2b-256	`20c23fa58e856d4d6f4a55124403a1f7080376b637ce65f21b9fc19b4a9e3f1f`


File details

Details for the file metalcore-0.1.16-cp311-cp311-macosx_15_0_arm64.whl.

File metadata

Download URL: metalcore-0.1.16-cp311-cp311-macosx_15_0_arm64.whl
Upload date: Jan 14, 2026
Size: 1.2 MB
Tags: CPython 3.11, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16-cp311-cp311-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`ca012b74861aa68fdabff372163be12bf4f0d3f7500dbbb913eeb968c9a87c7f`
MD5	`fa3ee8442bb01a98fb3c9f7cc3dfb0d7`
BLAKE2b-256	`de101076e55f2793a46091c2e2c85d4ecafc918d46101214169125dac8ccea80`


File details

Details for the file metalcore-0.1.16-cp310-cp310-macosx_15_0_arm64.whl.

File metadata

Download URL: metalcore-0.1.16-cp310-cp310-macosx_15_0_arm64.whl
Upload date: Jan 14, 2026
Size: 1.2 MB
Tags: CPython 3.10, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16-cp310-cp310-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`5a608e8783cf16fde8bc6231adebce1552991c807a2dfcd8d0d72b20bde76b4e`
MD5	`b3f88fb3b53ba3e3427b6e1580a17633`
BLAKE2b-256	`4362148e34fafb6c93afbc6bb00c24f6ae8b9eb6079aaaede2be0c27f0ac6e6e`


File details

Details for the file metalcore-0.1.16-cp39-cp39-macosx_15_0_arm64.whl.

File metadata

Download URL: metalcore-0.1.16-cp39-cp39-macosx_15_0_arm64.whl
Upload date: Jan 14, 2026
Size: 1.2 MB
Tags: CPython 3.9, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.16-cp39-cp39-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`be2e7b43d3190cc6ec9f78031e2b58736afc554153021f0df3ce30f2bbb83e2c`
MD5	`e5dca136258d6a4f1c2d60ed5fcadc54`
BLAKE2b-256	`4802f872a429dbde9f881cf0dbe9211d1c6884edc5e2612f62b5d095ce7c1e6e`

