Foundational Metal Linear Algebra Primitives for PyTorch
metalcore
Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.
Overview
metalcore provides a unified backend for high-performance linear algebra operations on macOS devices, bypassing generic MPS fallbacks to use optimized custom Metal kernels.
Supported Operations
1. Decompositions
- SVD (`svd`): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
- QR (`qr`, `qr_batched`): Blocked Householder reflection. Significantly faster for batched operations.
- Eigh (`eigh`): Symmetric eigenvalue decomposition using Jacobi rotations.
- Cholesky (`cholesky`): MAGMA-style shared memory optimization for Positive Definite matrices.
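To make the one-sided Jacobi idea concrete, here is a minimal pure-Python sketch of the textbook (Hestenes) variant: pairs of columns are rotated until all columns are mutually orthogonal, at which point the column norms are the singular values. The function name `jacobi_svd_values` and its parameters are illustrative assumptions; this is not metalcore's Metal kernel and it ignores batching, tiling, and the accumulation of U and V.

```python
import math

def jacobi_svd_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of an m x n matrix (list of rows) via one-sided Jacobi.

    Illustrative reference only; the name and signature are hypothetical.
    """
    m, n = len(a), len(a[0])
    # Store columns as lists so each rotation updates two plain lists.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    xp, xq = cols[p][i], cols[q][i]
                    cols[p][i] = c * xp - s * xq
                    cols[q][i] = s * xp + c * xq
        if not rotated:
            break  # a full sweep with no rotations means convergence
    # Once the columns are orthogonal, their norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

Because each rotation touches only two columns, independent column pairs can be processed in parallel, which is one common reason this family of algorithms is chosen for GPUs.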
2. Solvers
- Linear Solve (`solve`): Batched linear system solver using LU factorization. Supports fp16/bf16 (auto-promoted to fp32 for stability).
- Triangular Solve (`trsm`): Solve $AX=B$ where $A$ is triangular.
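As a reference for what an LU-based solve computes, the sketch below factors a single matrix with partial pivoting and then performs the two triangular solves (forward, then backward substitution). It is a plain-Python illustration under the assumption of one dense square system; `lu_solve` is a hypothetical name and does not reflect metalcore's batched Metal implementation or its dtype promotion.

```python
def lu_solve(a, b):
    """Solve a @ x = b for square `a` (list of rows) using LU with partial pivoting.

    Illustrative reference only; the name and signature are hypothetical.
    """
    n = len(a)
    lu = [list(row) for row in a]  # factor a copy, leave the input untouched
    x = list(b)
    for k in range(n):
        # Pick the largest remaining entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        x[k], x[pivot] = x[pivot], x[k]  # apply the same row swap to b
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    # Backward substitution with the upper-triangular factor U.
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x
```

The two substitution loops at the end are each a triangular solve with a single right-hand side, the same operation that `trsm` generalizes to a matrix of right-hand sides.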
3. Training Ops ⚡ NEW
- RMSNorm (`MetalRMSNorm`): Fused RMS normalization with 2.5x speedup over PyTorch.
- AdamW (`MetalAdamW`): Fused optimizer step with 2.9x speedup.
- Activations (`metal_gelu`, `metal_silu`): Vectorized float4 GELU/SiLU with fast backward pass.
- SDPA (`metal_scaled_dot_product_attention`): Flash Attention v2 with tiling and causal masking (experimental).
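For readers who want the math behind two of the fused ops, here is a scalar reference sketch of the standard RMSNorm formula and of one AdamW update with decoupled weight decay. These follow the usual textbook definitions; the function names, default hyperparameters, and `eps` values are assumptions for illustration, and the README does not state which defaults `MetalRMSNorm` or `MetalAdamW` use.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i over one feature vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def adamw_step(param, grad, exp_avg, exp_avg_sq, step,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter; `step` starts at 1."""
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    exp_avg = beta1 * exp_avg + (1.0 - beta1) * grad
    exp_avg_sq = beta2 * exp_avg_sq + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero-initialised averages.
    m_hat = exp_avg / (1.0 - beta1 ** step)
    v_hat = exp_avg_sq / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, exp_avg, exp_avg_sq
```

Both routines are chains of cheap elementwise operations plus, for RMSNorm, one reduction. A fused kernel can do all of that in a single pass over memory, which is where speedups of this kind typically come from.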
4. Primitives
- Householder Reflections: Core orthogonalization primitives (`geqr2`, `larft`, `larfb`).
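The names `geqr2`, `larft`, and `larfb` match LAPACK routines for unblocked QR and for forming and applying block reflectors. The building block underneath them is a single Householder reflector, sketched here in pure Python using the LAPACK `larfg` convention (first component of `v` equal to 1, scalar `tau`). Whether metalcore uses exactly this convention internally is an assumption, and the helper names are hypothetical.

```python
import math

def householder_vector(x):
    """Return (v, tau, beta) such that (I - tau * v v^T) x = [beta, 0, ..., 0].

    Illustrative reference only; follows the LAPACK larfg convention with v[0] == 1.
    """
    alpha = x[0]
    tail_sq = sum(t * t for t in x[1:])
    if tail_sq == 0.0:
        # Nothing to eliminate: the reflector is the identity (tau = 0).
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    norm = math.sqrt(alpha * alpha + tail_sq)
    # Choose the sign opposite to alpha to avoid cancellation in alpha - beta.
    beta = -math.copysign(norm, alpha)
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [t * scale for t in x[1:]]
    return v, tau, beta

def apply_reflector(v, tau, x):
    """Compute (I - tau * v v^T) x without forming the matrix."""
    dot = sum(a * b for a, b in zip(v, x))
    return [xi - tau * dot * vi for xi, vi in zip(x, v)]
```

An unblocked QR applies one such reflector per column. Blocked variants group several reflectors so that most of the work becomes matrix-matrix products.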
Installation
```bash
pip install metalcore
```
Usage
```python
import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # Make PD
L = metalcore.cholesky(C)

# Linear Solve (batched, supports fp16/bf16)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)  # x such that A @ x = b

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)
```
Performance Highlights
| Operation | Speedup vs PyTorch/CPU |
|---|---|
| Cholesky Batched | 10x faster |
| Solve Batched | 5-10x faster |
| QR Batched | 20x faster |
| RMSNorm | 2.5x faster |
| AdamW | 2.9x faster |
| SiLU/GELU | 2-4x faster |
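The table does not say how the speedups were measured. One way to reproduce numbers of this kind is a small timing harness like the sketch below. It is an assumption about methodology, not the author's benchmark, and the helper names are hypothetical. GPU work on MPS is queued asynchronously, so the harness takes an optional `sync` callable to run before the clock stops. With PyTorch, `torch.mps.synchronize` is the natural choice.

```python
import statistics
import time

def median_seconds(fn, sync=None, warmup=3, iters=10):
    """Median wall-clock time of `fn()` over `iters` runs, after `warmup` untimed runs."""
    for _ in range(warmup):
        fn()  # untimed: lets caches, allocators, and kernel compilation settle
    if sync is not None:
        sync()  # drain any queued warmup work before timing starts
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous device work to finish
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def speedup(baseline_fn, candidate_fn, **kwargs):
    """How many times faster `candidate_fn` is than `baseline_fn` (higher is better)."""
    return median_seconds(baseline_fn, **kwargs) / median_seconds(candidate_fn, **kwargs)
```

A comparison might then read `speedup(lambda: torch.linalg.cholesky(C), lambda: metalcore.cholesky(C), sync=torch.mps.synchronize)`. Results will vary with matrix size, batch size, dtype, and chip generation.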
Requirements
- macOS 12.0+ with Apple Silicon (M1/M2/M3/M4)
- Python 3.9+
- PyTorch 2.0+
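The platform and Python requirements above can be checked before installing with a short stdlib script. The sketch below is illustrative and its function names are hypothetical. It does not check the PyTorch version, which would require importing `torch`.

```python
import platform
import sys

def unmet_requirements(system, machine, mac_version, python_version):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if system != "Darwin" or machine != "arm64":
        problems.append("needs macOS on Apple Silicon (arm64)")
    elif mac_version:
        # Pad so that a bare major version such as "12" compares as (12, 0).
        major, minor = (int(part) for part in (mac_version.split(".") + ["0"])[:2])
        if (major, minor) < (12, 0):
            problems.append("needs macOS 12.0 or newer")
    if tuple(python_version[:2]) < (3, 9):
        problems.append("needs Python 3.9 or newer")
    return problems

def current_unmet_requirements():
    """Run the checks against the interpreter and machine executing this script."""
    return unmet_requirements(
        platform.system(), platform.machine(), platform.mac_ver()[0], sys.version_info
    )
```

Calling `current_unmet_requirements()` on a supported machine should return an empty list.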
Author
License
MIT
Download files
Built Distributions
File details
Details for the file metalcore-0.1.8-cp314-cp314-macosx_15_0_arm64.whl.
File metadata
- Download URL: metalcore-0.1.8-cp314-cp314-macosx_15_0_arm64.whl
- Upload date:
- Size: 658.8 kB
- Tags: CPython 3.14, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77bda97ab008657e8e0f7b9acd902c15ba20e55ce1ac74860d8c3a67d336ea8d |
| MD5 | 8c3ead135f23fcc060f334fc1b9037b9 |
| BLAKE2b-256 | d6827d266e2310561cad4c9245676b49caf1208372099a2c348f6e5e5f5aab54 |
File details
Details for the file metalcore-0.1.8-cp313-cp313-macosx_15_0_arm64.whl.
File metadata
- Download URL: metalcore-0.1.8-cp313-cp313-macosx_15_0_arm64.whl
- Upload date:
- Size: 659.0 kB
- Tags: CPython 3.13, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ff5dba2daaa73785ea12c1aec483c00da84efc59bd707a660fb91584c8110a1 |
| MD5 | 550a2adfe0ab97f7f1a65960608b1d00 |
| BLAKE2b-256 | 4120295b057d2a6b205c170f51af1d87029385031059ea76efdb1cdf5a564bf8 |
File details
Details for the file metalcore-0.1.8-cp312-cp312-macosx_15_0_arm64.whl.
File metadata
- Download URL: metalcore-0.1.8-cp312-cp312-macosx_15_0_arm64.whl
- Upload date:
- Size: 659.0 kB
- Tags: CPython 3.12, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fa64a6ca069107f09d62a35df18d995b2a6e178beff6ff895df7f31fa2ce446 |
| MD5 | 06fd14c5deaf932221df7434d660b836 |
| BLAKE2b-256 | bd18021c85a39736c55ff02b82e7b764f584df70594a628076e8bd5cdc568965 |
File details
Details for the file metalcore-0.1.8-cp311-cp311-macosx_15_0_arm64.whl.
File metadata
- Download URL: metalcore-0.1.8-cp311-cp311-macosx_15_0_arm64.whl
- Upload date:
- Size: 658.1 kB
- Tags: CPython 3.11, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 396278d8d59806396f83426e24da125e2eac783e330e945e2ae9fea200c43934 |
| MD5 | b1b479f85aa12c0ecab148f14300bb69 |
| BLAKE2b-256 | 225424a4ae1423577af2907f8e88fa877e16b50ec2c87141b625b033609ec0b4 |
File details
Details for the file metalcore-0.1.8-cp310-cp310-macosx_15_0_arm64.whl.
File metadata
- Download URL: metalcore-0.1.8-cp310-cp310-macosx_15_0_arm64.whl
- Upload date:
- Size: 656.4 kB
- Tags: CPython 3.10, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc2a0115e5e449d857342462e731848e773a9d3f41026c9876e63a93e86a7457 |
| MD5 | 3e53fdc9a41128fd09c9e15374ace96d |
| BLAKE2b-256 | 4f7cbefc2d53e8094629834fb511a1187473f1e98dd5a9357598ed1f65a44070 |
File details
Details for the file metalcore-0.1.8-cp39-cp39-macosx_15_0_arm64.whl.
File metadata
- Download URL: metalcore-0.1.8-cp39-cp39-macosx_15_0_arm64.whl
- Upload date:
- Size: 650.9 kB
- Tags: CPython 3.9, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8406876039d09f4444f920f2f8e6c15fbda3099075256f7ac7ac330678a5e332 |
| MD5 | fdc3488a22baa196573e1b9846a1928d |
| BLAKE2b-256 | 1f29b85a185db332383466c711fdf3f2965847ebacb8d0a2ef6e2678e4be5d1c |