Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.

Overview

metalcore provides a unified backend for high-performance linear algebra operations on macOS devices, bypassing generic MPS fallbacks to use optimized custom Metal kernels.

Supported Operations

1. Decompositions

  • SVD (svd): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
  • QR (qr, qr_batched): Blocked Householder reflections. Batched QR is listed at 20x faster than PyTorch/CPU under Performance Highlights.
  • Eigh (eigh): Symmetric eigenvalue decomposition using Jacobi rotations.
  • Cholesky (cholesky): MAGMA-style shared-memory optimization for symmetric positive-definite matrices.
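
To make the one-sided Jacobi idea concrete, here is a minimal pure-Python sketch of the textbook (Hestenes) variant. It is not metalcore's Metal kernel, and the name jacobi_svd and its parameters are illustrative only. Pairs of columns are rotated until they are mutually orthogonal; the column norms are then the singular values.

```python
import math

def jacobi_svd(A, tol=1e-12, max_sweeps=50):
    """One-sided Jacobi SVD of an m x n matrix (m >= n) given as a list of rows.

    Returns (U, S, V) with A ~= U * diag(S) * V^T. Singular values are not sorted.
    """
    m, n = len(A), len(A[0])
    W = [row[:] for row in A]  # working copy; its columns converge to U * diag(S)
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(W[i][p] * W[i][p] for i in range(m))
                beta = sum(W[i][q] * W[i][q] for i in range(m))
                gamma = sum(W[i][p] * W[i][q] for i in range(m))
                # Columns p and q are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same column rotation to W and to the accumulated V.
                for M in (W, V):
                    for row in M:
                        xp, xq = row[p], row[q]
                        row[p] = c * xp - s * xq
                        row[q] = s * xp + c * xq
        if not rotated:
            break
    S = [math.sqrt(sum(W[i][j] ** 2 for i in range(m))) for j in range(n)]
    U = [[W[i][j] / S[j] if S[j] > 0 else 0.0 for j in range(n)] for i in range(m)]
    return U, S, V
```

Each column pair can be rotated independently of pairs that share no column, which is why this family of algorithms maps well onto batched or parallel hardware.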

2. Solvers

  • Linear Solve (solve): Batched linear system solver using LU factorization. Supports fp16/bf16 (auto-promoted to fp32 for stability).
  • Triangular Solve (trsm): Solve $AX=B$ where $A$ is triangular.
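
As a reference for what an LU-based solve computes, the sketch below factors a single small system with partial pivoting and then runs the two triangular solves. It is a plain-Python illustration under the assumption of one dense square system; lu_solve is a hypothetical helper name, not part of metalcore.

```python
def lu_solve(A, b):
    """Solve A x = b via LU factorization with partial pivoting.

    A is a square matrix as a list of rows, b a list of floats.
    """
    n = len(A)
    LU = [row[:] for row in A]  # L (unit diagonal) and U stored in one array
    x = b[:]
    for k in range(n):
        # Pick the largest remaining entry in column k as the pivot.
        piv = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        if piv != k:
            LU[k], LU[piv] = LU[piv], LU[k]
            x[k], x[piv] = x[piv], x[k]
        # Eliminate below the pivot, storing the multipliers in place of zeros.
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    # Forward substitution with unit lower-triangular L.
    for i in range(n):
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    # Back substitution with upper-triangular U.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x
```

The two substitution loops at the end are the triangular-solve step on its own; a trsm-style routine performs the same recurrence for a matrix of right-hand sides.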

3. Training Ops ⚡ NEW

  • RMSNorm (MetalRMSNorm): Fused RMS normalization with 2.5x speedup over PyTorch.
  • AdamW (MetalAdamW): Fused optimizer step with 2.9x speedup over PyTorch.
  • Activations (metal_gelu, metal_silu): Vectorized float4 GELU/SiLU with fast backward pass.
  • SDPA (metal_scaled_dot_product_attention): Scaled dot-product attention in the style of Flash Attention v2, with tiling and causal masking (experimental).
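
For readers who want to check a fused kernel against the plain math, here is a scalar reference for the standard definitions of RMSNorm, one AdamW step, SiLU, and GELU. These are the textbook formulas, not metalcore's code. The default hyperparameters and the erf form of GELU are assumptions; the README does not state which values or variant the Metal kernels use.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def adamw_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a single scalar parameter; step counts from 1."""
    p = p * (1.0 - lr * weight_decay)        # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)        # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

def silu(x):
    """SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def gelu(x):
    """GELU in its exact (erf) form."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

A fused kernel performs these element-wise steps in a single pass over memory rather than as separate operations, which is the usual source of speedups of this kind.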

4. Primitives

  • Householder Reflections: Core orthogonalization primitives (geqr2, larft, larfb).
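
The following is a minimal unblocked Householder QR in pure Python, roughly the role that a geqr2-style routine plays in LAPACK. It is a sketch for illustration, not metalcore's implementation: it forms the full Q explicitly instead of storing reflectors, and the name householder_qr is hypothetical. In LAPACK's blocked scheme, larft and larfb then build and apply a block of such reflectors at once.

```python
import math

def householder_qr(A):
    """Unblocked Householder QR of an m x n matrix (m >= n) given as a list of rows.

    Returns (Q, R) with Q an m x m orthogonal matrix and R an m x n
    upper-triangular matrix such that A ~= Q * R.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Reflector vector v; the sign choice avoids cancellation in v[0].
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(vi * vi for vi in v)
        # R <- H R with H = I - 2 v v^T / (v^T v), acting on rows k..m-1.
        for j in range(n):
            dot = sum(v[i] * R[k + i][j] for i in range(len(v)))
            f = 2.0 * dot / vtv
            for i in range(len(v)):
                R[k + i][j] -= f * v[i]
        # Q <- Q H, acting on columns k..m-1.
        for row in Q:
            dot = sum(row[k + i] * v[i] for i in range(len(v)))
            f = 2.0 * dot / vtv
            for i in range(len(v)):
                row[k + i] -= f * v[i]
    return Q, R
```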

Installation

pip install metalcore

Usage

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # Make PD
L = metalcore.cholesky(C)

# Linear Solve (batched, supports fp16/bf16)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)  # x such that A @ x = b

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)
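
The usage examples above do not cover the experimental SDPA op, and its call signature is not shown in this README, so the sketch below does not call it. Instead, it is a plain-Python reference for single-head causal scaled dot-product attention that a kernel's output could be compared against. It is deliberately untiled; causal_attention is an illustrative name.

```python
import math

def causal_attention(Q, K, V):
    """Causal scaled dot-product attention for one head.

    Q and K are lists of T rows of length d; V is a list of T rows of length dv.
    Row i of the output attends only to positions 0..i.
    """
    T, d = len(Q), len(Q[0])
    dv = len(V[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(T):
        # Scores against keys 0..i only; later keys are masked out.
        scores = [scale * sum(q * k for q, k in zip(Q[i], K[j])) for j in range(i + 1)]
        # Numerically stable softmax: subtract the row maximum before exponentiating.
        mx = max(scores)
        w = [math.exp(s - mx) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(i + 1)) / z for c in range(dv)])
    return out
```

Flash Attention style kernels compute the same result block by block with a running maximum and running sum, so the full T x T score matrix is never materialized.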

Performance Highlights

Operation           Speedup vs PyTorch/CPU
Cholesky Batched    10x faster
Solve Batched       5-10x faster
QR Batched          20x faster
RMSNorm             2.5x faster
AdamW               2.9x faster
SiLU/GELU           2-4x faster
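
The README does not describe how these figures were measured. If you want to reproduce a comparison on your own machine, a small timing harness along the following lines is one option. This is a generic sketch, not the author's benchmark; median_seconds and speedup are hypothetical helper names.

```python
import statistics
import time

def median_seconds(fn, sync=None, warmup=3, iters=10):
    """Median wall-clock time of fn() over several runs.

    sync is an optional callable that blocks until queued device work is done.
    For MPS tensors, torch.mps.synchronize is the usual choice; without it,
    only the kernel launch is timed, not the kernel itself.
    """
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time setup costs
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

Results depend on matrix size, batch size, dtype, and chip generation, so a measured speedup may differ from the table.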

Requirements

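  • macOS 12.0+ with Apple Silicon (M1/M2/M3/M4). Note that the 0.1.13 wheels listed under Built Distributions are tagged macosx_15_0_arm64 (macOS 15.0+).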
  • Python 3.9+
  • PyTorch 2.0+
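
A quick pre-install check for the platform requirements above can be written with the standard library alone. This is a sketch: check_requirements is a hypothetical helper, it does not verify the PyTorch version, and a Python interpreter running under Rosetta reports x86_64 and would be flagged.

```python
import platform
import sys

def check_requirements(system=None, machine=None, python_version=None, macos_version=None):
    """Return a list of unmet requirements (an empty list means all checks passed).

    Arguments default to the values detected on the current machine; they are
    parameters only so that the checks can be exercised on other platforms.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    python_version = sys.version_info[:2] if python_version is None else tuple(python_version)
    macos_version = platform.mac_ver()[0] if macos_version is None else macos_version

    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    elif macos_version and int(macos_version.split(".")[0]) < 12:
        problems.append("macOS 12.0 or newer is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if python_version < (3, 9):
        problems.append("Python 3.9 or newer is required")
    return problems
```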

Author

Kris Bailey

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • metalcore-0.1.13-cp314-cp314-macosx_15_0_arm64.whl (772.2 kB): CPython 3.14, macOS 15.0+ ARM64
  • metalcore-0.1.13-cp313-cp313-macosx_15_0_arm64.whl (776.4 kB): CPython 3.13, macOS 15.0+ ARM64
  • metalcore-0.1.13-cp312-cp312-macosx_15_0_arm64.whl (774.5 kB): CPython 3.12, macOS 15.0+ ARM64
  • metalcore-0.1.13-cp311-cp311-macosx_15_0_arm64.whl (772.9 kB): CPython 3.11, macOS 15.0+ ARM64
  • metalcore-0.1.13-cp310-cp310-macosx_15_0_arm64.whl (770.6 kB): CPython 3.10, macOS 15.0+ ARM64
  • metalcore-0.1.13-cp39-cp39-macosx_15_0_arm64.whl (768.7 kB): CPython 3.9, macOS 15.0+ ARM64

File hashes

metalcore-0.1.13-cp314-cp314-macosx_15_0_arm64.whl

  • SHA256: d376b6a53c1aff06e8c50bca8ecb8f5260e1c055d145614137e8909dc3465e2d
  • MD5: a90389d932cd1d55515ab51e75922782
  • BLAKE2b-256: 1ec03043da324a86882fb6e709ee992e4ecff6c84c27acb30c9d943e32ad7bd2

metalcore-0.1.13-cp313-cp313-macosx_15_0_arm64.whl

  • SHA256: 403df37c48b2221668fede8e50820e7d1fac0e306bfe36e160adf7a57c39a07c
  • MD5: fe418099b32ee95933373e902f9f25e7
  • BLAKE2b-256: 3d43a1a3ee041438a81cac1c5e1f967a273d4ab60a8b7eedbbf8e4701cd48b1e

metalcore-0.1.13-cp312-cp312-macosx_15_0_arm64.whl

  • SHA256: a6355003f0e2e5eb5e6f4fd07968a69eaba03a9c7fb7517b4a27e162666a2386
  • MD5: a4852925ebac2d945584ef7327936b1b
  • BLAKE2b-256: 92af61cacd319bdcf0596a9459c34025970546ee184de2c02c4f0c4ef1964a35

metalcore-0.1.13-cp311-cp311-macosx_15_0_arm64.whl

  • SHA256: 58a948d59c28be7815c576bb9aff459912d18952d67b8e66a938a32bcea834bb
  • MD5: 0eb95f69ff0caf0cd8d4ce7c4fd53f33
  • BLAKE2b-256: 971a5c1066886623432d504db65ed7b78786ded84abbf75121816bb81c878415

metalcore-0.1.13-cp310-cp310-macosx_15_0_arm64.whl

  • SHA256: 3b7444f41044fc74d1c93d9f4c2d9135074175d6157ea502df11908afd694b4e
  • MD5: cef329f1a4878c13e19c65661a2d52d8
  • BLAKE2b-256: d945c57674d3b80f2ff6967db6fcc5ac20681e1b77dba088ce1f6cac3f93d14c

metalcore-0.1.13-cp39-cp39-macosx_15_0_arm64.whl

  • SHA256: a1be9c820ce3fe08e0c7d6d20ff4c66019d5d2c7b74bb4104515d78935895425
  • MD5: 444077a15fd4037f97fceafb43596d70
  • BLAKE2b-256: ccad7d153141b1ac6173a55c5469b4e436eed9b9a044e8b5e176b219b1e84d1c