Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.

Overview

metalcore provides a unified backend for high-performance linear algebra operations on macOS devices, bypassing generic MPS fallbacks to use optimized custom Metal kernels.

Supported Operations

1. Decompositions

  • SVD (svd): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
  • QR (qr, qr_batched): Blocked Householder reflections. About 20x faster than the PyTorch/CPU baseline for batched inputs (see Performance Highlights).
  • Eigh (eigh): Symmetric eigenvalue decomposition using Jacobi rotations.
  • Cholesky (cholesky): MAGMA-style shared-memory kernel for symmetric positive-definite matrices.
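
To make the SVD entry concrete, here is a minimal pure-Python sketch of the one-sided (Hestenes) Jacobi method. It is a reference for the algorithm only, not metalcore's Metal kernel: the function name `jacobi_svd`, its tolerance, and its sweep limit are illustrative assumptions, and the singular values are returned unsorted.

```python
import math

def jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD of an m x n list of lists (m >= n).

    Returns (u, s, v) such that a is approximately u @ diag(s) @ v^T.
    Hypothetical reference helper, not part of metalcore.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # working copy whose columns get orthogonalized
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                # Skip column pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the dot product of columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the columns of u and of v.
                for mat in (u, v):
                    for row in mat:
                        rp, rq = row[p], row[q]
                        row[p] = c * rp - s * rq
                        row[q] = s * rp + c * rq
        if not rotated:
            break
    # Singular values are the column norms; normalize the columns of u.
    sing = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    for j in range(n):
        if sing[j] > 0.0:
            for i in range(m):
                u[i][j] /= sing[j]
    return u, sing, v
```

Each column pair can be rotated independently of disjoint pairs, which is one reason Jacobi methods are commonly chosen for GPU implementations.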

2. Solvers

  • Linear Solve (solve): Batched linear system solver using LU factorization. Supports fp16/bf16 (auto-promoted to fp32 for stability).
  • Triangular Solve (trsm): Solve $AX=B$ where $A$ is triangular.
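
As a reference for what a triangular solve computes, the sketch below solves $AX=B$ by forward substitution for a lower-triangular $A$. The README does not show the signature of trsm (for example, how upper or lower is selected), so `trsm_lower` is a hypothetical standalone helper, not the library call.

```python
def trsm_lower(a, b):
    """Solve A X = B where A is n x n lower-triangular and B is n x k.

    Hypothetical reference helper; assumes a nonzero diagonal.
    """
    n, k = len(a), len(b[0])
    x = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            # Subtract the contribution of the rows that are already solved.
            acc = b[i][j] - sum(a[i][p] * x[p][j] for p in range(i))
            x[i][j] = acc / a[i][i]
    return x
```

Row i depends only on rows 0..i-1, while the k right-hand-side columns are independent of each other, so the columns are the natural axis for parallel work.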

3. Training Ops ⚡ NEW

  • RMSNorm (MetalRMSNorm): Fused RMS normalization with 2.5x speedup over PyTorch.
  • AdamW (MetalAdamW): Fused optimizer step with 2.9x speedup.
  • Activations (metal_gelu, metal_silu): Vectorized float4 GELU/SiLU with fast backward pass.
  • SDPA (metal_scaled_dot_product_attention): Flash Attention v2 with tiling and causal masking (experimental).
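
For orientation, here is a scalar pure-Python sketch of the math that a fused RMSNorm and a fused AdamW step evaluate. The formulas are the standard ones; the function names and default hyperparameters are assumptions for illustration and may differ from the defaults of MetalRMSNorm and MetalAdamW.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMS-normalize one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def adamw_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (step counts from 1)."""
    p = p * (1.0 - lr * weight_decay)        # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)        # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

A fused kernel performs all of these elementwise steps in a single pass over memory instead of launching one kernel per operation, which is typically where the speedup of fused ops comes from.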

4. Primitives

  • Householder Reflections: Core orthogonalization primitives (geqr2, larft, larfb).
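
The names geqr2, larft, and larfb match LAPACK routines, all of which build on a single elementary reflector. The sketch below constructs one reflector in the LAPACK larfg convention (v[0] = 1, H = I - tau * v * v^T). It is a pure-Python illustration with hypothetical function names, not metalcore's implementation.

```python
import math

def householder(x):
    """Return (v, tau, beta) such that (I - tau * v v^T) x = [beta, 0, ..., 0]."""
    alpha = x[0]
    xnorm = math.sqrt(sum(t * t for t in x[1:]))
    if xnorm == 0.0:
        # x is already a multiple of e1, so the reflector is the identity.
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    # Choose the sign of beta opposite to alpha to avoid cancellation.
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
    tau = (beta - alpha) / beta
    v = [1.0] + [t / (alpha - beta) for t in x[1:]]
    return v, tau, beta

def apply_reflector(v, tau, y):
    """Compute (I - tau * v v^T) y without forming the matrix."""
    dot = sum(vi * yi for vi, yi in zip(v, y))
    return [yi - tau * dot * vi for vi, yi in zip(v, y)]
```

For example, householder([3.0, 4.0]) gives beta = -5.0, and applying the reflector to [3.0, 4.0] yields [-5.0, 0.0] up to rounding.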

Installation

pip install metalcore

Usage

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # Make PD
L = metalcore.cholesky(C)

# Linear Solve (batched, supports fp16/bf16)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)  # x such that A @ x = b

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)

Performance Highlights

| Operation | Speedup vs PyTorch/CPU |
|---|---|
| Cholesky Batched | 10x faster |
| Solve Batched | 5-10x faster |
| QR Batched | 20x faster |
| RMSNorm | 2.5x faster |
| AdamW | 2.9x faster |
| SiLU/GELU | 2-4x faster |
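
To check figures like these on your own machine, a small timing harness along the following lines may help. GPU work is queued asynchronously, so the timer should only be read after the device has finished. The `benchmark` helper is a hypothetical sketch, and the README does not describe how the numbers above were measured.

```python
import time

def benchmark(fn, sync=None, warmup=3, iters=20):
    """Return the mean wall-clock seconds per call of fn.

    sync is an optional callable that blocks until queued device work is done.
    """
    for _ in range(warmup):  # let caches and lazy kernel compilation settle
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters

# Example usage on an MPS device (requires torch and metalcore):
#   t = benchmark(lambda: metalcore.qr(B), sync=torch.mps.synchronize)
```

Comparing the result against the same call on a PyTorch baseline, with identical shapes and dtypes, gives a speedup ratio for your hardware.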

Requirements

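  • macOS 12.0+ with Apple Silicon (M1/M2/M3/M4). The prebuilt 0.1.7 wheels listed under Built Distributions are tagged macosx_15_0, so installing them with pip likely requires macOS 15.0 or later.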
  • Python 3.9+
  • PyTorch 2.0+
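
A quick way to check the platform side of these requirements before installing is sketched below, using only the standard library. The helper name `meets_requirements` is hypothetical. It does not check the macOS or PyTorch version, and a Python interpreter running under Rosetta reports x86_64 rather than arm64.

```python
import platform
import sys

def meets_requirements(min_python=(3, 9)):
    """Return True on macOS with an arm64 interpreter and a new enough Python."""
    return (
        sys.platform == "darwin"
        and platform.machine() == "arm64"
        and sys.version_info >= min_python
    )

# With PyTorch installed, torch.backends.mps.is_available() additionally
# reports whether the MPS device can be used.
```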

Author

Kris Bailey

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • metalcore-0.1.7-cp314-cp314-macosx_15_0_arm64.whl (647.7 kB): CPython 3.14, macOS 15.0+ ARM64
  • metalcore-0.1.7-cp313-cp313-macosx_15_0_arm64.whl (647.4 kB): CPython 3.13, macOS 15.0+ ARM64
  • metalcore-0.1.7-cp312-cp312-macosx_15_0_arm64.whl (647.9 kB): CPython 3.12, macOS 15.0+ ARM64
  • metalcore-0.1.7-cp311-cp311-macosx_15_0_arm64.whl (646.6 kB): CPython 3.11, macOS 15.0+ ARM64
  • metalcore-0.1.7-cp310-cp310-macosx_15_0_arm64.whl (645.3 kB): CPython 3.10, macOS 15.0+ ARM64
  • metalcore-0.1.7-cp39-cp39-macosx_15_0_arm64.whl (638.1 kB): CPython 3.9, macOS 15.0+ ARM64
