Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.

Overview

metalcore provides a unified backend for high-performance linear algebra operations on macOS devices, bypassing generic MPS fallbacks to use optimized custom Metal kernels.

Supported Operations

1. Decompositions

  • SVD (svd): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
  • QR (qr, qr_batched): Blocked Householder reflections. Roughly 20x faster than PyTorch/CPU on batched inputs.
  • Eigh (eigh): Symmetric eigenvalue decomposition using Jacobi rotations.
  • Cholesky (cholesky): MAGMA-style shared-memory kernel for positive-definite matrices.
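
To make the one-sided Jacobi idea behind svd concrete, here is a minimal pure-Python sketch of the textbook (Hestenes) variant. It is a CPU reference for checking results on small inputs, not metalcore's Metal kernel; the function name jacobi_svd and its sweeps and tol parameters are hypothetical.

```python
import math

def jacobi_svd(A, sweeps=30, tol=1e-12):
    """One-sided Jacobi SVD of an m x n matrix (list of rows, m >= n).

    Returns (U, S, V) as row-major lists with A ~= U @ diag(S) @ V^T.
    Singular values are returned unsorted.
    """
    m, n = len(A), len(A[0])
    # Store columns: W starts as A and converges to U * diag(S).
    W = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    V = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in W[p])
                beta = sum(x * x for x in W[q])
                gamma = sum(x * y for x, y in zip(W[p], W[q]))
                # Skip pairs of columns that are already orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to W and to the accumulated V.
                for cols in (W, V):
                    cp, cq = cols[p], cols[q]
                    cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                    cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        if not rotated:
            break
    # Column norms are the singular values; normalized columns form U.
    S = [math.sqrt(sum(x * x for x in col)) for col in W]
    U = [[W[j][i] / S[j] if S[j] > 0.0 else 0.0 for j in range(n)] for i in range(m)]
    V_rows = [[V[j][i] for j in range(n)] for i in range(n)]
    return U, S, V_rows
```

Because only pairs of columns are rotated, every pair within a sweep is an independent small problem, which is one reason this family of algorithms suits batched and tall inputs on a GPU.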

2. Solvers

  • Linear Solve (solve): Batched linear system solver using LU factorization. Supports fp16/bf16 (auto-promoted to fp32 for stability).
  • Triangular Solve (trsm): Solve $AX=B$ where $A$ is triangular.
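
As a rough illustration of what an LU-based solve does for a single system, the sketch below factors the matrix once and then runs two triangular solves. It is a plain-Python reference under stated assumptions: partial pivoting is assumed (the description above does not say which pivoting strategy metalcore uses), and lu_factor and lu_solve are hypothetical helper names, not part of the metalcore API.

```python
def lu_factor(A):
    """Factor P @ A = L @ U with partial pivoting.

    Returns (LU, perm): L (unit diagonal, stored below the diagonal) and U
    (on and above the diagonal) share one matrix; row k of P @ A is row
    perm[k] of A.
    """
    n = len(A)
    LU = [[float(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest remaining entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            LU[k], LU[pivot] = LU[pivot], LU[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]          # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve A @ x = b using the factors from lu_factor."""
    n = len(LU)
    # Forward substitution: L @ y = P @ b.
    y = [float(b[p]) for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    # Back substitution: U @ x = y.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x
```

The two substitution loops are the triangular-solve step on its own: forward substitution for a lower-triangular $A$ and back substitution for an upper-triangular one.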

3. Training Ops ⚡ NEW

  • RMSNorm (MetalRMSNorm): Fused RMS normalization with 2.5x speedup over PyTorch.
  • AdamW (MetalAdamW): Fused optimizer step with 2.9x speedup.
  • Activations (metal_gelu, metal_silu): Vectorized float4 GELU/SiLU with fast backward pass.
  • SDPA (metal_scaled_dot_product_attention): Flash Attention v2 with tiling and causal masking (experimental).
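
For reference, the math that the RMSNorm and AdamW ops fuse into single kernels can be written out in plain Python as below. This is a scalar sketch for checking values, not the fused implementation. The helper names rms_norm and adamw_step are hypothetical, the AdamW defaults mirror torch.optim.AdamW, and the RMSNorm eps default is an assumption.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def adamw_step(p, g, m, v, step, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a single scalar parameter.

    p: parameter, g: gradient, m/v: first/second moment estimates,
    step: 1-based step count. Returns the updated (p, m, v).
    """
    b1, b2 = betas
    # Decoupled weight decay acts on the parameter, not on the gradient.
    p = p * (1.0 - lr * weight_decay)
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g * g
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1.0 - b1 ** step)
    v_hat = v / (1.0 - b2 ** step)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

Written this way, each parameter element is read and written several times per step. A fused kernel performs the same arithmetic in one pass over memory, which is where speedups of this kind typically come from.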

4. Primitives

  • Householder Reflections: Core orthogonalization primitives (geqr2, larft, larfb).
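
The building block underneath geqr2, larft and larfb is a single Householder reflector. The sketch below constructs one in plain Python using the LAPACK-style normalization v[0] = 1. The function names are hypothetical, and the code illustrates the math rather than metalcore's kernels.

```python
import math

def householder(x):
    """Return (v, tau, beta) such that (I - tau * v v^T) x = [beta, 0, ..., 0].

    v is normalized so that v[0] == 1.
    """
    alpha = float(x[0])
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        # x is already a multiple of e1; the reflector is the identity.
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    norm = math.sqrt(alpha * alpha + sigma)
    # Choose the sign opposite to alpha to avoid cancellation in alpha - beta.
    beta = -math.copysign(norm, alpha)
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [t * scale for t in x[1:]]
    return v, tau, beta

def apply_reflector(v, tau, x):
    """Compute (I - tau * v v^T) x without forming the matrix."""
    dot = sum(a * b for a, b in zip(v, x))
    return [xi - tau * dot * vi for xi, vi in zip(x, v)]
```

An unblocked QR applies one such reflector per column. Blocked variants accumulate several reflectors into a compact representation and then apply them to the trailing matrix together.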

Installation

pip install metalcore

Usage

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # Make PD
L = metalcore.cholesky(C)

# Linear Solve (batched, supports fp16/bf16)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)  # x such that A @ x = b

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)

Performance Highlights

Operation          Speedup vs PyTorch/CPU
Cholesky Batched   10x faster
Solve Batched      5-10x faster
QR Batched         20x faster
RMSNorm            2.5x faster
AdamW              2.9x faster
SiLU/GELU          2-4x faster
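
The figures above do not come with a measurement script, so the following is only a hedged sketch of how comparable timings could be taken. GPU work on MPS is queued asynchronously, so a timer has to wait for the device before stopping the clock. The benchmark helper and its parameters are hypothetical.

```python
import statistics
import time

def benchmark(fn, sync=None, warmup=3, iters=20):
    """Return the median wall-clock seconds per call of fn().

    sync is an optional zero-argument callable that blocks until queued
    device work has finished (for MPS, torch.mps.synchronize).
    """
    if sync is None:
        sync = lambda: None
    # Warm-up calls absorb one-time costs such as kernel compilation.
    for _ in range(warmup):
        fn()
    sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for the device before reading the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Example use on an Apple Silicon machine (not executed here):
#   t_metal = benchmark(lambda: metalcore.qr(B), sync=torch.mps.synchronize)
#   t_cpu = benchmark(lambda: torch.linalg.qr(B.cpu()))
#   print(f"speedup: {t_cpu / t_metal:.1f}x")
```

Results vary with matrix size, batch size, dtype and chip generation, so treat any single speedup number as specific to the configuration that produced it.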

Requirements

  • macOS 12.0+ with Apple Silicon (M1/M2/M3/M4)
  • Python 3.9+
  • PyTorch 2.0+

Author

Kris Bailey

License

MIT
