Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.

Overview

metalcore provides a unified backend for high-performance linear algebra operations on macOS devices, bypassing generic MPS fallbacks to use optimized custom Metal kernels.

Supported Operations

1. Decompositions

  • SVD (svd): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
  • QR (qr, qr_batched): Blocked Householder reflections. About 20x faster than PyTorch/CPU on batched inputs.
  • Eigh (eigh): Symmetric eigenvalue decomposition using Jacobi rotations.
  • Cholesky (cholesky): MAGMA-style shared-memory optimization for positive-definite matrices.
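To make the "one-sided Jacobi" idea behind svd concrete, here is a minimal pure-Python sketch of the textbook Hestenes method. It is not metalcore's Metal kernel: the function name jacobi_singular_values, the sweep limit, and the tolerance are all assumptions chosen for illustration. Each rotation makes one pair of columns orthogonal; once every pair is orthogonal, the column norms are the singular values.

```python
import math

def jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a (list of rows, rows >= cols) via one-sided Jacobi.

    Illustrative reference only; name and defaults are hypothetical.
    """
    rows, cols = len(a), len(a[0])
    u = [row[:] for row in a]  # work on a copy; its columns get orthogonalized
    for _ in range(sweeps):
        rotated = False
        for p in range(cols - 1):
            for q in range(p + 1, cols):
                alpha = sum(u[i][p] * u[i][p] for i in range(rows))
                beta = sum(u[i][q] * u[i][q] for i in range(rows))
                gamma = sum(u[i][p] * u[i][q] for i in range(rows))
                # Columns p and q are already (numerically) orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(rows):
                    up, uq = u[i][p], u[i][q]
                    u[i][p] = c * up - s * uq
                    u[i][q] = s * up + c * uq
        if not rotated:
            break  # a full sweep without rotations means convergence
    norms = [math.sqrt(sum(u[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    return sorted(norms, reverse=True)

singular_values = jacobi_singular_values([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Because the rotations only touch two columns at a time, independent column pairs can be processed in parallel, which is one reason the method maps well to GPUs.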

2. Solvers

  • Linear Solve (solve): Batched linear system solver using LU factorization. Supports fp16/bf16 (auto-promoted to fp32 for stability).
  • Triangular Solve (trsm): Solve $AX=B$ where $A$ is triangular.
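The two solvers above are closely related: an LU-based solve factors the matrix and then runs two triangular solves. The following is a small pure-Python sketch of that pipeline for a single system, assuming partial pivoting; it is a reference for the math only, and the name lu_solve is hypothetical rather than part of metalcore.

```python
def lu_solve(a, b):
    """Solve a @ x = b for one right-hand side using LU with partial pivoting.

    Illustrative, unbatched reference; not metalcore's implementation.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]  # L and U share this storage
    rhs = [float(v) for v in b]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            rhs[k], rhs[pivot] = rhs[pivot], rhs[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    # Forward substitution with unit lower-triangular L: L y = P b.
    for i in range(n):
        for j in range(i):
            rhs[i] -= lu[i][j] * rhs[j]
    # Back substitution with upper-triangular U: U x = y.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            rhs[i] -= lu[i][j] * rhs[j]
        rhs[i] /= lu[i][i]
    return rhs

x = lu_solve([[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 3.0]], [7.0, 3.0, 13.0])
```

The two substitution loops at the end are exactly the triangular solves that a trsm-style routine performs, here for one right-hand-side vector instead of a matrix $B$.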

3. Training Ops ⚡ NEW

  • RMSNorm (MetalRMSNorm): Fused RMS normalization with 2.5x speedup over PyTorch.
  • AdamW (MetalAdamW): Fused optimizer step with 2.9x speedup.
  • Activations (metal_gelu, metal_silu): Vectorized float4 GELU/SiLU with fast backward pass.
  • SDPA (metal_scaled_dot_product_attention): Flash Attention v2 with tiling and causal masking (experimental).
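For readers who want the math that the fused RMSNorm and AdamW kernels compute, here is a scalar pure-Python reference. It is a sketch under stated assumptions: the function names rms_norm and adamw_step are hypothetical, the RMSNorm eps of 1e-6 is an illustrative choice, and the AdamW defaults mirror torch.optim.AdamW. Whether metalcore uses the same defaults is not stated in this README.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i for one feature vector.

    eps is an assumed value for illustration.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter; returns (p, m, v).

    t is the 1-based step count. Weight decay is decoupled from the gradient.
    """
    p = p * (1.0 - lr * weight_decay)        # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)           # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

y = rms_norm([3.0, 4.0], [1.0, 1.0])
p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
```

Written naively, each of these lines is a separate pass over memory; a fused kernel performs the whole update in a single pass, which is where speedups of this kind typically come from.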

4. Primitives

  • Householder Reflections: Core orthogonalization primitives (geqr2, larft, larfb).
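As a reference for what a single Householder reflection does, the sketch below builds one reflector using the convention of LAPACK's larfg routine (first entry of v fixed at 1) and applies it to a vector. The helper names householder and apply_reflector are hypothetical; this shows the underlying math only, not metalcore's kernels.

```python
import math

def householder(x):
    """Return (v, tau, beta) with v[0] == 1 so that (I - tau v v^T) x = beta e_1."""
    alpha = x[0]
    tail_sq = sum(xi * xi for xi in x[1:])
    if tail_sq == 0.0:
        # x is already a multiple of e_1; the identity is a valid reflector.
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    norm = math.sqrt(alpha * alpha + tail_sq)
    beta = -math.copysign(norm, alpha)  # opposite sign avoids cancellation
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [xi * scale for xi in x[1:]]
    return v, tau, beta

def apply_reflector(v, tau, x):
    """Compute (I - tau v v^T) x without forming the matrix."""
    dot = sum(vi * xi for vi, xi in zip(v, x))
    return [xi - tau * dot * vi for vi, xi in zip(v, x)]

v, tau, beta = householder([3.0, 4.0])
reflected = apply_reflector(v, tau, [3.0, 4.0])  # all weight moves to entry 0
```

An unblocked QR in the style of geqr2 applies one such reflector per column. Blocked variants accumulate several reflectors (the role of larft) and apply them together as matrix products (the role of larfb), which suits GPU hardware better.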

Installation

pip install metalcore

Usage

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # Make PD
L = metalcore.cholesky(C)

# Linear Solve (batched, supports fp16/bf16)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)  # x such that A @ x = b

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)

Performance Highlights

Operation           Speedup vs PyTorch/CPU
Cholesky Batched    10x faster
Solve Batched       5-10x faster
QR Batched          20x faster
RMSNorm             2.5x faster
AdamW               2.9x faster
SiLU/GELU           2-4x faster
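Speedups like these depend on hardware and problem size, so it is worth measuring on your own machine. The helper below is a minimal timing sketch, not the benchmark used to produce the table; the name time_op and its parameters are hypothetical. GPU work is queued asynchronously, so the optional sync callable should block until queued work has finished before the clock is read.

```python
import time

def time_op(fn, sync=None, warmup=3, iters=20):
    """Return mean seconds per call of fn.

    sync, if given, is called before each clock read so that asynchronous
    device work is included in the measurement.
    """
    for _ in range(warmup):  # warm-up runs absorb one-time setup costs
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters

# CPU-only demonstration; no synchronization is needed here.
seconds = time_op(lambda: sum(i * i for i in range(1000)))
```

On an MPS device, torch.mps.synchronize is a suitable value for sync. For example, time_op(lambda: metalcore.qr(B), sync=torch.mps.synchronize) could be compared against the same call with torch.linalg.qr.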

Requirements

  • macOS 12.0+ with Apple Silicon (M1/M2/M3/M4)
  • Python 3.9+
  • PyTorch 2.0+

Author

Kris Bailey

License

MIT
