Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.

Overview

metalcore provides a unified backend for high-performance linear algebra on Apple Silicon Macs. Instead of relying on PyTorch's generic MPS fallback paths, it dispatches these operations to custom Metal kernels.

Supported Operations

1. Decompositions

  • SVD (svd): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
  • QR (qr, qr_batched): Blocked Householder reflections. The largest gains are on batched inputs (about 20x, per the table under Performance Highlights).
  • Eigh (eigh): Symmetric eigenvalue decomposition using Jacobi rotations.
  • Cholesky (cholesky): MAGMA-style shared-memory optimization for symmetric positive-definite matrices.
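
To make the one-sided Jacobi idea behind the SVD entry concrete, here is a minimal pure-Python sketch of the textbook (Hestenes) method. It is not metalcore's Metal kernel: the name jacobi_svd and the parameters sweeps and tol are assumptions made for illustration, and it works on plain lists of floats rather than tensors.

```python
import math

def jacobi_svd(a, sweeps=30, tol=1e-12):
    """Textbook one-sided Jacobi SVD for an m x n list-of-lists with m >= n.

    Illustrative sketch only (hypothetical helper, not metalcore's API).
    Returns (U, S, V) with A ~= U @ diag(S) @ V^T; S is not sorted.
    """
    m, n = len(a), len(a[0])
    u = [[float(x) for x in row] for row in a]                      # becomes U * diag(S)
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                # Columns p and q are already orthogonal enough: skip.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same column rotation to the working matrix and to V.
                for mat in (u, v):
                    for row in mat:
                        rp, rq = row[p], row[q]
                        row[p] = c * rp - s * rq
                        row[q] = s * rp + c * rq
        if not rotated:
            break
    # Singular values are the column norms; normalise the columns to get U.
    sigma = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    for j in range(n):
        if sigma[j] > 0.0:
            for i in range(m):
                u[i][j] /= sigma[j]
    return u, sigma, v
```

Each rotation only touches two columns, which is why the method maps well to batched and "tall" inputs. Note that this sketch returns V rather than its transpose; the convention used by metalcore.svd is not stated here.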

2. Solvers

  • Linear Solve (solve): Batched linear system solver using LU factorization. Supports fp16/bf16 (auto-promoted to fp32 for stability).
  • Triangular Solve (trsm): Solve $AX=B$ where $A$ is triangular.
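
As a reference for what an LU-based solve computes, the following pure-Python sketch factors a single matrix with partial pivoting and then runs the two triangular solves (forward, then backward substitution). It is a sketch under stated assumptions, not metalcore's implementation: lu_solve is a hypothetical name, it handles one unbatched system, and it works in double precision on Python lists.

```python
def lu_solve(a, b):
    """Solve A x = b for one square system via LU with partial pivoting.

    Illustrative sketch only (hypothetical helper, not metalcore's API).
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]   # holds L (below diagonal) and U
    x = [float(val) for val in b]                 # right-hand side, permuted in step
    for k in range(n):
        # Pick the largest entry in column k as the pivot for stability.
        piv = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        if piv != k:
            lu[k], lu[piv] = lu[piv], lu[k]
            x[k], x[piv] = x[piv], x[k]
        # Eliminate below the pivot, storing the multipliers in place.
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            factor = lu[r][k]
            for c in range(k + 1, n):
                lu[r][c] -= factor * lu[k][c]
    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Backward substitution with the upper-triangular factor U.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

The two substitution loops at the end are the single-right-hand-side case of the triangular solve $AX=B$ listed above.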

3. Training Ops ⚡ NEW

  • RMSNorm (MetalRMSNorm): Fused RMS normalization with 2.5x speedup over PyTorch.
  • AdamW (MetalAdamW): Fused optimizer step with 2.9x speedup.
  • Activations (metal_gelu, metal_silu): Vectorized float4 GELU/SiLU with fast backward pass.
  • SDPA (metal_scaled_dot_product_attention): Flash Attention v2 with tiling and causal masking (experimental).
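
For readers who want the math that the fused RMSNorm and AdamW kernels are expected to reproduce, here is a scalar pure-Python sketch of the standard formulas. The function names, the default eps values, and the other hyperparameter defaults are assumptions for illustration; the defaults used by MetalRMSNorm and MetalAdamW are not documented here.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Standard RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight.

    Illustrative reference only; eps is an assumed default.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    p: parameter, g: gradient, m/v: first and second moment, t: step (from 1).
    Illustrative reference only; all defaults are assumptions.
    """
    p = p * (1.0 - lr * weight_decay)          # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

A fused kernel performs all of these steps in one pass over memory instead of launching one kernel per line, which is the usual source of speedups of this kind.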

4. Primitives

  • Householder Reflections: Core orthogonalization primitives (geqr2, larft, larfb).
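
The names geqr2, larft, and larfb follow LAPACK, where each elementary reflector has the form H = I - tau * v * v^T with v[0] = 1. The sketch below builds one such reflector in pure Python, following the real-valued LAPACK larfg convention as I understand it; householder and apply_reflector are hypothetical helper names, not part of metalcore.

```python
import math

def householder(x):
    """Return (v, tau, beta) such that (I - tau * v v^T) x = [beta, 0, ..., 0].

    Illustrative sketch of a LAPACK-style elementary reflector with v[0] == 1.
    """
    alpha = float(x[0])
    tail_norm = math.sqrt(sum(float(xi) ** 2 for xi in x[1:]))
    if tail_norm == 0.0:
        # x is already a multiple of e1: the reflector is the identity.
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    # Choose the sign of beta opposite to alpha to avoid cancellation.
    beta = -math.copysign(math.hypot(alpha, tail_norm), alpha)
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [float(xi) * scale for xi in x[1:]]
    return v, tau, beta

def apply_reflector(v, tau, x):
    """Compute (I - tau * v v^T) x without forming the matrix."""
    dot = sum(vi * xi for vi, xi in zip(v, x))
    return [xi - tau * dot * vi for vi, xi in zip(v, x)]
```

An unblocked QR applies one such reflector per column; blocked variants accumulate several reflectors so that they can be applied together as matrix products.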

Installation

pip install metalcore

Usage

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # Make PD
L = metalcore.cholesky(C)

# Linear Solve (batched, supports fp16/bf16)
A = torch.randn(100, 32, 32, device=device)
b = torch.randn(100, 32, device=device)
x = metalcore.solve(A, b)  # x[i] solves A[i] @ x[i] = b[i] for each system in the batch

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)

Performance Highlights

Operation          Speedup vs PyTorch/CPU
Cholesky Batched   10x faster
Solve Batched      5-10x faster
QR Batched         20x faster
RMSNorm            2.5x faster
AdamW              2.9x faster
SiLU/GELU          2-4x faster
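
Speedups like these depend on the chip, matrix sizes, and dtype, so it is worth measuring on your own machine. The helper below is a minimal timing sketch using only the standard library; benchmark and its parameters (sync, warmup, iters) are hypothetical names. GPU work is asynchronous, so pass a synchronisation callable such as torch.mps.synchronize as sync when timing MPS code.

```python
import statistics
import time

def benchmark(fn, sync=None, warmup=3, iters=10):
    """Return the median wall-clock time of fn() in seconds.

    Illustrative helper (not part of metalcore). `sync` is an optional
    callable that blocks until queued device work has finished.
    """
    for _ in range(warmup):          # let caches and kernel compilation settle
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()                   # include the device work in the timing
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, dividing benchmark(lambda: torch.linalg.qr(B), sync=torch.mps.synchronize) by the same measurement for metalcore.qr(B) gives a speedup figure comparable to the table, assuming both calls run on the same input.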

Requirements

  • macOS 12.0+ with Apple Silicon (M1/M2/M3/M4); note that the prebuilt 0.1.11 wheels listed under Built Distributions are tagged macOS 15.0+
  • Python 3.9+
  • PyTorch 2.0+
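
A quick way to check the platform and Python requirements before installing is sketched below, using only the standard library. check_requirements is a hypothetical helper, and it deliberately does not check the macOS or PyTorch versions; the optional arguments exist only so the function can be exercised on other machines.

```python
import platform
import sys

def check_requirements(system=None, machine=None, python_version=None):
    """Return a list of unmet requirements (an empty list means OK).

    Illustrative helper, not part of metalcore. Arguments default to the
    values reported for the running interpreter.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if python_version is None:
        python_version = sys.version_info[:2]
    problems = []
    if system != "Darwin":
        problems.append("macOS is required (found %s)" % system)
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required (found %s)" % machine)
    if tuple(python_version) < (3, 9):
        problems.append("Python 3.9+ is required")
    return problems

if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

A Python interpreter running under Rosetta reports x86_64 rather than arm64, so this check also catches that case.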

Author

Kris Bailey

License

MIT


Built Distributions

  • metalcore-0.1.11-cp314-cp314-macosx_15_0_arm64.whl (748.3 kB): CPython 3.14, macOS 15.0+, ARM64
  • metalcore-0.1.11-cp313-cp313-macosx_15_0_arm64.whl (748.6 kB): CPython 3.13, macOS 15.0+, ARM64
  • metalcore-0.1.11-cp312-cp312-macosx_15_0_arm64.whl (748.6 kB): CPython 3.12, macOS 15.0+, ARM64
  • metalcore-0.1.11-cp311-cp311-macosx_15_0_arm64.whl (748.3 kB): CPython 3.11, macOS 15.0+, ARM64
  • metalcore-0.1.11-cp310-cp310-macosx_15_0_arm64.whl (746.8 kB): CPython 3.10, macOS 15.0+, ARM64
  • metalcore-0.1.11-cp39-cp39-macosx_15_0_arm64.whl (740.3 kB): CPython 3.9, macOS 15.0+, ARM64
