Foundational Metal Linear Algebra Primitives for PyTorch

metalcore

Foundational Metal Linear Algebra Primitives for PyTorch on Apple Silicon.

Overview

metalcore provides a unified backend for high-performance linear algebra on Apple Silicon Macs. Instead of relying on generic MPS fallbacks, it runs these operations in custom, optimized Metal kernels.

Supported Operations

1. Decompositions

  • SVD (svd): One-sided Jacobi algorithm. Highly optimized for both batched small matrices and large "tall" matrices (e.g., LLM weights).
  • QR (qr, qr_batched): Blocked Householder reflections. The gains are largest for batched small matrices (20x over PyTorch on a 500x16x16 batch in the benchmarks below).
  • Eigh (eigh): Symmetric eigenvalue decomposition using Jacobi rotations.
  • Cholesky (cholesky): MAGMA-style shared-memory optimization for symmetric positive-definite matrices.
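
The one-sided (Hestenes) Jacobi idea behind `svd` can be sketched in pure Python as below. This is a textbook reference version for illustration only, not metalcore's Metal kernel; `one_sided_jacobi_svd` is a hypothetical name, and the sketch assumes a real m x n input with m >= n given as a list of rows. Each rotation orthogonalizes one pair of columns and touches only those two columns, so non-overlapping pairs are independent, which is what makes the method a good fit for parallel hardware.

```python
import math


def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Thin SVD of a real m x n matrix (m >= n) given as a list of rows.

    Returns (U, S, V) with A ~= U diag(S) V^T: U is m x n, S holds the n
    singular values in descending order, and V is n x n.
    """
    m, n = len(A), len(A[0])
    # Store matrices column-wise: U starts as the columns of A, V as identity.
    U = [[A[i][j] for i in range(m)] for j in range(n)]
    V = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in U[p])
                beta = sum(x * x for x in U[q])
                gamma = sum(x * y for x, y in zip(U[p], U[q]))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to U and V so that A V equals the working U.
                for cols in (U, V):
                    cp, cq = cols[p], cols[q]
                    cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                    cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        if not rotated:
            break
    # Column norms are the singular values; sort them in descending order.
    norms = [math.sqrt(sum(x * x for x in col)) for col in U]
    order = sorted(range(n), key=lambda j: norms[j], reverse=True)
    S = [norms[j] for j in order]
    # Normalize U's columns (a zero singular value leaves a zero column).
    U_rows = [[U[j][i] / norms[j] if norms[j] > 0 else 0.0 for j in order] for i in range(m)]
    V_rows = [[V[j][i] for j in order] for i in range(n)]
    return U_rows, S, V_rows
```

When a full sweep performs no rotation, the column norms are the singular values and the normalized columns form U; V has accumulated the same rotations, so A = U diag(S) V^T.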

2. Solvers

  • Linear Solve (solve): Batched linear system solver using QR factorization and triangular solve.
  • Triangular Solve (trsm): Solve $AX=B$ where $A$ is triangular.
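
The solver pipeline described above (QR factorization, then a triangular solve) can be illustrated for a single square system with this minimal pure-Python sketch. `householder_qr_solve` is a hypothetical helper; it uses unblocked Householder reflections rather than the blocked, batched kernels metalcore describes, and it assumes A is nonsingular.

```python
import math


def householder_qr_solve(A, b):
    """Solve A x = b for square, nonsingular A via Householder QR + back substitution."""
    n = len(A)
    R = [row[:] for row in A]  # copy of A, reduced in place to upper-triangular R
    y = b[:]                   # accumulates Q^T b
    for k in range(n):
        # Householder vector v mapping R[k:, k] onto a multiple of e_1.
        x = [R[i][k] for i in range(k, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        alpha = -math.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns of R.
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, n)) / vtv
            for i in range(k, n):
                R[i][j] -= f * v[i - k]
        # Apply the same reflection to the right-hand side.
        f = 2.0 * sum(v[i - k] * y[i] for i in range(k, n)) / vtv
        for i in range(k, n):
            y[i] -= f * v[i - k]
    # Back substitution: solve the upper-triangular system R x = Q^T b.
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (y[i] - s) / R[i][i]
    return sol
```

Rather than forming Q explicitly, the sketch applies each reflector to b as it goes, so only Q^T b is needed; the final loop is the triangular solve that `trsm` generalizes to multiple right-hand sides.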

3. Training Ops ⚡ NEW

  • RMSNorm (MetalRMSNorm): Fused RMS normalization with 2.5x speedup over PyTorch.
  • AdamW (MetalAdamW): Fused optimizer step with 2.9x speedup.
  • Activations (metal_gelu, metal_silu): Vectorized float4 GELU/SiLU with fast backward pass.
  • SDPA (metal_scaled_dot_product_attention): Flash Attention v2 with tiling and causal masking (experimental).
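
When checking fused kernels like these, it helps to have the unfused math written out. The following pure-Python reference sketch spells out RMSNorm, SiLU, GELU, and one AdamW step; it is not metalcore code. Assumptions: `eps=1e-6` for RMSNorm, the exact erf-based GELU (the listing does not say whether `metal_gelu` uses the tanh approximation), and the defaults of PyTorch's `torch.optim.AdamW` (`lr=1e-3`, `betas=(0.9, 0.999)`, `eps=1e-8`, `weight_decay=1e-2`) for the update.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i."""
    inv_rms = 1.0 / math.sqrt(sum(t * t for t in x) / len(x) + eps)
    return [t * inv_rms * w for t, w in zip(x, weight)]


def silu(x):
    """SiLU(x) = x * sigmoid(x), using sigmoid(x) = (1 + tanh(x / 2)) / 2 to avoid overflow."""
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


def gelu(x):
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def adamw_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """One AdamW update with decoupled weight decay; returns the new parameters.

    `state` (a dict) keeps the step count and both moment estimates between calls.
    """
    b1, b2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    m = state.setdefault("m", [0.0] * len(params))
    v = state.setdefault("v", [0.0] * len(params))
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        p *= 1.0 - lr * weight_decay           # decoupled weight decay
        m[i] = b1 * m[i] + (1.0 - b1) * g      # first-moment estimate
        v[i] = b2 * v[i] + (1.0 - b2) * g * g  # second-moment estimate
        m_hat = m[i] / (1.0 - b1 ** t)         # bias corrections
        v_hat = v[i] / (1.0 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params
```

A fused kernel computes all of these elementwise steps in one pass over memory instead of launching several separate operations, which is the usual source of speedups for memory-bound ops like these.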

4. Primitives

  • Householder Reflections: Core orthogonalization primitives (geqr2, larft, larfb).
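
The names come from LAPACK: `geqr2` computes an unblocked Householder QR of a panel, `larft` builds the upper-triangular factor T of the block reflector H_1 H_2 ... H_k = I - V T V^T, and `larfb` applies that block reflector to the rest of the matrix using matrix-matrix products. Assuming metalcore's primitives follow LAPACK's forward, column-wise convention, the `larft` recurrence can be sketched in pure Python as follows; `larft_forward` and the checking helpers are hypothetical names, not metalcore's API.

```python
def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def matmul(X, Y):
    return [[sum(X[r][s] * Y[s][c] for s in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def larft_forward(V, tau):
    """Upper-triangular T (k x k) with H_1 H_2 ... H_k = I - V T V^T,
    where H_j = I - tau_j v_j v_j^T and v_j is column j of V (n x k, row lists)."""
    n, k = len(V), len(tau)
    T = [[0.0] * k for _ in range(k)]
    for i in range(k):
        T[i][i] = tau[i]
        # w = V[:, :i]^T v_i, the overlap of v_i with the earlier reflectors.
        w = [sum(V[r][j] * V[r][i] for r in range(n)) for j in range(i)]
        # New column of T: T[:i, i] = -tau_i * T[:i, :i] w.
        for a in range(i):
            T[a][i] = -tau[i] * sum(T[a][b] * w[b] for b in range(i))
    return T


def block_reflector(V, T):
    """Explicit I - V T V^T (for checking only)."""
    n, k = len(V), len(T)
    VT = matmul(V, T)
    return [[(1.0 if r == c else 0.0) - sum(VT[r][a] * V[c][a] for a in range(k))
             for c in range(n)] for r in range(n)]


def reflector_product(V, tau):
    """Explicit H_1 H_2 ... H_k (for checking only)."""
    n = len(V)
    H = identity(n)
    for j, t in enumerate(tau):
        Hj = [[(1.0 if r == c else 0.0) - t * V[r][j] * V[c][j] for c in range(n)]
              for r in range(n)]
        H = matmul(H, Hj)
    return H
```

With tau_j = 2 / (v_j^T v_j), each H_j is a true reflector, and `block_reflector(V, larft_forward(V, tau))` matches `reflector_product(V, tau)`. Grouping k reflectors this way is the standard reason blocked Householder QR is fast: the trailing-matrix update becomes a few matrix products instead of k rank-1 updates.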

Installation

pip install metalcore

Usage

import torch
import metalcore

device = 'mps'

# SVD
A = torch.randn(100, 50, device=device)
U, S, V = metalcore.svd(A)

# Batched QR
B = torch.randn(100, 16, 16, device=device)
Q, R = metalcore.qr(B)

# Cholesky
C = torch.randn(10, 32, 32, device=device)
C = C @ C.mT + 1e-4 * torch.eye(32, device=device)  # C C^T + 1e-4 * I is symmetric positive-definite
L = metalcore.cholesky(C)

# Training Ops
from metalcore import MetalRMSNorm, MetalAdamW, metal_gelu

# RMSNorm (2.5x faster)
norm = MetalRMSNorm(512).to(device)
x = torch.randn(32, 128, 512, device=device)
y = norm(x)

# AdamW (2.9x faster)
model = torch.nn.Linear(512, 256).to(device)
optimizer = MetalAdamW(model.parameters(), lr=1e-3)

# GELU activation
y = metal_gelu(x)

Performance Highlights

| Operation | Speedup vs PyTorch |
|---|---|
| RMSNorm (4096x4096) | 2.5x |
| AdamW (16M params) | 2.9x |
| SiLU (256x1024) | 4x |
| QR Batched (500x16x16) | 20x |
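
Speedups like these depend on the chip, dtype, and tensor shapes, so it is worth measuring them on your own machine. The following framework-agnostic sketch times two callables and reports the ratio; `median_runtime` and `speedup` are hypothetical helpers, not part of metalcore. Because MPS work is queued asynchronously, pass a synchronizing function such as PyTorch's `torch.mps.synchronize` as `sync`; otherwise mostly launch overhead gets timed.

```python
import statistics
import time


def median_runtime(fn, *, sync=None, warmup=3, iters=20):
    """Median wall-clock seconds per call of fn() after `warmup` untimed calls.

    `sync`, if given, must block until all queued device work has finished;
    it is called once after the warm-up and after every timed call.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def speedup(baseline_fn, candidate_fn, **kwargs):
    """Baseline time / candidate time; values above 1 mean the candidate is faster."""
    return median_runtime(baseline_fn, **kwargs) / median_runtime(candidate_fn, **kwargs)
```

Pass a PyTorch reference call as `baseline_fn` and the corresponding metalcore call as `candidate_fn`; the median is less sensitive than the mean to occasional slow iterations.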

Requirements

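  • macOS 12.0+ with Apple Silicon (M1/M2/M3/M4). The prebuilt 0.1.5 wheels are tagged macOS 15.0+, so on older macOS versions pip installs from the source distribution instead.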
  • Python 3.9+
  • PyTorch 2.0+
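
A quick pre-install check of the platform requirements above could look like the following minimal standard-library sketch. `check_platform` is a hypothetical helper, not part of metalcore, and it does not check the PyTorch version; once PyTorch is installed, `torch.backends.mps.is_available()` is PyTorch's own test for whether the MPS backend can be used.

```python
import platform
import sys


def _major_minor(version):
    """'14.5.1' -> (14, 5); missing components count as 0."""
    parts = [int(p) for p in version.split(".") if p.isdigit()] + [0, 0]
    return tuple(parts[:2])


def check_platform(system=None, machine=None, mac_version=None, python_version=None):
    """Return a list of unmet requirements (empty when all are met).

    Arguments default to the running interpreter's values; passing them
    explicitly makes the check easy to test.
    """
    if system is None:
        system = platform.system()
    if machine is None:
        machine = platform.machine()
    if mac_version is None:
        mac_version = platform.mac_ver()[0]  # '' on non-macOS systems
    if python_version is None:
        python_version = sys.version_info[:2]
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    elif machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    elif _major_minor(mac_version) < (12, 0):
        problems.append("macOS 12.0+ is required")
    if tuple(python_version[:2]) < (3, 9):
        problems.append("Python 3.9+ is required")
    return problems
```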

License

MIT

Download files

Download the file for your platform.

Source Distribution

  • metalcore-0.1.5.tar.gz (1.0 MB)

Built Distributions

  • metalcore-0.1.5-cp313-cp313-macosx_15_0_arm64.whl (553.4 kB): CPython 3.13, macOS 15.0+, ARM64
  • metalcore-0.1.5-cp312-cp312-macosx_15_0_arm64.whl (553.4 kB): CPython 3.12, macOS 15.0+, ARM64
  • metalcore-0.1.5-cp311-cp311-macosx_15_0_arm64.whl (552.4 kB): CPython 3.11, macOS 15.0+, ARM64
  • metalcore-0.1.5-cp310-cp310-macosx_15_0_arm64.whl (551.6 kB): CPython 3.10, macOS 15.0+, ARM64
  • metalcore-0.1.5-cp39-cp39-macosx_15_0_arm64.whl (544.8 kB): CPython 3.9, macOS 15.0+, ARM64

File details

Details for the file metalcore-0.1.5.tar.gz.

File metadata

  • Download URL: metalcore-0.1.5.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for metalcore-0.1.5.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | ff98ab19ad1a2593ce3e1d5ea64d902a57af3a3c1050503bdd3931203875aec8 |
| MD5 | 4b3294d9faf01347927609e924948ed0 |
| BLAKE2b-256 | c1d959fd5573972de5889974099a36d04bf9394de46e7b551ba2f76fedf668a7 |

File details

Details for the file metalcore-0.1.5-cp313-cp313-macosx_15_0_arm64.whl.

File hashes

Hashes for metalcore-0.1.5-cp313-cp313-macosx_15_0_arm64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | a42beb27e481dfde8605dacaddc1b5771033503c24704ca02f182930708c5e4c |
| MD5 | 06222fdf5b4991a0cb9ea6b094a36215 |
| BLAKE2b-256 | 0c0d3c8622637394ca0d7a6fb1b20593003a7e15063785e7e0b2c23234a2e35c |

File details

Details for the file metalcore-0.1.5-cp312-cp312-macosx_15_0_arm64.whl.

File hashes

Hashes for metalcore-0.1.5-cp312-cp312-macosx_15_0_arm64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | ebf393edf33e03a1764174bd2f1acaf1d554f7eb92df3c15138ae232c0be1e6e |
| MD5 | 0f18dc4b47bfbe695ab240ab81f32f8a |
| BLAKE2b-256 | 0b8dd71f3bb7df30f06cbae7820d819b0efa9646ac5d8f06787de2f157921e1e |

File details

Details for the file metalcore-0.1.5-cp311-cp311-macosx_15_0_arm64.whl.

File hashes

Hashes for metalcore-0.1.5-cp311-cp311-macosx_15_0_arm64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d0fa4ff376096e69257b0ea1c16bc08dbe986648d33113928fce59fe9452a6b |
| MD5 | f4701de1b270b6a47d0ded565fcfb1e6 |
| BLAKE2b-256 | e526f87c08ff9440759a42a34edd95c4082a4ca5bf0b5814b22d1763fda8178c |

File details

Details for the file metalcore-0.1.5-cp310-cp310-macosx_15_0_arm64.whl.

File hashes

Hashes for metalcore-0.1.5-cp310-cp310-macosx_15_0_arm64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | be1a5ab8f2fae2c0618e020bd4b5dd2b47857c161d25e72e881be6ec86647039 |
| MD5 | 715984d69be61b5c4c0f88f7697ff5a2 |
| BLAKE2b-256 | c93eaba105e8f71a3ad592c5fcf6f203a68522ce649b48001c97f5aa209a12b2 |

File details

Details for the file metalcore-0.1.5-cp39-cp39-macosx_15_0_arm64.whl.

File hashes

Hashes for metalcore-0.1.5-cp39-cp39-macosx_15_0_arm64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0585b71160c70746fa63e50220dd17ffd280f00eab72ac60eadb2285acbc870f |
| MD5 | 355e295f0b0ddc11b95abaf75d556b20 |
| BLAKE2b-256 | dd13e313be12646818ac73e0711555871dabe17477804b64c330553725907c48 |
