mhc-mlx
High-performance MLX implementation of Manifold-Constrained Hyper-Connections (mHC) for Apple Silicon.
This library provides a drop-in MHCLayer that fuses multiple operations into optimized Metal kernels. In the benchmarks below, the full layer runs roughly 1.4x to 17x faster than a compiled reference layer, depending on workload size.
Original Paper: mHC: Manifold-Constrained Hyper-Connections (DeepSeek-AI)
Installation
Install from PyPI:
```bash
pip install mhc-mlx
```
Quick Start
```python
import mlx.core as mx
from mhc_mlx import MHCLayer

# Create some dummy data
B, n, C = 1, 32, 2048
x = mx.random.normal((B, n, C)).astype(mx.bfloat16)

# Initialize layer (uses Metal kernels by default)
layer = MHCLayer(n=n, C=C)

# Forward pass
y = layer(x)
mx.eval(y)
print(y.shape)  # (1, 32, 2048)
```
Note: You can also import as mlx_mhc if you prefer the style of other community packages:
```python
from mlx_mhc import MHCLayer
```
Performance
We benchmarked on an Apple M4 Pro (macOS 15.6). mhc-mlx automatically selects the best kernel strategy based on workload size.
Head-to-Head: mhc-mlx vs mlx-mhc
| Scenario | mhc-mlx (ours) | mlx-mhc (other) | Speedup |
|---|---|---|---|
| Latency ($B=1, C=512$) | 456.67 us | 966.17 us | 2.12x |
| Throughput ($B=1, C=512$) | 85.56 us | 804.49 us | 9.40x |
| Latency ($B=32, C=2048$) | 575.46 us | 1278.92 us | 2.22x |
| Throughput ($B=32, C=2048$) | 249.43 us | 1104.45 us | 4.43x |
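The Speedup column appears to be the mlx-mhc time divided by the mhc-mlx time. The short check below recomputes it from the timings in the table; the helper name `speedup` is only illustrative and is not part of the library.

```python
# Timings in microseconds, copied from the head-to-head table above.
# Each entry is (mhc-mlx time, mlx-mhc time).
timings_us = {
    "Latency (B=1, C=512)": (456.67, 966.17),
    "Throughput (B=1, C=512)": (85.56, 804.49),
    "Latency (B=32, C=2048)": (575.46, 1278.92),
    "Throughput (B=32, C=2048)": (249.43, 1104.45),
}


def speedup(ours_us: float, other_us: float) -> float:
    """Return how many times faster `ours` is than `other`."""
    return other_us / ours_us


for scenario, (ours, other) in timings_us.items():
    print(f"{scenario}: {speedup(ours, other):.2f}x")
```

Rounded to two decimals, the ratios agree with the values listed in the table.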
Why We're Faster
| Implementation | Characteristics | Performance Impact |
|---|---|---|
| Python / JIT | Many small kernel launches | Higher overhead, low occupancy |
| Fused Metal | 1-3 highly optimized kernels | Minimal overhead, maximum bandwidth |
Latency Floor ($B=1$, Sequence Length=32)
Optimized for ultra-low latency response times.
| Channels (C) | Kernel Strategy | Layer Speedup |
|---|---|---|
| 256 | Fully Fused | 2.27x |
| 1024 | Fully Fused | 1.57x |
| 2048 | Fully Fused | 1.58x |
| 4096 | Column Parallel | 1.41x |
| 8192 | Column Parallel | 2.18x |
High Throughput ($B=32$, Sequence Length=32)
Maximum speedups for heavy data processing.
| Operation | Scale (n, C) | Peak Speedup |
|---|---|---|
| Sinkhorn-Knopp | n=4 | 26.99x |
| Mix + Add (Fused) | n=32, C=2048 | 14.92x |
| Full MHCLayer | n=4, C=4096 | 17.33x |
(Benchmarks were run in bfloat16. To reproduce: `PYTHONPATH=. python compare_mhc.py`)
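For readers unfamiliar with the Sinkhorn-Knopp entry in the table above, here is a minimal pure-Python sketch of the textbook algorithm: alternately normalize the rows and columns of a positive square matrix until it is approximately doubly stochastic. It assumes an n=4 matrix to match the benchmark row. The iteration count, the `eps` guard, and the function name are illustrative choices, not the library's Metal kernel or its API.

```python
import math


def sinkhorn_knopp(matrix, num_iters=50, eps=1e-9):
    """Alternately normalize rows and columns of a positive square matrix."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(num_iters):
        # Row step: scale each row so it sums to (approximately) 1.
        for i in range(n):
            row_sum = sum(m[i]) + eps
            m[i] = [v / row_sum for v in m[i]]
        # Column step: scale each column so it sums to (approximately) 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n)) + eps
            for i in range(n):
                m[i][j] /= col_sum
    return m


# A small strictly positive 4x4 matrix (exponentiated illustrative logits).
n = 4
weights = [[math.exp(0.3 * i - 0.2 * j + 0.1 * i * j) for j in range(n)]
           for i in range(n)]
ds = sinkhorn_knopp(weights)

row_sums = [sum(row) for row in ds]
col_sums = [sum(ds[i][j] for i in range(n)) for j in range(n)]
print(row_sums, col_sums)
```

After enough iterations both the row sums and the column sums are close to 1, which is the doubly stochastic property the normalization targets.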
Key Optimizations
- Fully Fused Kernel: Single kernel for Aggregate + RMS + Mix + Add. Ideal for $B \times C \le 2048$.
- Column-Parallel Mixing: Vectorized kernel maximizing throughput for larger workloads.
- Adaptive Dispatch: Runtime heuristic selects the fastest kernel.
- Super-Fused Backward: Fused gradients for maximum training efficiency.
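The dispatch rule described above can be pictured with a small sketch. It assumes the only criterion is the $B \times C \le 2048$ threshold stated for the fully fused kernel. The function name, the string labels, and the `fused_limit` parameter are hypothetical, and the library's actual heuristic may consider other factors.

```python
def choose_kernel(B: int, C: int, fused_limit: int = 2048) -> str:
    """Hypothetical dispatch rule: pick a strategy from the workload size.

    Assumes the fully fused kernel is preferred whenever B * C fits under
    the threshold quoted in this README; this is not the library's code.
    """
    if B * C <= fused_limit:
        return "fully_fused"
    return "column_parallel"


# With B=1 this matches the strategies listed in the latency table above.
for C in (256, 1024, 2048, 4096, 8192):
    print(C, choose_kernel(B=1, C=C))
```

Keeping the threshold as a parameter makes it easy to experiment with other cutoffs when profiling on different hardware.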
Troubleshooting
Kernel Compilation Errors: If you see Metal build errors, ensure you are on macOS with Apple Silicon. Run diagnostics to check your environment:
```bash
mhc-mlx-info
```
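As a rough pre-check before running the diagnostics, the standard library can tell you whether the current Python interpreter is running natively on an Apple Silicon Mac. This is an independent sketch and makes no claim about what `mhc-mlx-info` reports; the function name is illustrative.

```python
import platform


def is_apple_silicon_mac() -> bool:
    """Return True if this interpreter runs natively on macOS/arm64.

    Note: an x86_64 Python running under Rosetta reports "x86_64" here,
    even on Apple Silicon hardware.
    """
    return platform.system() == "Darwin" and platform.machine() == "arm64"


if is_apple_silicon_mac():
    print("macOS on Apple Silicon detected.")
else:
    print(f"Detected {platform.system()} / {platform.machine()}; "
          "Metal kernels are unlikely to build here.")
```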
Development & Publishing
Workflow Name: For PyPI Trusted Publishing, the workflow filename is publish.yml.
License
MIT