High-performance MLX implementation of Manifold-Constrained Hyper-Connections (mHC)
mhc-mlx
High-performance MLX implementation of Manifold-Constrained Hyper-Connections (mHC) for Apple Silicon.
This library provides optimized Metal kernels for mHC. In the benchmarks below, they run between about 1.4x and 5.4x faster than compiled reference layers and standard Python-based implementations.
Original Paper: mHC: Manifold-Constrained Hyper-Connections (DeepSeek-AI)
Installation
Install from PyPI:
pip install mhc-mlx
Quick Start
Option 1: Drop-in Layer
Use MHCLayer for maximum performance.
import mlx.core as mx
from mhc_mlx import MHCLayer
layer = MHCLayer(n=32, C=64) # 32 streams, 64 channels each
x = mx.random.normal((1, 32, 64))
y = layer(x)
Option 2: Universal Wrapper (MHCRewire)
Enhance any existing MLX module with manifold-constrained stability.
import mlx.core as mx
import mlx.nn as nn
from mhc_mlx import MHCRewire
# Zero-Cost Folding: automatically optimizes Linear weights
layer = MHCRewire(nn.Linear(512, 512), dims=512, n=16)
x = mx.random.normal((1, 512))
y = layer(x)
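The folding idea can be illustrated without MLX. The sketch below is not the library's code; it assumes the pre-scaling is a per-input-feature multiply (the scale vector `s` and the helper names `linear` and `fold_scale` are hypothetical) and shows that scaling the input and then projecting gives the same result as projecting with weights whose columns were scaled once in advance.

```python
import random

def linear(x, W):
    """y[j] = sum_i W[j][i] * x[i], using an (out, in) weight layout and no bias."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in W]

def fold_scale(W, s):
    """Scale column i of W by s[i] so no separate pre-scaling pass is needed."""
    return [[w_ji * s_i for w_ji, s_i in zip(row, s)] for row in W]

random.seed(0)
d_in, d_out = 4, 3
x = [random.gauss(0, 1) for _ in range(d_in)]
s = [random.uniform(0.5, 1.5) for _ in range(d_in)]  # hypothetical per-feature scale
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

unfolded = linear([x_i * s_i for x_i, s_i in zip(x, s)], W)  # scale, then project
folded = linear(x, fold_scale(W, s))                         # project with folded weights

# The two paths agree up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(unfolded, folded))
print(f"max abs difference: {max_err:.2e}")
```

Because the fold happens once on the weights, the per-call cost of the scaling step drops out, which is presumably what "zero-cost" refers to.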
Performance
We benchmarked on an Apple M4 Pro (macOS 15.6). mhc-mlx was faster than the baselines in every configuration listed below.
Head-to-Head: mhc-mlx vs mlx-mhc (Competitor)
| Scenario | mhc-mlx (ours) | mlx-mhc (them) | Speedup |
|---|---|---|---|
| Latency ($B=1, C=2048$) | 552.67 us | 975.08 us | 1.76x |
| Throughput ($B=1, C=2048$) | 148.05 us | 802.47 us | 5.42x |
| Latency ($B=32, C=2048$) | 581.67 us | 1310.63 us | 2.25x |
| Throughput ($B=32, C=2048$) | 243.65 us | 1122.41 us | 4.61x |
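The Speedup column is the ratio of the mlx-mhc time to the mhc-mlx time. A short check, using only the timings from the table above (the `speedup` helper is just for illustration):

```python
# Timings in microseconds, copied from the table above: (scenario, ours, theirs).
rows = [
    ("Latency (B=1, C=2048)", 552.67, 975.08),
    ("Throughput (B=1, C=2048)", 148.05, 802.47),
    ("Latency (B=32, C=2048)", 581.67, 1310.63),
    ("Throughput (B=32, C=2048)", 243.65, 1122.41),
]

def speedup(ours_us: float, theirs_us: float) -> float:
    """Baseline time divided by our time; values above 1.0 mean we are faster."""
    return theirs_us / ours_us

speedups = [round(speedup(ours, theirs), 2) for _, ours, theirs in rows]

for (name, _, _), s in zip(rows, speedups):
    print(f"{name}: {s:.2f}x")
```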
Why We're Faster
| Implementation | Characteristics | Performance Impact |
|---|---|---|
| Python / JIT | Many small kernel launches | Higher overhead, low occupancy |
| Fused Metal | 1-3 highly optimized kernels | Minimal overhead, maximum bandwidth |
Latency Floor ($B=1$, Sequence Length=32)
| Channels (C) | Kernel Strategy | Layer Speedup (vs Compiled MLX) |
|---|---|---|
| 256 | Fully Fused | 2.27x |
| 1024 | Fully Fused | 1.57x |
| 2048 | Fully Fused | 1.58x |
| 4096 | Column Parallel | 1.41x |
| 8192 | Column Parallel | 2.18x |
Key Optimizations
- "Zero-Cost" Weight Folding: MHCRewire folds scaling directly into nn.Linear weights, eliminating pre-scaling overhead.
- Fully Fused Kernel: Single-pass kernel for Aggregate + RMS + Mix + Add.
- Column-Parallel Mixing: Vectorized kernel maximizing throughput for larger workloads.
- Adaptive Dispatch: Runtime heuristic selects the fastest kernel strategy.
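The actual dispatch heuristic is not documented here. As a rough illustration only, the sketch below picks a strategy from the channel count, with a cutoff of 2048 chosen to match the Latency Floor table (Fully Fused up to C=2048, Column Parallel from C=4096). The function name, the cutoff, and the single-input rule are all assumptions; the real heuristic may also weigh batch size or other factors.

```python
FULLY_FUSED = "fully_fused"
COLUMN_PARALLEL = "column_parallel"

def choose_strategy(channels: int, fused_max_channels: int = 2048) -> str:
    """Pick a kernel strategy from the channel count alone (hypothetical rule)."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    # Small widths go to the single-pass kernel; larger ones to the column-parallel path.
    return FULLY_FUSED if channels <= fused_max_channels else COLUMN_PARALLEL

choices = {c: choose_strategy(c) for c in (256, 1024, 2048, 4096, 8192)}
for c, strategy in choices.items():
    print(c, strategy)
```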
Advanced Usage
Custom Blocks: Fused Residual Add + Aggregate
If you are building custom Transformer blocks, you can use residual_add_agg to fuse the residual connection with the mHC aggregation step. This saves a full memory read/write round-trip (~1.4x speedup).
from mhc_mlx import residual_add_agg
# Standard: x = x + res; y_agg = aggregate(x)
# Fused:
x, y_agg = residual_add_agg(x, res, H_pre)
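To make the fused semantics concrete, here is a pure-Python reference sketch. It is not the library's kernel. It assumes `x` and `res` are both shaped (streams, channels) with no batch dimension, and that aggregation is a weighted sum over streams using `h_pre`; both function names are hypothetical.

```python
def residual_add_then_aggregate(x, res, h_pre):
    """Unfused reference: one pass builds x + res, a second pass re-reads it."""
    summed = [[a + b for a, b in zip(xr, rr)] for xr, rr in zip(x, res)]
    channels = len(summed[0])
    y_agg = [sum(h * row[c] for h, row in zip(h_pre, summed)) for c in range(channels)]
    return summed, y_agg

def residual_add_agg_reference(x, res, h_pre):
    """Single pass: each summed element feeds the aggregate as soon as it is computed."""
    channels = len(x[0])
    summed = []
    y_agg = [0.0] * channels
    for h, xr, rr in zip(h_pre, x, res):
        row = []
        for c in range(channels):
            v = xr[c] + rr[c]
            row.append(v)
            y_agg[c] += h * v
        summed.append(row)
    return summed, y_agg

x = [[1.0, 2.0], [3.0, 4.0]]      # 2 streams, 2 channels
res = [[0.5, 0.5], [1.0, 1.0]]
h_pre = [0.25, 0.75]              # assumed per-stream aggregation weights

out_unfused = residual_add_then_aggregate(x, res, h_pre)
out_fused = residual_add_agg_reference(x, res, h_pre)
print(out_fused)
```

In the single-pass version the summed tensor is never re-read from memory, which is the round-trip the fused kernel is described as saving.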
Troubleshooting
Run diagnostics to check your environment:
mhc-mlx-info
Development & Publishing
Workflow name: releases go through PyPI Trusted Publishing, which expects the workflow file to be named publish.yml.
License
MIT