High-performance MLX implementation of Manifold-Constrained Hyper-Connections (mHC)

mhc-mlx

High-performance MLX implementation of Manifold-Constrained Hyper-Connections (mHC) for Apple Silicon.

This library provides a drop-in MHCLayer that fuses multiple operations into optimized Metal kernels. In the benchmarks reported under Performance, it runs 1.41x to 2.27x faster than a compiled MLX reference layer and 2.86x to 8.25x faster than mlx-mhc.

Original Paper: mHC: Manifold-Constrained Hyper-Connections (DeepSeek-AI)
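For orientation, the "manifold" in the paper is, as far as we understand it, the set of doubly stochastic matrices (non-negative, every row and column summing to 1), reached by Sinkhorn-Knopp normalization of the stream-mixing matrix. The following is a minimal pure-Python sketch of that normalization. The function name `sinkhorn_knopp` and the iteration count are illustrative assumptions and are not part of the mhc-mlx API.

```python
def sinkhorn_knopp(matrix, iters=50):
    """Alternately normalize rows and columns of a positive square matrix.

    Illustrative helper (not part of mhc-mlx): returns a matrix that is
    approximately doubly stochastic.
    """
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(iters):
        # Row step: make every row sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Column step: make every column sum to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
ds = sinkhorn_knopp(raw)
row_sums = [sum(r) for r in ds]
col_sums = [sum(ds[i][j] for i in range(3)) for j in range(3)]
```

Mixing streams with such a matrix takes a convex combination of them, which is the property the paper relies on for stable signal propagation.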

Installation

Install from PyPI:

pip install mhc-mlx

Quick Start

Option 1: Drop-in Layer (Recommended)

Use MHCLayer for maximum performance.

import mlx.core as mx
from mhc_mlx import MHCLayer

# n = number of streams, C = channels per stream
layer = MHCLayer(n=32, C=64)
# x has shape (1, n, C)
x = mx.random.normal((1, 32, 64))
y = layer(x)

Option 2: Universal Wrapper (MHCRewire)

Wrap any existing MLX module (Linear, Conv2d, Transformer blocks) to add manifold-constrained hyper-connections around it. Note: wrapping an arbitrary module incurs some overhead compared to the fused MHCLayer.

import mlx.nn as nn
from mhc_mlx import MHCRewire

# Wrap a standard Linear layer
layer = MHCRewire(nn.Linear(512, 512), dims=512, n=16)

Performance

We benchmarked on an Apple M4 Pro (macOS 15.6). mhc-mlx was faster than the baseline in every configuration reported here.
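The measurement procedure is not spelled out here. As a rough sketch, a latency figure of this kind could be collected with a harness like the one below, which takes the median of repeated wall-clock timings after a warm-up. The helper name `bench` and its defaults are assumptions for illustration, not the project's benchmark script.

```python
import statistics
import time


def bench(fn, warmup=3, iters=20):
    """Return the median wall-clock time of fn() in microseconds.

    Hypothetical helper: warm-up runs are discarded so one-time costs
    (compilation, caches) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


# Stand-in workload; replace with the layer call you want to time.
median_us = bench(lambda: sum(range(1000)))
```

Because MLX evaluates lazily, the timed callable should force evaluation, for example `lambda: mx.eval(layer(x))`. Otherwise the timing only covers graph construction.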

Head-to-Head: mhc-mlx vs mlx-mhc (Competitor)

| Scenario | mhc-mlx (ours) | mlx-mhc (them) | Speedup |
|---|---|---|---|
| Latency ($B=1, C=512$) | 392 µs | 1120 µs | 2.86x |
| Throughput ($B=32, C=512$) | 105 µs | 866 µs | 8.25x |
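The speedup column is the baseline time divided by the mhc-mlx time. A short check of that arithmetic, using only the figures reported above (the function name `speedup` is illustrative):

```python
def speedup(baseline_us, ours_us):
    """Ratio of baseline latency to our latency; values above 1 mean faster."""
    return baseline_us / ours_us


latency = speedup(1120, 392)     # B=1, C=512
throughput = speedup(866, 105)   # B=32, C=512
print(f"{latency:.2f}x, {throughput:.2f}x")
```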

Why We're Faster

| Implementation | Characteristics | Performance Impact |
|---|---|---|
| Python / JIT | Many small kernel launches | Higher overhead, low occupancy |
| Fused Metal | 1-3 highly optimized kernels | Minimal overhead, maximum bandwidth |
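To make the contrast concrete, here is an unfused reference of the kind of computation that gets fused, written as four separate passes in plain Python. The exact math is an assumption made for illustration (a weighted aggregate over streams, RMS normalization, a stream-mixing matrix, and a broadcast add). The names `h_pre`, `M`, and `h_post` are hypothetical and are not taken from the mhc-mlx API.

```python
import math


def aggregate(x, h_pre):
    """Pass 1: weighted sum over streams, (n, C) -> (C,)."""
    n, C = len(x), len(x[0])
    return [sum(h_pre[i] * x[i][c] for i in range(n)) for c in range(C)]


def rms_normalize(u, eps=1e-6):
    """Pass 2: divide by the root-mean-square of the vector."""
    rms = math.sqrt(sum(v * v for v in u) / len(u) + eps)
    return [v / rms for v in u]


def mix(x, M):
    """Pass 3: mix streams, out[i] = sum_j M[i][j] * x[j]."""
    n, C = len(x), len(x[0])
    return [[sum(M[i][j] * x[j][c] for j in range(n)) for c in range(C)]
            for i in range(n)]


def add(mixed, u_norm, h_post):
    """Pass 4: add the normalized aggregate back onto every stream."""
    return [[mixed[i][c] + h_post[i] * u_norm[c] for c in range(len(u_norm))]
            for i in range(len(mixed))]


def mhc_reference(x, h_pre, M, h_post):
    """Assumed composition of the four passes (illustrative only)."""
    u_norm = rms_normalize(aggregate(x, h_pre))
    return add(mix(x, M), u_norm, h_post)


x = [[1.0, 2.0], [3.0, 4.0]]          # n=2 streams, C=2 channels
identity = [[1.0, 0.0], [0.0, 1.0]]
y_passthrough = mhc_reference(x, [0.5, 0.5], identity, [0.0, 0.0])
y_with_add = mhc_reference(x, [0.5, 0.5], identity, [1.0, 1.0])
```

In an unfused implementation each pass reads and writes its full intermediate result, and on a GPU each one is typically a separate kernel launch. A fused kernel keeps the intermediates on-chip and launches once.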

Latency Floor ($B=1$, Sequence Length=32)

| Channels (C) | Kernel Strategy | Layer Speedup (vs Compiled MLX) |
|---|---|---|
| 256 | Fully Fused | 2.27x |
| 1024 | Fully Fused | 1.57x |
| 2048 | Fully Fused | 1.58x |
| 4096 | Column Parallel | 1.41x |
| 8192 | Column Parallel | 2.18x |
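The table suggests that the kernel strategy switches somewhere between 2048 and 4096 channels. A minimal sketch of a size-based dispatch consistent with those rows follows. The function `pick_strategy` and the 2048 cutoff are hypothetical, and the library's actual heuristic may use other inputs or thresholds.

```python
FULLY_FUSED = "Fully Fused"
COLUMN_PARALLEL = "Column Parallel"


def pick_strategy(channels, fused_max_channels=2048):
    """Choose a kernel strategy from the channel count.

    The cutoff is an assumption read off the table above, not a value
    taken from the mhc-mlx source.
    """
    if channels <= fused_max_channels:
        return FULLY_FUSED
    return COLUMN_PARALLEL


for c in (256, 1024, 2048, 4096, 8192):
    print(c, pick_strategy(c))
```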

Key Optimizations

  • "Zero-Cost" Weight Folding: MHCRewire folds scaling directly into nn.Linear weights, eliminating pre-scaling overhead.
  • Quantized Layer Support: Seamlessly wraps nn.QuantizedLinear (4-bit/8-bit) for efficient local LLM inference.
  • Fully Fused Kernel: Single kernel for Aggregate + RMS + Mix + Add.
  • Column-Parallel Mixing: Vectorized kernel maximizing throughput for larger workloads.
  • Adaptive Dispatch: Runtime heuristic selects the fastest kernel strategy.
  • Super-Fused Backward: Fused gradients for maximum training efficiency.
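Weight folding rests on a simple identity: scaling the input features of a linear layer is equivalent to scaling the matching columns of its weight matrix, so the scale can be applied once to the weights instead of on every forward pass. The sketch below checks this in plain Python. It assumes a per-input-feature scale `s`, and the helper names are illustrative rather than MHCRewire internals.

```python
def linear(W, x):
    """y = W x for a weight matrix W (out x in) and an input vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def fold_input_scale(W, s):
    """Return W with column j multiplied by s[j] (done once, ahead of time)."""
    return [[w * sj for w, sj in zip(row, s)] for row in W]


W = [[1.0, 2.0], [3.0, 4.0]]
s = [0.5, 2.0]   # hypothetical per-feature scale
x = [4.0, 8.0]

# Scale the input on every call, then apply the layer.
prescaled = linear(W, [sj * v for sj, v in zip(s, x)])
# Fold the scale into the weights once; the forward pass needs no extra op.
folded = linear(fold_input_scale(W, s), x)
```

Both paths produce the same output, so the folded layer costs the same as a plain linear layer at inference time.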

Troubleshooting

Run diagnostics to check your environment:

mhc-mlx-info
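If the command is unavailable, a few basics can be checked by hand with the standard library. This sketch only tests for an Apple Silicon interpreter and an importable `mlx` package. It is not a substitute for mhc-mlx-info, whose output is not described here, and the function names are illustrative.

```python
import importlib.util
import platform


def is_apple_silicon():
    """True when Python runs natively on an arm64 Mac.

    Under Rosetta the interpreter reports x86_64, so this returns False.
    """
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def environment_report():
    """Collect a few environment facts relevant to running MLX."""
    return {
        "apple_silicon": is_apple_silicon(),
        "python": platform.python_version(),
        "mlx_installed": importlib.util.find_spec("mlx") is not None,
    }


report = environment_report()
print(report)
```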

License

MIT
