
CUDA implementation of Manifold-Constrained Hyper-Connections


mHC.cu

An unofficial CUDA implementation of mHC: Manifold-Constrained Hyper-Connections by DeepSeek-AI.

Running on Modal

The image is built the first time you run a command; after that it is cached and reused without a rebuild.

Supported GPUs

--gpu h100 # H100 80GB HBM3 SXM5 model
--gpu b200 # B200

Benchmark

Run benchmark suite

# python bench
modal run runmodal.py --gpu h100 --mode bench --scope python

# c++ / cuda bench
modal run runmodal.py --gpu h100 --mode bench --scope native

# run all benches
modal run runmodal.py --gpu h100 --mode bench --scope all

Generate benchmark files (run automatically by the commands above)

# generate the benchmark files
make benchgen

# check status of benchmark files
make benchgen-check

Test

# python tests
modal run runmodal.py --gpu h100 --mode test --scope python

# c++ / cuda tests
modal run runmodal.py --gpu h100 --mode test --scope native

# run all tests
modal run runmodal.py --gpu h100 --mode test --scope all

Training

This trainer approximates the paper’s small-model scaling with a dense Transformer and mHC residual mixing (no MoE/MLA). It uses the fused CUDA path for mHC dynamic H computation when available and streams loss to the Modal logs.

modal run runmodal.py --gpu b200 --mode train --train-args "\
  --preset 3b \
  --scale 0.25 \
  --seq-len 1024 \
  --batch-size 2 \
  --grad-clip 1.0 \
  --grad-accum 4 \
  --max-steps 10 \
  --sdp-kernel flash \
  --logits-chunk-size 512 \
  --recompute-ratio 0.9 \
  --run-name train-3b \
  --log-memory" \
  --download
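As a sanity check on the flags above, the effective tokens per optimizer step and an approximate parameter count can be worked out by hand. Note the linear interpretation of `--scale` below is an assumption for illustration, not something the trainer documents:

```python
# Back-of-envelope arithmetic for the training command above.
batch_size = 2   # --batch-size
grad_accum = 4   # --grad-accum
seq_len = 1024   # --seq-len

# Tokens consumed per optimizer step: micro-batch * accumulation steps * sequence length.
tokens_per_step = batch_size * grad_accum * seq_len
print(tokens_per_step)  # 8192

# ASSUMPTION: --scale shrinks the 3b preset's parameter count linearly.
preset_params = 3_000_000_000
scale = 0.25  # --scale
approx_params = int(preset_params * scale)
print(approx_params)  # 750000000
```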

Download checkpoints and metrics after the run:

modal volume get mhc-runs /train-3b ./runs

Local

Installation

make install      # install PyTorch extension
make install-dev  # install with dev dependencies

Build

make              # build C++ / CUDA source for all architectures
make CUDA_ARCH=90 # build for specific arch (H100)
make clean        # clean build
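The `CUDA_ARCH` values map to GPU compute capabilities. Only the H100 → 90 mapping is confirmed by the command above; the B200 entry below assumes Blackwell's compute capability 10.0 (sm_100), so treat it as a sketch:

```python
# Map the GPU names used in this README to CUDA_ARCH values for `make`.
CUDA_ARCH = {
    "h100": 90,   # Hopper, sm_90 (from `make CUDA_ARCH=90` above)
    "b200": 100,  # Blackwell, sm_100 (assumption, not stated in this README)
}

def make_command(gpu: str) -> str:
    """Return the arch-specific build invocation for a supported GPU."""
    return f"make CUDA_ARCH={CUDA_ARCH[gpu]}"

print(make_command("h100"))  # make CUDA_ARCH=90
```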

Test

make test         # C++ / CUDA tests
make test-python  # Python tests

Benchmark

make bench        # run all C++ / CUDA benchmarks
make bench-python # run all Python benchmarks

PyTorch Benchmark Results (benchmarked on H100 SXM5)

Fused mHC vs. the naive PyTorch mHC implementation (configs from Appendix A.1 of the paper):

Static H Path (shared H across batch):

| Batch | Hidden | n | Forward | Backward |
|-------|--------|---|---------|----------|
| 320   | 1280   | 4 | 15.20x  | 10.07x   |
| 512   | 1920   | 4 | 10.52x  | 9.20x    |
| 1280  | 2560   | 4 | 5.66x   | 4.34x    |
| 2560  | 1280   | 4 | 5.66x   | 4.21x    |

Dynamic H Path (per-batch H values computed via Equations 7-9 of the paper):

| Batch | Hidden | n | Forward | Backward |
|-------|--------|---|---------|----------|
| 320   | 1280   | 4 | 7.39x   | 3.35x    |
| 512   | 1920   | 4 | 7.38x   | 3.47x    |
| 1280  | 2560   | 4 | 5.33x   | 3.07x    |
| 2560  | 1280   | 4 | 5.21x   | 3.02x    |
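To summarize each table in a single number, the geometric mean (the usual way to average speedup ratios) can be computed from the forward-pass columns; the figures below are derived purely from the tables above:

```python
import math

def geomean(xs):
    """Geometric mean: exp of the mean of logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Forward-pass speedups copied from the two tables above.
static_fwd = [15.20, 10.52, 5.66, 5.66]
dynamic_fwd = [7.39, 7.38, 5.33, 5.21]

print(f"static H forward:  {geomean(static_fwd):.2f}x")   # ~8.46x
print(f"dynamic H forward: {geomean(dynamic_fwd):.2f}x")  # ~6.24x
```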

Format

make format       # clang-format + python black formatting

Usage

import torch
from mhc import MHCLayer

# Dynamic H path (default, matches paper architecture)
# H values are computed from x via learned projections
layer = MHCLayer(hidden_dim=4096, expansion_rate=4).cuda()
x = torch.randn(8, 4, 4096, device="cuda")  # [B, n, C]
y = layer(x)  # [B, n, C]

# Static H path (shared H across batch, faster for inference)
layer_static = MHCLayer(hidden_dim=4096, expansion_rate=4, use_dynamic_h=False).cuda()
y = layer_static(x)

Contributing

See CONTRIBUTING.md for directions on how to contribute, including testing, formatting, and code style requirements.

Paper

mHC: Manifold-Constrained Hyper-Connections
https://arxiv.org/abs/2512.24880

DeepSeek-AI

Citation

@article{xie2025mhc,
  title={mHC: Manifold-Constrained Hyper-Connections},
  author={Xie, Zhenda and Wei, Yixuan and Cao, Huanqi and Zhao, Chenggang and Deng, Chengqi and Li, Jiashi and Dai, Damai and Gao, Huazuo and Chang, Jiang and Zhao, Liang and Zhou, Shangyan and Xu, Zhean and Zhang, Zhengyan and Zeng, Wangding and Hu, Shengding and Wang, Yuqing and Yuan, Jingyang and Wang, Lean and Liang, Wenfeng},
  journal={arXiv preprint arXiv:2512.24880},
  year={2025}
}




