CUda Matrix Multiply library
cumm
cumm was developed while learning CUTLASS, which uses so many C++ templates that the code becomes unmaintainable. So I developed pccm, which uses Python as a meta-programming language, to replace C++ template meta-programming.
pccm has now become a foundational framework of cumm and my other C++ projects such as spconv.
cumm also contains a Python asyncio-based GEMM simulator that shares the same meta-program with the CUDA code, enabling GEMM visualization and an easy debugging experience.
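The idea of using Python in place of C++ template meta-programming can be illustrated with a minimal sketch. It uses plain string formatting only and is not the pccm API; the function name `emit_tile_struct` and the constants `kM`, `kN`, `kK` are hypothetical names chosen for illustration.

```python
# Illustrative only: plain Python string formatting, NOT the pccm API.
# Values that would be C++ template parameters become ordinary Python arguments.


def emit_tile_struct(name: str, tile_m: int, tile_n: int, tile_k: int) -> str:
    """Return C++ source for a struct holding compile-time tile sizes."""
    if min(tile_m, tile_n, tile_k) <= 0:
        raise ValueError("tile sizes must be positive")
    lines = [
        f"struct {name} {{",
        f"  static constexpr int kM = {tile_m};",
        f"  static constexpr int kN = {tile_n};",
        f"  static constexpr int kK = {tile_k};",
        "};",
    ]
    return "\n".join(lines)


# Generate one concrete specialization as C++ text.
source = emit_tile_struct("TileShape128x64x32", 128, 64, 32)
print(source)
```

Because the parameters are validated and expanded in Python, errors surface as ordinary Python exceptions rather than as template instantiation errors in the C++ compiler.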
Install
Prebuilt
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux) and Windows 10/11.
We will offer prebuilts for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
pip install cumm-cu102 for CUDA 10.2
pip install cumm-cu111 for CUDA 11.1
pip install cumm-cu113 for CUDA 11.3
pip install cumm-cu114 for CUDA 11.4
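The mapping from CUDA version to package name can be captured in a small lookup. This is a sketch only: the package names are the ones listed above, while the helper `prebuilt_package` is a hypothetical name and not part of cumm.

```python
# Package names come from the list above; the helper itself is hypothetical.
PREBUILT_PACKAGES = {
    "10.2": "cumm-cu102",
    "11.1": "cumm-cu111",
    "11.3": "cumm-cu113",
    "11.4": "cumm-cu114",
}


def prebuilt_package(cuda_version: str) -> str:
    """Return the prebuilt package name for a CUDA version such as "11.3"."""
    try:
        return PREBUILT_PACKAGES[cuda_version]
    except KeyError:
        supported = ", ".join(sorted(PREBUILT_PACKAGES))
        raise ValueError(
            f"no prebuilt for CUDA {cuda_version}; listed versions: {supported}"
        ) from None


print("pip install", prebuilt_package("11.3"))
```

For a CUDA version that is not listed, the helper raises instead of guessing a name, in which case building from source is the remaining option.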
Build from source
Linux
- install build-essential and CUDA
- run export CUMM_DISABLE_JIT="1"
- run one of:
  python setup.py install
  pip install -e .
  python setup.py bdist_wheel, then pip install dist/xxx.whl
Windows 10/11
- install Visual Studio 2019 or newer and make sure the C++ development package is installed; install CUDA
- set the PowerShell script execution policy
- start a new PowerShell session and run tools/msvc_setup.ps1
- run $Env:CUMM_DISABLE_JIT = "1"
- run one of:
  python setup.py install
  pip install -e .
  python setup.py bdist_wheel, then pip install dist/xxx.whl
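The last two steps on either platform (set CUMM_DISABLE_JIT, then install) can be scripted. The following is a minimal sketch that assumes the compiler and CUDA prerequisites above are already in place and that it is given the path to the cumm source checkout; the helper names `build_env`, `install_command`, and `run_build` are hypothetical.

```python
import os
import subprocess
import sys


def build_env(base=None):
    """Copy the environment and set CUMM_DISABLE_JIT="1" as the steps require."""
    env = dict(os.environ if base is None else base)
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def install_command(editable=True):
    """Return the install command as an argument list for subprocess."""
    if editable:
        # Equivalent to `pip install -e .` using the current interpreter.
        return [sys.executable, "-m", "pip", "install", "-e", "."]
    return [sys.executable, "setup.py", "install"]


def run_build(source_dir, editable=True):
    """Run the install inside source_dir; raises CalledProcessError on failure."""
    return subprocess.run(
        install_command(editable), cwd=source_dir, env=build_env(), check=True
    )


# Usage (not executed here): run_build("path/to/cumm")
```

Passing the variable through `env` keeps it scoped to the build process instead of changing the calling shell.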
Note
This work was done while the author was an employee at TuSimple.
Built Distributions
Hashes for cumm_cu102-0.2.0-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | bbda24bfc3468c9875e8e0b359d9b4742458c700076a0c0befc32d06d27962df
MD5 | 1ace61413e09060e534d2c2044b58a7c
BLAKE2b-256 | c26d0cdc53508f615f1b41f401be9ac1a61414a2c0fb01071321f9091070fa5a
Hashes for cumm_cu102-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7e4aeca3ec42f15c784c7f0c92368a5af936b47cbd9991eb6b3f2e31bc40cc73
MD5 | c6435acdd1067586133b149d0f951688
BLAKE2b-256 | 6226ba40ad3a5bdc4e07d7b771dc4f567503405208fb463d07ce4adb3369fa6a
Hashes for cumm_cu102-0.2.0-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 306e1d695e0685d1f843e1a68e9345f5c764f5246db906ed599b336f63f6f80c
MD5 | 142d596d0c761f2c8c456a6d96813390
BLAKE2b-256 | 75243a7deb9bc96c5c5ddf25eaa26225e885393cc3ec6d55474eefdbca0b04a4
Hashes for cumm_cu102-0.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8f97d6cecec8a28a30dda1e9e35a088ba04bce8dab054d1f98975ee7af3f6685
MD5 | 61a355539000268545b041e10e03bcc8
BLAKE2b-256 | 6467ed667d34348cb7bb83498367c85156e24c2535d4b40ee40e5316c031fca1
Hashes for cumm_cu102-0.2.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | ad3eb9228c8d9b3037d63d11a7738fcee507559ee598f9a7cbd4efd6aeacc13b
MD5 | 1338460ab2a68231b146dd3b4d8e24a9
BLAKE2b-256 | b1cc6f80fafd0dfdc18bfdf695e09861b57d1b1ff4bad74bff8815a2301e1bcb
Hashes for cumm_cu102-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 750f1444bbf01aae6e24ccf0ef60eb3f70de198f5b4def90b77149f14337e8de
MD5 | d1ce8b0e9f2439fb1502b08af03455a4
BLAKE2b-256 | fe28cd75f26ea435ebc6e104275e6f17ea6cd13ec14dacbe6d65ff7aeeec2630
Hashes for cumm_cu102-0.2.0-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | beaea5ebab6073d31231e90806b346acede2c4623ef36200b97e92b43a884037
MD5 | 3db7b500be7d5c1a894e7d7eafec62ca
BLAKE2b-256 | 9d29e3a8a933c884115b3b14d4eb4ae9b1af9ed5636249cc4729d5e5cb3967fe
Hashes for cumm_cu102-0.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4436c375615ad00571c8381b81dc74090fe18e85dccc9c568051ac12a0763547
MD5 | a5f0988ce5d72a55cef5ec0e9b72cf11
BLAKE2b-256 | 56a287a4208a4e5a792dc4d6903b58ea389d92c920da6890f99f906157a39883
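A downloaded wheel can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `matches_digest` are hypothetical, and the wheel path in the usage comment assumes the file was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (not executed here), with the digest copied from the table for that file:
# matches_digest(
#     "cumm_cu102-0.2.0-cp310-cp310-win_amd64.whl",
#     "bbda24bfc3468c9875e8e0b359d9b4742458c700076a0c0befc32d06d27962df",
# )
```

Reading in chunks keeps memory use constant regardless of wheel size.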