CUda Matrix Multiply library
cumm
cumm was developed while I was learning CUTLASS, which relies so heavily on C++ templates that its code becomes hard to maintain. So I developed pccm, which uses Python as the metaprogramming language in place of C++ template metaprogramming.
pccm has since become the foundational framework of cumm and of my other C++ projects, such as spconv.
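To illustrate what "Python as the metaprogramming language" means, here is a minimal sketch that does not use pccm's actual API; the function gen_gemm_kernel and its parameters are hypothetical. Tile sizes that CUTLASS would express as template parameters are plain Python ints here, so validation and specialization run in Python and the emitted CUDA C++ only contains constants.

```python
from textwrap import dedent


def gen_gemm_kernel(name: str, dtype: str, tile_m: int, tile_n: int, tile_k: int) -> str:
    """Emit CUDA C++ source for one kernel specialization (hypothetical helper, not pccm).

    The tile sizes are ordinary Python ints, so checks and specialization logic run
    in Python instead of in C++ template metaprogramming.
    """
    # A check that would otherwise be a static_assert inside a C++ template.
    if tile_k % 4 != 0:
        raise ValueError("tile_k must be a multiple of 4 in this sketch")
    return dedent(f"""\
        // generated kernel: {name}
        __global__ void {name}(const {dtype}* a, const {dtype}* b, {dtype}* c,
                               int m, int n, int k) {{
          constexpr int kTileM = {tile_m};
          constexpr int kTileN = {tile_n};
          constexpr int kTileK = {tile_k};
          // ... main loop using the constants above would be generated here ...
        }}
        """)


# A Python loop plays the role of explicit template instantiation.
configs = [(128, 128, 8), (64, 64, 16)]
source = "\n".join(
    gen_gemm_kernel(f"gemm_f32_{m}x{n}x{k}", "float", m, n, k) for m, n, k in configs
)
print(source)
```

The generated string would then be handed to a CUDA compiler; in this style, debugging the metaprogram means debugging ordinary Python rather than template instantiation errors.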
cumm also contains a Python asyncio-based GEMM simulator that shares the same metaprogram as the CUDA code, enabling GEMM visualization and easier debugging.
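The simulator idea can be sketched with plain asyncio: each simulated GPU thread is a coroutine that yields after every step, so the event loop interleaves the threads and a shared trace records their memory accesses for later visualization. This is a hypothetical illustration; simulate_gemm, simulated_thread and the trace format are assumptions, not cumm's simulator API, and a real simulator would also model tiles, shared memory and barriers.

```python
import asyncio


async def simulated_thread(tid, a, b, c, row, col, trace):
    """One simulated GPU thread computing c[row][col] (hypothetical model)."""
    acc = 0.0
    for kk in range(len(b)):
        acc += a[row][kk] * b[kk][col]
        # Record which elements of A and B this thread read at step kk.
        trace.append((tid, kk, ("A", row, kk), ("B", kk, col)))
        # Yield to the event loop so other simulated threads run in between.
        await asyncio.sleep(0)
    c[row][col] = acc


async def simulate_gemm(a, b):
    """Launch one simulated thread per output element of C = A @ B."""
    m, n = len(a), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    trace = []
    await asyncio.gather(
        *(simulated_thread(r * n + col, a, b, c, r, col, trace)
          for r in range(m) for col in range(n))
    )
    return c, trace


c, trace = asyncio.run(simulate_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(c)  # [[19.0, 22.0], [43.0, 50.0]]
```

Because every thread yields after each step, all four threads record step 0 before any of them records step 1, which is the kind of ordering a visualizer can replay step by step.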
Install
Prebuilt
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux) and Windows 10/11.
We will offer prebuilt packages for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
- pip install cumm-cu102 for CUDA 10.2
- pip install cumm-cu111 for CUDA 11.1
- pip install cumm-cu113 for CUDA 11.3
- pip install cumm-cu114 for CUDA 11.4
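Picking a package can be automated with a small helper that maps a CUDA version string to the package names listed above and checks the supported Python range. This is a sketch; cumm_package_for and python_supported are hypothetical helpers, not part of cumm.

```python
import sys

# CUDA versions that have prebuilt cumm packages, as listed above.
PREBUILT_CUDA = {"10.2", "11.1", "11.3", "11.4"}


def cumm_package_for(cuda_version: str) -> str:
    """Map a CUDA version such as "11.3" or "11.3.1" to the matching pip package name."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if f"{major}.{minor}" not in PREBUILT_CUDA:
        raise ValueError(f"no prebuilt cumm package for CUDA {cuda_version}; build from source")
    return f"cumm-cu{major}{minor}"


def python_supported(version_info=sys.version_info) -> bool:
    """Prebuilt wheels cover Python 3.7-3.10."""
    return (3, 7) <= tuple(version_info[:2]) <= (3, 10)
```

If PyTorch is installed, torch.version.cuda gives the CUDA version PyTorch was built with (None for CPU-only builds), so cumm_package_for(torch.version.cuda) yields a package that matches it.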
Build from source
Linux
- Install build-essential and CUDA.
- Run: export CUMM_DISABLE_JIT="1"
- Run one of the following:
  - python setup.py install
  - pip install -e .
  - python setup.py bdist_wheel, then pip install dist/xxx.whl
Windows 10/11
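- Install Visual Studio 2019 or newer, making sure the C++ development package is installed, then install CUDA.
- Set the PowerShell script execution policy so that local scripts are allowed to run (for example, Set-ExecutionPolicy RemoteSigned -Scope CurrentUser).
- Start a new PowerShell session and run: tools/msvc_setup.ps1
- Run: $Env:CUMM_DISABLE_JIT = "1"
- Run one of the following:
  - python setup.py install
  - pip install -e .
  - python setup.py bdist_wheel, then pip install dist/xxx.whl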
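The wheel-based build steps can also be driven from Python, which works the same way on Linux and Windows. This is a minimal sketch, assuming it is run from the repository root (where setup.py lives) and, on Windows, from a PowerShell session in which tools/msvc_setup.ps1 has already been run, since child processes inherit that environment. The helper names build_env and build_and_install_wheel are hypothetical.

```python
import glob
import os
import subprocess
import sys


def build_env() -> dict:
    """Copy the current environment and set CUMM_DISABLE_JIT=1 for the build."""
    env = dict(os.environ)
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def build_and_install_wheel(src_dir: str = ".") -> str:
    """Run `setup.py bdist_wheel`, then pip-install the newest wheel from dist/."""
    env = build_env()
    subprocess.run([sys.executable, "setup.py", "bdist_wheel"],
                   cwd=src_dir, env=env, check=True)
    wheels = glob.glob(os.path.join(src_dir, "dist", "*.whl"))
    if not wheels:
        raise FileNotFoundError("no wheel found in dist/")
    newest = max(wheels, key=os.path.getmtime)
    subprocess.run([sys.executable, "-m", "pip", "install", newest],
                   env=env, check=True)
    return newest
```

Calling build_and_install_wheel() returns the path of the installed wheel. Using sys.executable ensures the wheel is built and installed for the same interpreter that runs the script.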
Note
This work was done while the author was an employee at TuSimple.
Download files
Built Distributions
Hashes for cumm-0.2.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | cc49f9469dc4ddeb04bb884a79b88eabdb218c3b92ffc29d1bc159405be3739a
MD5 | 253f88b0ece000f21971ea056b4a0488
BLAKE2b-256 | 839c8f90fef27e8b9f9b8dc28632f906fa85eb7860cae300365e03553487cea8

Hashes for cumm-0.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 863d2aadaf438f6fbd043f00fd48c49cedbfbd12f1074cc91c2c730e0fcbc5a1
MD5 | de164311355d5e8bc4f149a627e6f2a8
BLAKE2b-256 | 3cfb7fdbf43a5d6e15bc13542fbbebdffab1a8dd44f4c347e03d0c40aa8b85cb

Hashes for cumm-0.2.1-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 8bf8b98a714141c8d5437d2dbd87614942e9fef593a0218d9b83a0ebebc4429d
MD5 | 53fa37be11452c0069ea72317c055db0
BLAKE2b-256 | 54d7b5e3daf1e1d643cdc5b72a1a839e8032342722c873956c7a05375ca1cce6

Hashes for cumm-0.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | fa72f365a9d4ceee9196eba335be5b2e94757db3cf60d4093587f16bed5103ba
MD5 | 7f81fba71efeba5b5aafa2821bdaabd0
BLAKE2b-256 | 8201d2724abb2779fc4dfa17b5daa01537dcdb4839a2fa063d62d66c332bea46

Hashes for cumm-0.2.1-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 67268d4423227b2cbdb139c827a9b8d08b7d21b8d31342fc744492fae27cad3b
MD5 | 133406c4ee874acf2c8241ae183a6f2f
BLAKE2b-256 | cc674fa86e9d8d936403ef487e2ba78dfc357c817d096a6bc93ee0f8012c78cd

Hashes for cumm-0.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | cbb711fc0f53e26ed68bb6ffa9ec355abe792aaef61b82a93360db0646f6c399
MD5 | aa6a886304aa1803bf4a7aa68d5a507a
BLAKE2b-256 | fca7da0c73c921e4a8e2f2482b662981e940403e20243738cc5e76b1d403a9fa

Hashes for cumm-0.2.1-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | c98cd97d713eeb644bab25005d64ba7b82fea510e805bc134bdf2608d42acc2c
MD5 | 247975bdc8bc65bfff44ef3193cd3de7
BLAKE2b-256 | fe63cf3e866419a98a824816c7b697672872b5b5098ca69d32b6c932ded7dd7d

Hashes for cumm-0.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 2c0766e00ac2739df7199ffe8ff30c56b2f77aefc8f78bc40d4e8e10cffe4379
MD5 | 2c4bc84e8af9ecdf9ae8ec06a89a5071
BLAKE2b-256 | 96c35ab9a8227e1fe7d2c1d4318388b00c55d60d9fbf7c39bc9d931c7c2a5930
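The digests above can be used to check a downloaded wheel before installing it. A minimal sketch using Python's standard hashlib; the helper names are hypothetical, and "blake2b-256" here means BLAKE2b with a 32-byte digest, matching the BLAKE2b-256 rows.

```python
import hashlib


def _new_hasher(algorithm: str):
    """Return a hash object for one of the algorithms listed in the tables above."""
    if algorithm == "blake2b-256":
        return hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    return hashlib.new(algorithm)  # e.g. "sha256" or "md5"


def digest_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of an in-memory byte string."""
    h = _new_hasher(algorithm)
    h.update(data)
    return h.hexdigest()


def digest_file(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels are not read into memory at once."""
    h = _new_hasher(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the published hex digest."""
    return digest_file(path, algorithm) == expected_hex.strip().lower()
```

For example, after downloading cumm-0.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, verify_file(path, "863d2aadaf438f6fbd043f00fd48c49cedbfbd12f1074cc91c2c730e0fcbc5a1") should return True. pip hash, given a file, prints its SHA256 digest for a quick manual comparison.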