CUda Matrix Multiply library
cumm
CUda Matrix Multiply library.
`cumm` was developed while I was learning CUTLASS, which uses so many C++ templates that the code becomes unmaintainable. So I developed `pccm`, which uses Python as the meta-programming language in place of C++ template meta-programming. `pccm` has since become the foundational framework of `cumm` and of my other C++ projects, such as spconv.
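To make the idea concrete, here is a minimal sketch of Python acting as the meta-programming layer. It uses only plain string templating from the standard library and is not `pccm`'s actual API; the function name `emit_gemm_declaration`, the constants `kTileM`/`kTileN`/`kTileK`, and the generated kernel signature are all hypothetical.

```python
from string import Template

# Illustration only: plain string templating, NOT pccm's real API.
# The tile shape that a C++ template parameter list would carry is
# instead an ordinary Python value substituted into the source text.
KERNEL_TEMPLATE = Template("""\
// Tile shape baked in by the Python generator.
constexpr int kTileM = $tile_m;
constexpr int kTileN = $tile_n;
constexpr int kTileK = $tile_k;

__global__ void ${name}(const float* a, const float* b, float* c,
                        int m, int n, int k);
""")


def emit_gemm_declaration(tile_m: int, tile_n: int, tile_k: int) -> str:
    """Return CUDA source text for one (hypothetical) tile configuration."""
    if min(tile_m, tile_n, tile_k) <= 0:
        raise ValueError("tile sizes must be positive")
    name = f"gemm_{tile_m}x{tile_n}x{tile_k}"
    return KERNEL_TEMPLATE.substitute(
        name=name, tile_m=tile_m, tile_n=tile_n, tile_k=tile_k
    )


source = emit_gemm_declaration(128, 64, 32)
print(source)
```

Because the configuration lives in ordinary Python values, checks such as the `ValueError` above are plain Python code rather than template machinery.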
`cumm` also contains a Python asyncio-based GEMM simulator that shares the same meta-program with the CUDA code, which enables GEMM visualization and an easier debugging experience.
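The following toy sketch illustrates why asyncio is a natural fit for simulating GPU threads. It is not `cumm`'s simulator: the names `simulated_thread` and `simulate_gemm` are hypothetical, and the model (one coroutine per output element, with `asyncio.gather` standing in for a block-wide barrier between K tiles) is an assumption made for illustration.

```python
import asyncio


async def simulated_thread(row, col, a, b, c, k_begin, k_end):
    """One simulated thread: accumulate a K-slice into c[row][col]."""
    acc = 0
    for k in range(k_begin, k_end):
        acc += a[row][k] * b[k][col]
        # Yield to the event loop so the other "threads" interleave.
        await asyncio.sleep(0)
    c[row][col] += acc


async def simulate_gemm(a, b, tile_k=2):
    """Compute c = a @ b, one K tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for k_begin in range(0, k, tile_k):
        k_end = min(k_begin + tile_k, k)
        # gather() returns only when every coroutine has finished, so it
        # plays the role of a barrier between consecutive K tiles.
        await asyncio.gather(*(
            simulated_thread(i, j, a, b, c, k_begin, k_end)
            for i in range(m)
            for j in range(n)
        ))
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = asyncio.run(simulate_gemm(a, b))
print(result)  # [[58, 64], [139, 154]]
```

Each coroutine is ordinary Python, so it can be paused in a debugger or have its reads and writes logged, which is the kind of inspection that is hard to do inside a real kernel.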
Install
Prebuilt
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux) and Windows 10/11.
We will offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
- `pip install cumm-cu102` for CUDA 10.2
- `pip install cumm-cu111` for CUDA 11.1
- `pip install cumm-cu113` for CUDA 11.3
- `pip install cumm-cu114` for CUDA 11.4
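The package name simply encodes the CUDA version. As a small sketch, the mapping can be written as a lookup. The helper name `cumm_package_for` is hypothetical and not part of `cumm`; the table covers only the four packages listed above.

```python
# Hypothetical helper, not part of cumm: maps a CUDA version string to
# the prebuilt package name listed in this README.
PREBUILT_PACKAGES = {
    "10.2": "cumm-cu102",
    "11.1": "cumm-cu111",
    "11.3": "cumm-cu113",
    "11.4": "cumm-cu114",
}


def cumm_package_for(cuda_version: str) -> str:
    """Return the pip package name for a CUDA version such as '11.3'."""
    # Keep only major.minor so that '10.2.89' matches '10.2'.
    major_minor = ".".join(cuda_version.strip().split(".")[:2])
    try:
        return PREBUILT_PACKAGES[major_minor]
    except KeyError:
        raise ValueError(
            f"no prebuilt cumm package listed for CUDA {cuda_version}"
        ) from None


print("pip install " + cumm_package_for("11.3"))  # pip install cumm-cu113
```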
Build from source
Linux
- Install build-essential and CUDA.
- Run `export CUMM_DISABLE_JIT="1"`.
- Run one of the following:
  - `python setup.py install`
  - `pip install -e .`
  - `python setup.py bdist_wheel`, then `pip install dist/xxx.whl`
Windows 10/11
- Install Visual Studio 2019 or newer and make sure the C++ development package is installed. Install CUDA.
- Set the PowerShell script execution policy.
- Start a new PowerShell session and run `tools/msvc_setup.ps1`.
- Run `$Env:CUMM_DISABLE_JIT = "1"`.
- Run one of the following:
  - `python setup.py install`
  - `pip install -e .`
  - `python setup.py bdist_wheel`, then `pip install dist/xxx.whl`
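The Linux and Windows build steps differ mainly in how `CUMM_DISABLE_JIT` is set. As a rough sketch, a small Python wrapper can set it portably. The names `build_env` and `build_commands` are hypothetical and not part of `cumm`; on Windows the script still has to be started from a PowerShell session in which `tools/msvc_setup.ps1` has already been run.

```python
import os
import sys

# Hypothetical wrapper, not part of cumm: the build commands from the
# steps above, expressed as argument lists for subprocess.
BUILD_COMMANDS = {
    "install": [[sys.executable, "setup.py", "install"]],
    "editable": [[sys.executable, "-m", "pip", "install", "-e", "."]],
    # The resulting wheel still has to be installed with pip afterwards.
    "wheel": [[sys.executable, "setup.py", "bdist_wheel"]],
}


def build_env(base=None):
    """Copy the environment and set CUMM_DISABLE_JIT=1, as the steps require."""
    env = dict(os.environ if base is None else base)
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def build_commands(mode):
    """Return the list of commands for 'install', 'editable' or 'wheel'."""
    if mode not in BUILD_COMMANDS:
        raise ValueError(f"unknown build mode: {mode!r}")
    return BUILD_COMMANDS[mode]


# Usage, run from the repository root:
#   import subprocess
#   for cmd in build_commands("wheel"):
#       subprocess.run(cmd, env=build_env(), check=True)
```

Passing the variable through `env=` leaves the calling shell unchanged, so the setting does not leak into later commands.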
Note
This work was done while the author was an employee at TuSimple.