CUda Matrix Multiply library
cumm
CUda Matrix Multiply library.
cumm was developed while I was learning CUTLASS, which relies so heavily on C++ templates that the code becomes hard to maintain. So I developed pccm, which uses Python as the metaprogramming language in place of C++ template metaprogramming. pccm has since become the foundational framework of cumm and of my other C++ projects such as spconv.
cumm also contains a Python asyncio-based GEMM simulator that shares the same meta program with the CUDA code, which enables GEMM visualization and an easier debugging experience.
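As a rough illustration of what "Python as the metaprogramming language" can mean, the sketch below generates CUDA source text with plain string templating, so that tile sizes become ordinary Python arguments instead of C++ template parameters. It does not use pccm's actual API: the function name `generate_kernel`, the kernel skeleton, and the tile parameters are all hypothetical.

```python
from string import Template

# Hypothetical kernel skeleton for illustration only; this is NOT pccm's API.
_KERNEL = Template("""\
__global__ void gemm_${tile_m}x${tile_n}x${tile_k}(
    const ${dtype}* a, const ${dtype}* b, ${dtype}* c) {
  constexpr int kTileM = ${tile_m};
  constexpr int kTileN = ${tile_n};
  constexpr int kTileK = ${tile_k};
  // ... kernel body would be emitted here ...
}
""")


def generate_kernel(tile_m: int, tile_n: int, tile_k: int,
                    dtype: str = "float") -> str:
    """Return CUDA source text specialized for the given tile shape."""
    if min(tile_m, tile_n, tile_k) <= 0:
        raise ValueError("tile sizes must be positive")
    return _KERNEL.substitute(
        tile_m=tile_m, tile_n=tile_n, tile_k=tile_k, dtype=dtype)


# Each call yields one concrete specialization as a string.
source = generate_kernel(128, 64, 32)
```

Because the specialization happens in Python, the same parameters could in principle drive both generated CUDA code and a Python-side simulation, which is the kind of sharing described above.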
Install
Prebuilt
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux) and on Windows 10/11.
We will offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
pip install cumm-cu102 for CUDA 10.2
pip install cumm-cu111 for CUDA 11.1
pip install cumm-cu113 for CUDA 11.3
pip install cumm-cu114 for CUDA 11.4
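The package names follow a simple pattern: `cumm-cu` followed by the CUDA major and minor version without the dot. A minimal sketch of that mapping is shown below; the helper `cumm_package_for_cuda` is hypothetical (it is not part of cumm), and the supported list is just the four versions named above.

```python
# CUDA versions with prebuilt packages, as listed in this README.
SUPPORTED_CUDA = ("10.2", "11.1", "11.3", "11.4")


def cumm_package_for_cuda(cuda_version: str) -> str:
    """Map a CUDA version such as '11.3' or '10.2.89' to a pip package name.

    Hypothetical helper for illustration; not provided by cumm itself.
    """
    # Keep only major.minor, so '10.2.89' is treated as '10.2'.
    major_minor = ".".join(cuda_version.strip().split(".")[:2])
    if major_minor not in SUPPORTED_CUDA:
        raise ValueError(
            f"no prebuilt cumm package listed for CUDA {cuda_version!r}")
    return "cumm-cu" + major_minor.replace(".", "")


package = cumm_package_for_cuda("11.3")  # name to pass to `pip install`
```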
Build from source
Linux
- install build-essential and CUDA
- run export CUMM_DISABLE_JIT="1"
- run python setup.py install, or pip install -e ., or python setup.py bdist_wheel followed by pip install dists/xxx.whl
Windows 10/11
- install Visual Studio 2019 or newer and make sure the C++ development package is installed; install CUDA
- set the PowerShell script execution policy
- start a new PowerShell session and run tools/msvc_setup.ps1
- run $Env:CUMM_DISABLE_JIT = "1"
- run python setup.py install, or pip install -e ., or python setup.py bdist_wheel followed by pip install dists/xxx.whl
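The last two steps are the same on both platforms: set CUMM_DISABLE_JIT to "1", then run one of the build commands. They can be sketched in Python as below. The helper names `build_env` and `build_command` are hypothetical, and the sketch assumes it is run from the cumm source root after the platform-specific setup above (on Windows, in the same shell where tools/msvc_setup.ps1 was run).

```python
import os
import sys


def build_env(base=None):
    """Return a copy of the environment with CUMM_DISABLE_JIT set to "1"."""
    env = dict(os.environ if base is None else base)
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def build_command(mode="install"):
    """Return the argv list for one of the build options listed above."""
    commands = {
        "install": [sys.executable, "setup.py", "install"],
        "editable": [sys.executable, "-m", "pip", "install", "-e", "."],
        "wheel": [sys.executable, "setup.py", "bdist_wheel"],
    }
    if mode not in commands:
        raise ValueError(f"unknown build mode: {mode!r}")
    return commands[mode]


# To actually build (not executed here):
#   import subprocess
#   subprocess.run(build_command("editable"), env=build_env(), check=True)
```

Passing the variable through `env=` keeps it scoped to the build process rather than changing the current shell.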
Note
This work was done while the author was an employee at TuSimple.
Download files
Built Distributions
Hashes for cumm_cu111-0.2.0-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 7ef500504338745f4897cfec26a5dae039f48c63924ccf5e763d73fa74b119f6
MD5 | 0685e7be3d58ebd474fdf13df6d13768
BLAKE2b-256 | c3ed3ecee88bfab4871cc30b99e18b9626bd233b9ee85ffa07c2a193d00f8873
Hashes for cumm_cu111-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c18ce07339af58b14987c74171980065f17bb589480fc92717040fbbf5f4b8fc
MD5 | 36050a6d07e84041015e495acf33b63c
BLAKE2b-256 | c4ab1a5a4a068341d192d8be9d17900065967eeb5e33c90b2bda76f9849307ea
Hashes for cumm_cu111-0.2.0-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 340bbe83aac24c4d8f4a04080c245a09e34a0af244b48eded33b62e7bdce41a2
MD5 | 8a41e05dcf1e5e921423506c68c2a738
BLAKE2b-256 | 530d3f7c7b080dec4fe742e63bc3f9dab95c3009ff99d57b23fdb9d33af825fb
Hashes for cumm_cu111-0.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b342c4dc248a8701a5999e3e00c3b0f6ecda20812c4fb5cfd1eaa8c9e61741cf
MD5 | 875ffbd3afb6e304251591868f6ee823
BLAKE2b-256 | 56be9b59fa8a1b4b9119f88669f7a433b7bb4018fd8e50689dac70034f91501f
Hashes for cumm_cu111-0.2.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 8957b8ff599d4d4e926ad58bba14a635dea174c82b12f8056b0124d24e1af575
MD5 | d2346b658beae462ad3dbe40916d244e
BLAKE2b-256 | 17043c28755833b876a7f75898ef7fbc3e4e60e938784c824eb19522d299c0da
Hashes for cumm_cu111-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8d7fe9d776d4385b122ac540620cb48ad045dc8591289a10fa2e2c49f9ed6da0
MD5 | 21dd0737f9a238f7e3099566e4327bdd
BLAKE2b-256 | 5ca54049b0d940893e52580bbc4d12ef2e024d685f9e569439a9d1c751d4dc24
Hashes for cumm_cu111-0.2.0-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 80db79d5f539463aa764bd522bee9f6cff415a1c247b1966db7333c2c5e62653
MD5 | 29dd1e2456d1b51e70f23429bf58403d
BLAKE2b-256 | e2287218bff44e70dc338a8c766832a1154b4782f1c441c24b429d289d03a39a
Hashes for cumm_cu111-0.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6a463ddabc1a6e543fb658a4f787d9fb0cdc97bd799e9c3c2d88aa2232898a76
MD5 | 345ab19ad65740f85a63694336fd7ff3
BLAKE2b-256 | 98904a46dcba4d75cdba0def26638231ef49445a807520cad78251efb2670dc3
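A downloaded wheel can be checked against the SHA256 digests listed above. The sketch below uses only the Python standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the wheel path in the usage comment is a placeholder for wherever the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (placeholder path; the digest is the cp310 win_amd64 entry above):
#   matches_published_hash(
#       "cumm_cu111-0.2.0-cp310-cp310-win_amd64.whl",
#       "7ef500504338745f4897cfec26a5dae039f48c63924ccf5e763d73fa74b119f6",
#   )
```

pip can also enforce hashes itself through its hash-checking mode, which avoids a manual comparison when installing from a requirements file.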