CUda Matrix Multiply library
cumm
CUda Matrix Multiply library.
cumm was developed while I was learning CUTLASS, which uses so many C++ templates that the code becomes unmaintainable. So I developed pccm, which uses Python as the meta-programming language, to replace C++ template meta-programming.
pccm has since become the foundational framework of cumm and of my other C++ projects such as spconv.
cumm also contains a Python asyncio-based GEMM simulator that shares the same meta program with the CUDA code, enabling GEMM visualization and an easy debugging experience.
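To give a feel for what an asyncio-based GEMM simulator can look like, here is a toy sketch. It is not cumm's simulator and does not use any cumm or pccm API: the names `simulated_thread` and `simulate_gemm` are hypothetical, and the row-interleaved work split is an assumption made only for illustration. Each simulated "thread" is a coroutine that yields after every row, so the interleaving can be logged and inspected.

```python
import asyncio


async def simulated_thread(tid, a, b, c, rows, log):
    """One simulated 'thread' computes the rows of C assigned to it."""
    k_dim, n_dim = len(b), len(b[0])
    for i in rows:
        for j in range(n_dim):
            acc = 0
            for k in range(k_dim):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
        log.append((tid, i))    # record which thread finished which row
        await asyncio.sleep(0)  # yield so other simulated threads can run


async def simulate_gemm(a, b, num_threads=2):
    """Compute C = A @ B with rows interleaved across simulated threads."""
    m, n = len(a), len(b[0])
    c = [[0] * n for _ in range(m)]
    log = []
    tasks = [
        simulated_thread(t, a, b, c, range(t, m, num_threads), log)
        for t in range(num_threads)
    ]
    await asyncio.gather(*tasks)
    return c, log


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [0, 1]]  # identity, so C should equal A
c, log = asyncio.run(simulate_gemm(a, b))
```

Because every coroutine runs on one event loop, the log of `(thread id, row)` pairs can be printed or plotted after the run, which is the kind of visibility that is hard to get from a real CUDA kernel.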
Install
Prebuilt
We offer Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 prebuilt binaries for Linux (manylinux).
We offer Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 prebuilt binaries for Windows 10/11.
We will offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
pip install cumm-cu102 for CUDA 10.2
pip install cumm-cu111 for CUDA 11.1
pip install cumm-cu113 for CUDA 11.3
pip install cumm-cu114 for CUDA 11.4
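The package names above follow a simple pattern: `cumm-cu` followed by the CUDA version with the dot removed. The sketch below encodes that pattern for the four versions listed here. The helper name `cumm_package_for` is hypothetical and is not part of cumm.

```python
# CUDA versions with prebuilt packages, as listed in this README.
SUPPORTED_CUDA = ("10.2", "11.1", "11.3", "11.4")


def cumm_package_for(cuda_version: str) -> str:
    """Return the prebuilt package name for a CUDA version such as '11.3'."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"no prebuilt cumm package listed for CUDA {cuda_version}; "
            f"expected one of {', '.join(SUPPORTED_CUDA)}"
        )
    return "cumm-cu" + cuda_version.replace(".", "")


print("pip install " + cumm_package_for("11.3"))
```

For a CUDA version that is not in the list, the helper raises `ValueError`, in which case building from source is the remaining option.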
Build from source
Linux
- install build-essential and CUDA
- run: export CUMM_DISABLE_JIT="1"
- run one of the following:
  - python setup.py install
  - pip install -e .
  - python setup.py bdist_wheel, then pip install dist/xxx.whl
Windows 10/11
- install Visual Studio 2019 or newer and make sure the C++ development package is installed; install CUDA
- set the PowerShell script execution policy
- start a new PowerShell session and run: tools/msvc_setup.ps1
- run: $Env:CUMM_DISABLE_JIT = "1"
- run one of the following:
  - python setup.py install
  - pip install -e .
  - python setup.py bdist_wheel, then pip install dist/xxx.whl
Note
This work was done while the author was an employee at TuSimple.
Download files
Built Distributions
Hashes for cumm_cu102-0.2.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 8cbbf721c6dbe7b2baee99874a0c7aa112813a477040823d54de425fc19b22d9
MD5 | cd88cb6e136930418608957062bb8952
BLAKE2b-256 | b2a278bac04ff245189ce35ab61d32f1621e5cb68f8cc03db60a08dcac5c952b
Hashes for cumm_cu102-0.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ae156ee35ada67be3770196c77bbf8ad8215e38ed338ddb4fd451df9a6b2fa1a
MD5 | de1cd779ae97e8b33f23d09ccbb7e0f5
BLAKE2b-256 | 8c59963936e6f3a67ab54a443cb4911438a3e610cde5f6a0da71a1bbda0ff9fa
Hashes for cumm_cu102-0.2.1-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 43d67dda485193e9e71c62a04ebf4b24d2994482113ba08d14a2b27a768b80d6
MD5 | fcca41aba9424a531c1153c6fb5dba5b
BLAKE2b-256 | c64ef912ed4c23dc7402375131c41808d9a87f48b4cf54c00105b099fc227ebc
Hashes for cumm_cu102-0.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c90c460b40551001ed3a35089938db9f731f15fda352bbc2a3cb2a4409b3a388
MD5 | 16d4ea005f8b1fed4236a5260593351e
BLAKE2b-256 | 623e3e0f09ced5dd8e8fa977f3eb7bb048dc0931fc372d12a83675a9d7369ff7
Hashes for cumm_cu102-0.2.1-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 23a78f7a7dd1f8079792bd4db8801a5a9560ad47f24e0483488f402624624a7a
MD5 | 07c5a7dbf9b97502b07a7f158c273e18
BLAKE2b-256 | ae6534bf70686d47b4f82dacc8f4f9f4ed16a530da17b2d841402456a9da0327
Hashes for cumm_cu102-0.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1eef8ce280bb247577e663d85be315a073283e89455076612eda25085b012121
MD5 | de6383c544fb7bb2671b18eb3b9811a3
BLAKE2b-256 | 58760104e3e0e92dbbbb5aa119b637522934ad986159142386ddba88728d1c28
Hashes for cumm_cu102-0.2.1-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | c13abf8031b95167f889826225468a9aaa9bdcd68cc08ef586e3d3aed4c8b3ab
MD5 | 7c5614fd329cbe9457696b28c76e5e96
BLAKE2b-256 | 7195cb070b297fc3e7d6e6dfab82a2a8ee638b8ba681410835ed1c1d5dad1741
Hashes for cumm_cu102-0.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7022e2714b043a3e07c00e3aafba124bc747eb89518fa2f777f558908a4f8edc
MD5 | 7c660f6747afee32a7c2652427d22992
BLAKE2b-256 | c8f6dbef73629255709dbb308be4ad163f8d608ac87b08ec168f357312bd95f9
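The SHA256 digests listed above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical and are not part of cumm or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading cumm_cu102-0.2.1-cp310-cp310-win_amd64.whl, calling `matches_published_digest` with that file's path and the SHA256 value from its table should return `True`. A `False` result suggests the download is corrupted or is not the published file.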