CUda Matrix Multiply library
cumm
CUda Matrix Multiply library.
cumm was developed while I was learning CUTLASS, which relies so heavily on C++ templates that the code becomes hard to maintain. So I developed pccm, which uses Python as the metaprogramming language in place of C++ template metaprogramming.
pccm has since become the foundational framework of cumm and of my other C++ projects, such as spconv.
cumm also contains a Python asyncio-based GEMM simulator that shares the same meta program with the CUDA code, which enables GEMM visualization and an easier debugging experience.
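To make the idea of one meta program driving both the CUDA code and a simulator concrete, here is a minimal sketch in plain Python. It does not use the pccm or cumm APIs: TileParams, emit_cuda_kernel, simulate_block and simulate_gemm are hypothetical names invented for this illustration, and the generated kernel is a naive GEMM rather than anything cumm actually emits.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TileParams:
    """Hypothetical meta parameters shared by the code generator and the simulator."""
    tile_m: int
    tile_n: int


def emit_cuda_kernel(p: TileParams) -> str:
    """Render CUDA source text with the tile sizes baked in as constants.

    Assumes row-major matrices and a launch with blockDim = (tile_n, tile_m).
    """
    return f"""
__global__ void gemm_{p.tile_m}x{p.tile_n}(const float* A, const float* B,
                                         float* C, int M, int N, int K) {{
  constexpr int kTileM = {p.tile_m};
  constexpr int kTileN = {p.tile_n};
  int row = blockIdx.y * kTileM + threadIdx.y;
  int col = blockIdx.x * kTileN + threadIdx.x;
  if (row < M && col < N) {{
    float acc = 0.f;
    for (int k = 0; k < K; ++k) acc += A[row * K + k] * B[k * N + col];
    C[row * N + col] = acc;
  }}
}}
"""


async def simulate_block(p: TileParams, A, B, C, by: int, bx: int) -> None:
    """Simulate one thread block of the kernel above as a coroutine."""
    M, K, N = len(A), len(B), len(B[0])
    for ty in range(p.tile_m):
        for tx in range(p.tile_n):
            row, col = by * p.tile_m + ty, bx * p.tile_n + tx
            if row < M and col < N:
                C[row][col] = sum(A[row][k] * B[k][col] for k in range(K))
        # Yield to the event loop so other simulated blocks can interleave.
        await asyncio.sleep(0)


async def simulate_gemm(p: TileParams, A, B):
    """Run every simulated block of the grid and return C = A @ B."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    grid_y = -(-M // p.tile_m)  # ceiling division
    grid_x = -(-N // p.tile_n)
    await asyncio.gather(*(simulate_block(p, A, B, C, by, bx)
                           for by in range(grid_y) for bx in range(grid_x)))
    return C


if __name__ == "__main__":
    params = TileParams(tile_m=2, tile_n=2)
    print(emit_cuda_kernel(params))
    print(asyncio.run(simulate_gemm(params, [[1, 2], [3, 4]], [[5, 6], [7, 8]])))
```

Because both functions read the same TileParams object, changing a tile size changes the emitted CUDA text and the simulated execution together, which is the property that makes a Python-side simulator useful for debugging.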
Install
Prebuilt
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux).
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Windows 10/11.
We will offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
pip install cumm-cu102 for CUDA 10.2
pip install cumm-cu111 for CUDA 11.1
pip install cumm-cu113 for CUDA 11.3
pip install cumm-cu114 for CUDA 11.4
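The package names above follow a simple pattern: cumm-cu followed by the CUDA version with the dot removed. A small sketch of that mapping, useful for install scripts, could look like this. The helper name cumm_package_for is hypothetical and not part of cumm; the version list is copied from this page and may be out of date for later releases.

```python
# CUDA versions for which this page lists prebuilt packages.
SUPPORTED_CUDA = ("10.2", "11.1", "11.3", "11.4")


def cumm_package_for(cuda_version: str) -> str:
    """Return the pip package name for a CUDA version string such as "11.3"."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"no prebuilt cumm package listed for CUDA {cuda_version}; "
            f"expected one of {', '.join(SUPPORTED_CUDA)}"
        )
    return "cumm-cu" + cuda_version.replace(".", "")
```

For example, cumm_package_for("11.3") gives the name to pass to pip install on a machine with CUDA 11.3.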
Build from source
Linux
- install build-essential and CUDA
- run export CUMM_DISABLE_JIT="1"
- run one of the following:
  python setup.py install
  pip install -e .
  python setup.py bdist_wheel, then pip install dist/xxx.whl
Windows 10/11
- install Visual Studio 2019 or newer and make sure the C++ development package is installed; install CUDA
- set the PowerShell script execution policy
- start a new PowerShell session and run tools/msvc_setup.ps1
- run $Env:CUMM_DISABLE_JIT = "1"
- run one of the following:
  python setup.py install
  pip install -e .
  python setup.py bdist_wheel, then pip install dist/xxx.whl
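The last two steps are the same on Linux and Windows: set CUMM_DISABLE_JIT to "1", then run one of the build commands. A rough sketch of scripting them from Python follows. The names BUILD_COMMANDS, build_env and run_build are hypothetical helpers, not part of cumm, and the sketch assumes the earlier steps (compiler, CUDA, and on Windows tools/msvc_setup.ps1) were already done in the shell that launches it.

```python
import os
import subprocess
import sys

# One entry per build option from the steps above. "python -m pip" is used in
# place of a bare "pip" so the command runs under the current interpreter.
BUILD_COMMANDS = {
    "install": [[sys.executable, "setup.py", "install"]],
    "editable": [[sys.executable, "-m", "pip", "install", "-e", "."]],
    # After this finishes, install the produced wheel with pip by hand.
    "wheel": [[sys.executable, "setup.py", "bdist_wheel"]],
}


def build_env(base=None):
    """Copy the environment and set CUMM_DISABLE_JIT="1" as the steps require."""
    env = dict(os.environ if base is None else base)
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def run_build(mode, source_dir="."):
    """Run the commands for one build option inside the cumm source tree."""
    for argv in BUILD_COMMANDS[mode]:
        subprocess.run(argv, cwd=source_dir, env=build_env(), check=True)
```

Calling run_build("editable") from the repository root corresponds to the pip install -e . option; nothing runs until run_build is called.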
Note
This work was done while the author was an employee at TuSimple.
Download files
Built Distributions
Hashes for cumm_cu111-0.2.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 5a3b74a4ec536cef496ce609a9010673f294f3ddb44d24f8adb80b5ecf360639
MD5 | c827f82de7bb04690b70a5abf51b973e
BLAKE2b-256 | 4f0221d0c4892791914f513e9f0afbed4c65a4227a8d93051dd88a459da21059
Hashes for cumm_cu111-0.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b4ddf156df1c118750d83a03efb3889bf480def71eb1f8c4e756fc4ddcefe115
MD5 | 450300719c31e8451170eddcb06c9036
BLAKE2b-256 | ab25bb4cffc6f5f31e2cc41d11465707708d95020ca173ddc620b08d1af41a70
Hashes for cumm_cu111-0.2.1-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 9fac0dab2f6b4110c4e7be80409296dc244ab3fb6cd2881047ac0141caeefe17
MD5 | 72510922d2f5f376a336ef0716961954
BLAKE2b-256 | 9641dec81ea9147420bbb0e37013e8a8aa87c43e899a641ae2fc9475e9f65316
Hashes for cumm_cu111-0.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 353cb0caefdd128ea829f5fb442885c5c63ad465400f481fa4cb6bf5fb70b888
MD5 | b93d5e9c92d02cb1dab4457f2c77a5bb
BLAKE2b-256 | 70a939638fcdfeb5eab40dea21f00e4c6ec741db00cb4da90bdd2547a8bd8b7b
Hashes for cumm_cu111-0.2.1-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 235df0d7f091511965fdbd5410205a7cd1a7389319496b7b7766eb3a871db839
MD5 | 73193b272c26008aec146d6aae4c0789
BLAKE2b-256 | 7158aa6afe437791387e34225dcc545cf0eb03f2ebca5ad9ba28a53b1f6ecbca
Hashes for cumm_cu111-0.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | d9ac47ec9b2101df12efcbc6f512d6bb7a466a3df3de0d01b4bf8bd6aea82cc7
MD5 | 0c8abe029458a2983f015268796730fe
BLAKE2b-256 | 8e2882a94277fb215719eca5060daad6b7c5e16234ca68aaf159e1f86e197ae3
Hashes for cumm_cu111-0.2.1-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | ab5a32704decd9625c67fe8a3aab982ddbc256dcc1b3970a977bce273da4f7a6
MD5 | 3b4f5d1e60633948428d0ef712308978
BLAKE2b-256 | 266bf981af2d79e235e9d0e2b7f07eb23aaf862e099d1bfe7b400ceabefc4150
Hashes for cumm_cu111-0.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b5d8acb2fac594e0ceae2c23bdc6df505d2307929baa9e2bbf82f74632777e71
MD5 | ca5e0e9f0f1bf9fa8fd9096cd6ada11d
BLAKE2b-256 | 76ba7c7c8cc1327f69dcdc678cd1376c431f3b3b64cacf07bddfb2e04ef81073
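The digests listed above can be checked against a downloaded wheel before installing it. A minimal sketch using only the Python standard library follows; sha256_of and matches_digest are hypothetical helper names, and the wheel path is whatever location you downloaded the file to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 with a digest copied from the tables above."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_digest applied to the downloaded cumm_cu111-0.2.1-cp310-cp310-win_amd64.whl file and the SHA256 value from its table should return True for an intact download.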