CUda Matrix Multiply library
cumm
CUda Matrix Multiply library.
cumm was developed while learning CUTLASS, which relies so heavily on C++ templates that the code becomes hard to maintain. So I developed pccm, which uses Python as the meta-programming language, to replace C++ template meta-programming.
pccm has since become the foundational framework of cumm and of my other C++ projects such as spconv.
cumm also contains a Python asyncio-based GEMM simulator that shares the same meta-program with the CUDA code, which enables GEMM visualization and makes debugging easier.
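To give a feel for the idea, here is a toy sketch of how asyncio coroutines can stand in for CUDA threads. It is not cumm's simulator and does not use cumm's or pccm's API: the `Barrier` class, `thread`, `simulate_gemm`, and `run_gemm` are all hypothetical names made up for this illustration. Each coroutine plays one thread of a block, loads part of a tile into a list that stands in for shared memory, and waits on a barrier that plays the role of `__syncthreads()`.

```python
import asyncio


class Barrier:
    """Minimal reusable barrier, a stand-in for __syncthreads() (hypothetical helper)."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.event = asyncio.Event()

    async def wait(self):
        event = self.event
        self.count += 1
        if self.count == self.parties:
            # Last arrival: reset for the next phase, then release the waiters.
            self.count = 0
            self.event = asyncio.Event()
            event.set()
        else:
            await event.wait()


async def thread(i, j, A, B, C, smem_a, smem_b, barrier, tile_k):
    """One simulated thread computing C[i][j]."""
    acc = 0
    for k0 in range(0, len(B), tile_k):
        # Cooperative load of the current K-tile into "shared memory".
        if j < tile_k:
            smem_a[i][j] = A[i][k0 + j]
        if i < tile_k:
            smem_b[i][j] = B[k0 + i][j]
        await barrier.wait()  # all loads are visible after this point
        for k in range(tile_k):
            acc += smem_a[i][k] * smem_b[k][j]
        await barrier.wait()  # nobody overwrites the tile while others still read it
    C[i][j] = acc


async def simulate_gemm(A, B, tile_k=2):
    M, K, N = len(A), len(B), len(B[0])
    # Assumptions of this toy layout: K is a multiple of tile_k, and the
    # thread block (M x N threads) is large enough to load a whole tile.
    assert K % tile_k == 0 and tile_k <= M and tile_k <= N
    C = [[0] * N for _ in range(M)]
    smem_a = [[0] * tile_k for _ in range(M)]
    smem_b = [[0] * N for _ in range(tile_k)]
    barrier = Barrier(M * N)  # created inside the running event loop
    await asyncio.gather(*(
        thread(i, j, A, B, C, smem_a, smem_b, barrier, tile_k)
        for i in range(M) for j in range(N)
    ))
    return C


def run_gemm(A, B, tile_k=2):
    return asyncio.run(simulate_gemm(A, B, tile_k))
```

Because every simulated thread is an ordinary Python coroutine, intermediate state such as `smem_a` can be printed or inspected with a regular debugger, which is the kind of convenience a Python-side simulator offers over debugging on the GPU.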
Install
Prebuilt
We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux) and Windows 10/11.
We will offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.
pip install cumm-cu102 for CUDA 10.2
pip install cumm-cu111 for CUDA 11.1
pip install cumm-cu113 for CUDA 11.3
pip install cumm-cu114 for CUDA 11.4
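The package names above follow a simple pattern: `cumm-cu` followed by the CUDA major and minor version digits. A minimal sketch of that mapping, assuming only the four CUDA versions listed here (the helper name `cumm_package_for` is hypothetical and not part of cumm):

```python
# CUDA versions for which prebuilt packages are listed above.
SUPPORTED_CUDA = ("10.2", "11.1", "11.3", "11.4")


def cumm_package_for(cuda_version: str) -> str:
    """Return the pip package name for a CUDA version string such as "11.3"."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"no prebuilt cumm package listed for CUDA {cuda_version}; "
            f"listed versions: {', '.join(SUPPORTED_CUDA)}"
        )
    major, minor = cuda_version.split(".")
    return f"cumm-cu{major}{minor}"
```

For example, `cumm_package_for("11.3")` gives the name to pass to `pip install` on a machine with CUDA 11.3.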
Build from source
Linux
- install build-essential and CUDA
- run export CUMM_DISABLE_JIT="1"
- run one of: python setup.py install; pip install -e .; or python setup.py bdist_wheel followed by pip install dist/xxx.whl
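The wheel variant of the steps above can also be driven from Python. This is only a sketch: the function names are hypothetical, it assumes it is run from the root of a cumm source checkout, and it defaults to a dry run that returns the command without executing it.

```python
import os
import subprocess
import sys


def build_env(base=None):
    """Copy the environment and set CUMM_DISABLE_JIT=1, as the build steps require."""
    env = dict(os.environ if base is None else base)
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def wheel_command():
    """Command for the `python setup.py bdist_wheel` step, using the current interpreter."""
    return [sys.executable, "setup.py", "bdist_wheel"]


def build_wheel(dry_run=True):
    """Build the wheel; with dry_run=True only return the command that would run."""
    cmd = wheel_command()
    if not dry_run:
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(cmd, env=build_env(), check=True)
    return cmd
```

Calling `build_wheel(dry_run=False)` performs the build; the resulting wheel still has to be installed with pip afterwards.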
Windows 10/11
- install Visual Studio 2019 or newer and make sure the C++ development package is installed; install CUDA
- set the PowerShell script execution policy
- start a new PowerShell session and run tools/msvc_setup.ps1
- run $Env:CUMM_DISABLE_JIT = "1"
- run one of: python setup.py install; pip install -e .; or python setup.py bdist_wheel followed by pip install dist/xxx.whl
Note
This work was done while the author was an employee at TuSimple.
Built Distributions
Hashes for cumm_cu114-0.2.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 547835537233d22b1a2dcc69ae01d4b538b50d408dac80d3cfffa86e38562b11
MD5 | f5b395a190ef551678df2d3f696a4c6c
BLAKE2b-256 | 08d3729743151cf4ed4973fc4d4c3c495700d468b44c25110b346dde32c2e2ce
Hashes for cumm_cu114-0.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 9ecbea1f7a73d57343b9fa97e26d84f96774934ccb98a8431d5a430cd889f2c4
MD5 | 0c5ff87f4e59e1993d8a89451b7f9894
BLAKE2b-256 | 272d0a1b505db67db173cc5704e1aa9f98e6856de5aa378085f5e473b99a3236
Hashes for cumm_cu114-0.2.1-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 0b9e99a59299480a6059d98334c7141d245b9f81902711ad610c06e7f2714ac9
MD5 | 2de9a729af39f6c7a2308e6cbe98be65
BLAKE2b-256 | 63965840216321acc9395d05d7901f7a12c60f5b4d18c2fb3632d6856b806742
Hashes for cumm_cu114-0.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 350ab41aa1a81a8a34fb01cdeba03c3732c1f91aebf460b6fa7425af242bddb4
MD5 | c274b44aafc9235023491fe0319f14a6
BLAKE2b-256 | 36035e2161354aad38ae685f17b8f49456d864af23b5775480516a9637398c60
Hashes for cumm_cu114-0.2.1-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | d172d950f040271863518c712e4b9bcbe504c865940ca7d7da435d04c6bb73a6
MD5 | 493e376156003c5861f9de8784cc8111
BLAKE2b-256 | 02d3ab7e44d2e4a8e42555c03542581985faee3de3032caaea15fc09f73c1c76
Hashes for cumm_cu114-0.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7d544de02cbeb24b3eb38abd146a8b88114fffb9bf409161f574629d0c118a14
MD5 | 028b159eef5ac48d8f41ec0b5cc9b5b0
BLAKE2b-256 | a5d23b1d1edf232570d57473b77d9eff2b4f8c078c499fcd0c63cc28a61d9dfe
Hashes for cumm_cu114-0.2.1-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 2fad275a68cd2536c89643812aac13f6f3cb199f025faf87abd711177835ff0b
MD5 | 98dca4ed562f197b3265102dd68efbff
BLAKE2b-256 | 8d40298e191a55e6939c026dbd43302cd729f061c4478a58a959aec9e8264654
Hashes for cumm_cu114-0.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1876c1c8645cba62ac6c274956186a61a1da51b0126f45290eb64b51ea8b8f7a
MD5 | 27035cd43b84738d2ab1efb3cc8ed229
BLAKE2b-256 | 287d9cbff69e770a3b4cc43ff5c829215e222074f1ba52c8ec56d4effa5f7e55
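The digests listed above can be used to check a downloaded wheel before installing it. A minimal sketch using Python's standard hashlib module; the function names `sha256_of` and `verify_wheel` are hypothetical helpers, not part of cumm:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """Return True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Pass the path of the downloaded wheel and the SHA256 value from the table for that exact file name; a `False` result means the file should not be installed.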