The Blis BLAS-like linear algebra library, as a self-contained C-extension.

Cython BLIS: Fast BLAS-like operations from Python and Cython, without the tears

This repository provides the Blis linear algebra routines as a self-contained Python C-extension.

Currently, we only support single-threaded execution, as this is actually best for our workloads (ML inference).

Installation

You can install the package via pip:

pip install blis

Wheels should be available, so installation should be fast. If you want to install from source and you're on Windows, you'll need to install LLVM.

Building BLIS for alternative architectures

The provided wheels should work on x86_64 architectures. Unfortunately, we do not currently know a way to provide different wheels for alternative architectures, and we cannot provide a single binary that works everywhere. So if the wheel doesn't work for your CPU, you'll need to install from the source distribution and tell Blis your CPU architecture using the BLIS_ARCH environment variable.
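As a rough illustration of that decision, the sketch below uses Python's standard platform module to guess whether the prebuilt x86_64 wheels are likely to apply. The helper name and the set of machine strings are assumptions made for this example; they are not part of cython-blis.

```python
import platform

# Machine strings commonly reported for 64-bit x86 CPUs. This set is an
# assumption for the sketch; other spellings may exist on some systems.
X86_64_NAMES = {"x86_64", "amd64"}


def wheel_probably_works(machine=None):
    """Return True if the machine string looks like a 64-bit x86 CPU."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in X86_64_NAMES


if wheel_probably_works():
    print("x86_64 detected: the provided wheels should work.")
else:
    print("Other CPU: install from source and set BLIS_ARCH.")
```

This only inspects the reported machine name. It cannot tell whether a particular wheel actually runs on your CPU.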

a) Installing with generic arch support

BLIS_ARCH="generic" pip install spacy --no-binary blis

b) Building specific support

To compile Blis, cython-blis bundles makefile scripts for specific architectures. These are produced by running the Blis build system and logging the commands. We do not yet have logs for every architecture, as there are some architectures we have not had access to.

See here for the list of architectures. For example, here's how to build support for the ARM architecture cortexa57:

git clone https://github.com/explosion/cython-blis && cd cython-blis
git pull && git submodule init && git submodule update && git submodule status
python3 -m venv env3.6
source env3.6/bin/activate
pip install -r requirements.txt
./bin/generate-make-jsonl linux cortexa57
BLIS_ARCH="cortexa57" python setup.py build_ext --inplace
BLIS_ARCH="cortexa57" python setup.py bdist_wheel

Fingers crossed, this will build you a wheel that supports your platform. You could then submit a PR with the blis/_src/make/linux-cortexa57.jsonl and blis/_src/include/linux-cortexa57/blis.h files so that you can run:

BLIS_ARCH=cortexa57 pip install spacy --no-binary=blis
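If you drive the install from a script rather than a shell, the same command can be assembled in Python. This is a minimal sketch: the function name is hypothetical, and it only builds the environment and argument list without running pip.

```python
import os
import sys


def source_install_command(arch, package="blis"):
    """Build the environment and argv for installing blis from source."""
    env = dict(os.environ)
    env["BLIS_ARCH"] = arch  # the variable the blis build reads
    argv = [sys.executable, "-m", "pip", "install", package,
            "--no-binary", "blis"]
    return env, argv


env, argv = source_install_command("generic", package="spacy")
print(env["BLIS_ARCH"], " ".join(argv[1:]))
```

To actually run the install, pass both values to subprocess.check_call(argv, env=env).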

Running the benchmark

After installation, run a small matrix multiplication benchmark:

$ export OMP_NUM_THREADS=1 # Tell Numpy to only use one thread.
$ python -m blis.benchmark
Setting up data nO=384 nI=384 batch_size=2000. Running 1000 iterations
Blis...
Total: 11032014.6484
7.35 seconds
Numpy (Openblas)...
Total: 11032016.6016
16.81 seconds
Blis einsum ab,cb->ca
8.10 seconds
Numpy einsum ab,cb->ca
Total: 5510596.19141
83.18 seconds

The low numpy.einsum performance is expected, but the low numpy.dot performance is surprising. Linking numpy against MKL gives better performance:

Numpy (mkl_rt) gemm...
Total: 11032011.71875
5.21 seconds

These figures refer to performance on a Dell XPS 13 i7-7500U. Running the same benchmark on a 2015 MacBook Air gives:

Blis...
Total: 11032014.6484
8.89 seconds
Numpy (Accelerate)...
Total: 11032012.6953
6.68 seconds

Clearly the Dell's numpy+OpenBLAS performance is the outlier, so it's likely something has gone wrong in the compilation and architecture detection.
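To make the comparison concrete, the ratios between the timings reported above can be computed directly. The sketch below only re-uses the wall-clock figures from the benchmark output; the dictionary keys are labels chosen for this example.

```python
# Wall-clock seconds copied from the benchmark output above.
dell = {"blis_gemm": 7.35, "openblas_gemm": 16.81, "mkl_gemm": 5.21,
        "blis_einsum": 8.10, "numpy_einsum": 83.18}
macbook = {"blis_gemm": 8.89, "accelerate_gemm": 6.68}


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`, to two decimals."""
    return round(slow / fast, 2)


print("Dell, Blis vs OpenBLAS gemm:", speedup(dell["openblas_gemm"], dell["blis_gemm"]))
print("Dell, MKL vs Blis gemm:", speedup(dell["blis_gemm"], dell["mkl_gemm"]))
print("Dell, Blis vs Numpy einsum:", speedup(dell["numpy_einsum"], dell["blis_einsum"]))
print("MacBook, Accelerate vs Blis gemm:", speedup(macbook["blis_gemm"], macbook["accelerate_gemm"]))
```

On these figures Blis is roughly 2.3 times faster than the OpenBLAS build on the Dell, while MKL and Accelerate are each faster than Blis by a factor of about 1.3 to 1.4.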

Usage

Two APIs are provided: a high-level Python API, and direct Cython access. The best part of the Python API is the einsum function, which works like numpy's, but with some restrictions that allow a direct mapping to Blis routines. Example usage:

from blis.py import einsum
from numpy import ndarray, zeros

dim_a = 500
dim_b = 128
dim_c = 300
arr1 = ndarray((dim_a, dim_b))
arr2 = ndarray((dim_b, dim_c))
out = zeros((dim_a, dim_c))

einsum('ab,bc->ac', arr1, arr2, out=out)
# Change dimension order of output
out = einsum('ab,bc->ca', arr1, arr2)
assert out.shape == (dim_c, dim_a)
# Matrix vector product, with transposed output
arr2 = ndarray((dim_b,))
out = einsum('ab,b->ba', arr1, arr2)
assert out.shape == (dim_b, dim_a)
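The shapes in the example follow mechanically from the equation string: each output label takes the size of the input dimension with the same label. The helper below is a plain-Python sketch of that rule, written for this explanation. It is not part of blis and does not check which equations blis accepts.

```python
def output_shape(equation, shape_a, shape_b):
    """Infer the output shape of a two-operand einsum equation."""
    inputs, out_labels = equation.split("->")
    labels_a, labels_b = inputs.split(",")
    sizes = {}
    for labels, shape in ((labels_a, shape_a), (labels_b, shape_b)):
        if len(labels) != len(shape):
            raise ValueError("labels %r do not match shape %r" % (labels, shape))
        for label, size in zip(labels, shape):
            # A label that appears twice must have the same size both times.
            if sizes.setdefault(label, size) != size:
                raise ValueError("size mismatch for dimension %r" % label)
    return tuple(sizes[label] for label in out_labels)


print(output_shape("ab,bc->ac", (500, 128), (128, 300)))  # (500, 300)
print(output_shape("ab,bc->ca", (500, 128), (128, 300)))  # (300, 500)
```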

The Einstein summation format is really awesome, so it's always been disappointing that it's so much slower than equivalent calls to tensordot in numpy. The blis.einsum function gives up the numpy version's generality, so that calls can be easily mapped to Blis:

  • Only two input tensors
  • Maximum two dimensions
  • Dimensions must be labelled a, b and c
  • The first argument's dimensions must be 'a' (for 1d inputs) or 'ab' (for 2d inputs).
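The four restrictions above can be written down as a small checker. This is a sketch for illustration only: the function name is hypothetical, and passing these checks does not guarantee that blis accepts the equation, since only a fixed set of strings is supported.

```python
def check_restrictions(equation):
    """Return a list of restriction violations (empty if none were found)."""
    try:
        inputs, output = equation.split("->")
    except ValueError:
        return ["equation must contain exactly one '->'"]
    problems = []
    operands = inputs.split(",")
    if len(operands) != 2:
        problems.append("only two input tensors are allowed")
    for labels in operands + [output]:
        if len(labels) > 2:
            problems.append("%r has more than two dimensions" % labels)
        if any(label not in "abc" for label in labels):
            problems.append("%r uses a label other than a, b or c" % labels)
    if operands[0] not in ("a", "ab"):
        problems.append("first argument must be 'a' or 'ab'")
    return problems


print(check_restrictions("ab,bc->ac"))  # []
print(check_restrictions("ba,bc->ac"))
```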

With these restrictions, there are only 15 valid combinations, which correspond to all the things you would otherwise do with the gemm, gemv, ger and axpy functions. You can therefore forget about all the other functions and just use einsum. Here are the valid einsum strings, the calls they correspond to, and the numpy equivalents:

Equation     Maps to                                 Numpy
'a,a->a'     axpy(A, B)                              A+B
'a,b->ab'    ger(A, B)                               outer(A, B)
'a,b->ba'    ger(B, A)                               outer(B, A)
'ab,a->ab'   batch_axpy(A, B)                        A*B
'ab,a->ba'   batch_axpy(A, B, trans1=True)           (A*B).T
'ab,b->a'    gemv(A, B)                              A*B
'ab,a->b'    gemv(A, B, trans1=True)                 A.T*B
'ab,ac->cb'  gemm(B, A, trans1=True, trans2=True)    dot(B.T, A)
'ab,ac->bc'  gemm(A, B, trans1=True, trans2=False)   dot(A.T, B)
'ab,bc->ac'  gemm(A, B, trans1=False, trans2=False)  dot(A, B)
'ab,bc->ca'  gemm(B, A, trans1=False, trans2=True)   dot(B.T, A.T)
'ab,ca->bc'  gemm(A, B, trans1=True, trans2=True)    dot(A.T, B.T)
'ab,ca->cb'  gemm(B, A, trans1=False, trans2=False)  dot(B, A)
'ab,cb->ac'  gemm(A, B, trans1=False, trans2=True)   dot(A, B.T)
'ab,cb->ca'  gemm(B, A, trans1=False, trans2=True)   dot(B, A.T)
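To see why a matrix-matrix equation in the table corresponds to a particular dot call, it can help to evaluate the equation naively and compare. The sketch below is a pure-Python reference for two 2-D inputs stored as nested lists. The names tiny_einsum, transpose and matmul are helpers written for this example and have nothing to do with how blis implements the routines.

```python
from itertools import product


def tiny_einsum(equation, A, B):
    """Naive einsum for two 2-D nested lists producing a 2-D output."""
    inputs, out_labels = equation.split("->")
    labels_a, labels_b = inputs.split(",")
    sizes = {}
    for labels, arr in ((labels_a, A), (labels_b, B)):
        sizes[labels[0]] = len(arr)
        sizes[labels[1]] = len(arr[0])
    # Labels missing from the output are summed over.
    all_labels = list(out_labels) + [l for l in sizes if l not in out_labels]
    out = [[0.0] * sizes[out_labels[1]] for _ in range(sizes[out_labels[0]])]
    for index in product(*(range(sizes[l]) for l in all_labels)):
        pos = dict(zip(all_labels, index))
        out[pos[out_labels[0]]][pos[out_labels[1]]] += (
            A[pos[labels_a[0]]][pos[labels_a[1]]]
            * B[pos[labels_b[0]]][pos[labels_b[1]]]
        )
    return out


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2, 3], [4, 5, 6]]      # shape (a=2, b=3)
B = [[1, 0], [0, 1], [2, 2]]    # shape (b=3, c=2)
print(tiny_einsum("ab,bc->ac", A, B) == matmul(A, B))
Bt = transpose(B)               # shape (c=2, b=3)
print(tiny_einsum("ab,cb->ca", A, Bt) == matmul(Bt, transpose(A)))
```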

We also provide fused-type, nogil Cython bindings to the underlying Blis linear algebra library. Fused types are a simple template mechanism, allowing just a touch of compile-time generic programming:

from libc.stdlib cimport calloc
cimport blis.cy

# nN, nI, nO, nr_b0 and nr_b1 are dimensions defined elsewhere.
A = <float*>calloc(nN * nI, sizeof(float))
B = <float*>calloc(nO * nI, sizeof(float))
C = <float*>calloc(nr_b0 * nr_b1, sizeof(float))
blis.cy.gemm(blis.cy.NO_TRANSPOSE, blis.cy.NO_TRANSPOSE,
             nO, nI, nN,
             1.0, A, nI, 1, B, nO, 1,
             1.0, C, nO, 1)
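Each pointer in the gemm call is followed by a pair of integers. Assuming these follow the usual BLIS convention of a row stride followed by a column stride (an assumption, since the text does not spell it out), the sketch below shows in plain Python how such a pair addresses a matrix stored in a flat buffer. The element helper is hypothetical.

```python
def element(buffer, row, col, row_stride, col_stride):
    """Read element (row, col) of a matrix stored in a flat buffer."""
    return buffer[row * row_stride + col * col_stride]


n_rows, n_cols = 2, 3
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # [[1, 2, 3], [4, 5, 6]], row-major

# Row-major layout: stepping one row skips n_cols entries.
print(element(flat, 1, 2, row_stride=n_cols, col_stride=1))

# Swapping the strides views the same buffer as its 3x2 transpose, no copy.
print(element(flat, 2, 1, row_stride=1, col_stride=n_cols))
```

Both calls read the same underlying entry, which is why stride pairs let a routine work on a transposed view without moving data.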

Bindings have been added as we've needed them. Please submit pull requests if the library is missing some functions you require.

Development

To build the source package, you should run the following command:

./bin/copy-source-files.sh

This populates the blis/_src folder for the various architectures, using the flame-blis submodule.

Updating the build files

In order to compile the Blis sources, we use jsonl files that provide the explicit compiler flags. We build these jsonl files by running Blis's build system, and then converting the log. This avoids us having to replicate the build system within Python: we just use the jsonl to make a bunch of subprocess calls. To support a new OS/architecture combination, we have to provide the jsonl file and the header.
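The replay step described above could look roughly like the following. This is a sketch under stated assumptions, not the code in setup.py: the "argv" key is a hypothetical field name, and the real jsonl files may store each command differently.

```python
import json
import subprocess


def run_jsonl_commands(path, runner=subprocess.check_call):
    """Run every command listed in a jsonl file, in order."""
    count = 0
    with open(path, encoding="utf8") as file_:
        for line in file_:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # "argv" is a hypothetical key holding the command as a list.
            runner(record["argv"])
            count += 1
    return count
```

Passing a different runner, such as the append method of a list, lets you inspect the commands without executing them.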

Linux

The Linux build files need to be produced from within the manylinux1 docker container, so that they will be compatible with the wheel building process.

First, install docker. Then do the following to start the container:

sudo docker run -it quay.io/pypa/manylinux1_x86_64:latest

Once within the container, the following commands should check out the repo and build the jsonl files for the generic arch:

mkdir /usr/local/repos
cd /usr/local/repos
git clone https://github.com/explosion/cython-blis && cd cython-blis
git pull && git submodule init && git submodule update && git submodule status
/opt/python/cp36-cp36m/bin/python -m venv env3.6
source env3.6/bin/activate
pip install -r requirements.txt
./bin/generate-make-jsonl linux generic --export
BLIS_ARCH=generic python setup.py build_ext --inplace
# N.B.: don't copy to /tmp, docker cp doesn't work from there.
cp blis/_src/include/linux-generic/blis.h /linux-generic-blis.h
cp blis/_src/make/linux-generic.jsonl /

Then from a new terminal, retrieve the two files we need out of the container:

sudo docker ps -l # Get the container ID
# When I'm in Vagrant, I need to go via cat -- but then I end up with dummy
# lines at the top and bottom. Sigh. If you don't have that problem and
# sudo docker cp just works, just copy the file.
sudo docker cp aa9d42588791:/linux-generic-blis.h - | cat > linux-generic-blis.h
sudo docker cp aa9d42588791:/linux-generic.jsonl - | cat > linux-generic.jsonl
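The dummy lines mentioned in the comment are likely tar framing: when docker cp is given - as the destination, it streams a tar archive to standard output rather than the raw file. Under that assumption, the file can be recovered cleanly with Python's tarfile module. The function name below is hypothetical.

```python
import io
import tarfile


def extract_from_docker_cp(stream_bytes, member_name):
    """Return the contents of one file from a `docker cp ... -` tar stream."""
    with tarfile.open(fileobj=io.BytesIO(stream_bytes), mode="r:*") as archive:
        member = archive.extractfile(member_name)
        if member is None:
            raise ValueError("%r is not a regular file" % member_name)
        return member.read()
```

For example, redirect the docker cp output to a file, read that file in binary mode, and pass the bytes to this function along with the name linux-generic.jsonl.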
