Improve Thinc's performance on Apple devices with native libraries

Project description

thinc-apple-ops

Make spaCy and Thinc up to 8 × faster on macOS by calling into Apple's native libraries.

⏳ Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops
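Before installing, it can help to check which of the published wheels matches the machine. The sketch below is an illustration only: `wheel_platform_hint` is a hypothetical helper, not part of thinc-apple-ops, and it simply maps the values reported by Python's standard `platform` module to the two wheel platform tags listed under Built Distributions. pip performs the real selection itself.

```python
import platform


def wheel_platform_hint(system: str, machine: str) -> str:
    """Map a (system, machine) pair to the matching wheel platform tag.

    Hypothetical helper for illustration; pip does the actual wheel selection.
    """
    if system != "Darwin":
        return "no prebuilt wheel listed (not macOS)"
    if machine == "arm64":
        return "macosx_11_0_arm64"
    if machine == "x86_64":
        return "macosx_10_9_x86_64"
    return "no prebuilt wheel listed (unknown architecture)"


# Describe the interpreter that is running this script.
print(wheel_platform_hint(platform.system(), platform.machine()))
```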

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind the scenes is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallelism and can take factors such as cache sizes and instruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.
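To make the gemm operation concrete: in its general form it computes C ← αAB + βC. The following is a minimal pure-Python sketch of that definition, for illustration only. It is not how BLIS or Accelerate implement it: unlike the BLAS routines, it returns a new matrix instead of overwriting C, it ignores the transpose options, and it makes no attempt at SIMD or cache tuning. The function name and signature are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Return alpha * (a @ b) + beta * c as a new matrix (inputs are not modified)."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("columns of a must match rows of b")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a, columns of b)")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
print(result)  # [[38.5, 44.5], [86.5, 100.5]]
```

The triple loop hidden in the comprehension is the part that tuned kernels replace with architecture-specific code.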

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have one or more separate matrix co-processors called AMX. Since AMX is not well documented, it is unclear how many AMX units Apple M-series CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit, and there is empirical evidence that both performance clusters of the M1 Pro/Max each have an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement for the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops so that the gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).
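One way to see which backend Thinc is using is to inspect the current ops object and run a small gemm through it. This is a sketch under assumptions: it relies on `get_current_ops`, `Ops.alloc2f` and `Ops.gemm` from Thinc's public API, and it assumes that a recent Thinc version selects the ops from thinc-apple-ops automatically once the package is installed (older versions may need the ops to be set explicitly). The helper name `thinc_backend_report` is hypothetical, and the import is guarded so the script also runs where Thinc is absent.

```python
from typing import Optional, Tuple


def thinc_backend_report() -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Return (ops class name, shape of a small gemm result), or None without Thinc."""
    try:
        from thinc.api import get_current_ops
    except ImportError:
        return None
    ops = get_current_ops()
    # Allocate two zero-filled float32 matrices and multiply them via the ops.
    x = ops.alloc2f(2, 3)
    y = ops.alloc2f(3, 4)
    product = ops.gemm(x, y)
    return type(ops).__name__, tuple(product.shape)


report = thinc_backend_report()
if report is None:
    print("Thinc is not installed")
else:
    print(f"ops class: {report[0]}, gemm result shape: {report[1]}")
```

If the reported class is a plain NumPy-based ops class on an Apple Silicon Mac, the Accelerate-backed gemm is probably not in use.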

⏱ Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchmark compares the prediction speed of the de_core_news_lg spaCy model on the M1 with and without thinc-apple-ops. Results for two Intel Macs and an AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops makes the M1 about 4.3 times faster.

CPU                        | BLIS (words/s) | thinc-apple-ops (words/s) | Package power (W)
Mac Mini (M1)              | 6492           | 27676                     | 5
MacBook Air Core i5 2020   | 9790           | 10983                     | 9
Mac Mini Core i7 Late 2018 | 16364          | 14858                     | 31
AMD Ryzen 5900X            | 22568          | N/A                       | 52
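The speedup quoted for the M1 follows directly from the table. As a small worked example, the snippet below recomputes the ratio of thinc-apple-ops to BLIS throughput for each row; the numbers are copied from the table, while the dictionary and function names are made up for this illustration.

```python
from typing import Dict, Optional, Tuple

# (BLIS, thinc-apple-ops) words per second, copied from the prediction table.
PREDICTION_WPS: Dict[str, Tuple[int, Optional[int]]] = {
    "Mac Mini (M1)": (6492, 27676),
    "MacBook Air Core i5 2020": (9790, 10983),
    "Mac Mini Core i7 Late 2018": (16364, 14858),
    "AMD Ryzen 5900X": (22568, None),  # thinc-apple-ops is not available here
}


def speedup(blis: int, apple_ops: Optional[int]) -> Optional[float]:
    """Ratio of thinc-apple-ops throughput to BLIS throughput, if measured."""
    if apple_ops is None:
        return None
    return apple_ops / blis


speedups = {cpu: speedup(*wps) for cpu, wps in PREDICTION_WPS.items()}
for cpu, factor in speedups.items():
    label = "N/A" if factor is None else f"{factor:.2f}x"
    print(f"{cpu}: {label}")
```

For the M1 the ratio is roughly 4.26, which rounds to the 4.3 quoted above. For the Core i7 Mac Mini the ratio is below 1, so in this benchmark thinc-apple-ops is slightly slower than BLIS on that machine.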

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops makes training on the M1 about 3.0 times faster.

CPU                        | BLIS (iterations/s) | thinc-apple-ops (iterations/s) | Package power (W)
Mac Mini M1 2020           | 3.34                | 10.07                          | 5
MacBook Air Core i5 2020   | 3.10                | 3.27                           | 10
Mac Mini Core i7 Late 2018 | 4.71                | 4.93                           | 32
AMD Ryzen 5900X            | 6.53                | N/A                            | 53
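The package power column also allows a rough efficiency comparison. The sketch below divides the best measured training throughput of each machine by its package power, giving training iterations per joule. This rests on an assumption that the table does not state: that the single power figure applies to the faster of the two configurations. The names used are invented for the example.

```python
from typing import Dict, Optional, Tuple

# (BLIS it/s, thinc-apple-ops it/s, package power in watts), from the training table.
TRAINING: Dict[str, Tuple[float, Optional[float], float]] = {
    "Mac Mini M1 2020": (3.34, 10.07, 5),
    "MacBook Air Core i5 2020": (3.10, 3.27, 10),
    "Mac Mini Core i7 Late 2018": (4.71, 4.93, 32),
    "AMD Ryzen 5900X": (6.53, None, 53),
}


def iterations_per_joule(blis: float, apple_ops: Optional[float], watts: float) -> float:
    """Best measured iterations/s divided by package power (assumed to match it)."""
    best = blis if apple_ops is None else max(blis, apple_ops)
    return best / watts


efficiency = {cpu: iterations_per_joule(*row) for cpu, row in TRAINING.items()}
for cpu, value in efficiency.items():
    print(f"{cpu}: {value:.3f} iterations per joule")
```

Under that assumption, the M1 completes about 2.0 training iterations per joule, compared with roughly 0.12 for the Ryzen 5900X.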

Download files

Source Distribution

thinc_apple_ops-0.0.6.tar.gz (8.4 kB): source

Built Distributions

thinc_apple_ops-0.0.6-cp310-cp310-macosx_11_0_arm64.whl (65.1 kB): CPython 3.10, macOS 11.0+, ARM64
thinc_apple_ops-0.0.6-cp310-cp310-macosx_10_9_x86_64.whl (70.9 kB): CPython 3.10, macOS 10.9+, x86-64
thinc_apple_ops-0.0.6-cp39-cp39-macosx_11_0_arm64.whl (65.4 kB): CPython 3.9, macOS 11.0+, ARM64
thinc_apple_ops-0.0.6-cp39-cp39-macosx_10_9_x86_64.whl (71.5 kB): CPython 3.9, macOS 10.9+, x86-64
thinc_apple_ops-0.0.6-cp38-cp38-macosx_11_0_arm64.whl (64.0 kB): CPython 3.8, macOS 11.0+, ARM64
thinc_apple_ops-0.0.6-cp38-cp38-macosx_10_9_x86_64.whl (69.4 kB): CPython 3.8, macOS 10.9+, x86-64
thinc_apple_ops-0.0.6-cp37-cp37m-macosx_10_9_x86_64.whl (70.3 kB): CPython 3.7m, macOS 10.9+, x86-64
