Improve Thinc's performance on Apple devices with native libraries

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

thinc-apple-ops

Make spaCy and Thinc up to 8× faster on macOS by calling into Apple's native libraries.

⏳ Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind the scenes is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallelism and can take factors such as cache sizes and instruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.
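To make the gemm operation concrete, here is a minimal pure-Python sketch of what it computes, namely alpha * (A × B) + beta * C. It is a reference illustration only: the function name and signature are hypothetical, the transpose flags of the real BLAS interface are left out, and tuned kernels such as those in BLIS are organized very differently.

```python
def gemm(a, b, c=None, alpha=1.0, beta=0.0):
    """Reference GEMM on lists of lists: alpha * (a @ b) + beta * c."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    if c is None:
        c = [[0.0] * n for _ in range(m)]
    # Start from the scaled accumulator, then add the scaled product.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = alpha * a[i][p]
            for j in range(n):
                out[i][j] += a_ip * b[p][j]
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = gemm(a, b)
print(result)  # [[19.0, 22.0], [43.0, 50.0]]
```

The triple loop performs m × n × k multiply-add steps, which is why the quality of the underlying kernel matters so much for large matrices.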

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have one or more separate matrix co-processors called AMX. Since AMX is not well-documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit, and there is empirical evidence that each of the two performance clusters of the M1 Pro/Max has an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement for the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops, so that gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).
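One way to check which ops backend Thinc has picked up is to ask it directly. The sketch below uses `get_current_ops` from `thinc.api` and reads the `name` attribute of the returned ops object. The helper `active_ops_name` is hypothetical, and the expectation that the backend reports itself as "apple" once thinc-apple-ops is installed on a Mac is an assumption, not something this page states.

```python
def active_ops_name():
    """Return the name of Thinc's current ops backend, or None if Thinc is missing."""
    try:
        from thinc.api import get_current_ops
    except ImportError:
        return None
    return get_current_ops().name


name = active_ops_name()
if name is None:
    print("Thinc is not installed")
else:
    # Assumption: "apple" indicates the Accelerate-backed ops are in use.
    print(f"Thinc is using the {name!r} ops")
```

If the name printed is something else, such as "numpy", matrix multiplication is probably still going through BLIS.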

⏱ Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchmark compares the prediction speed of the de_core_news_lg spaCy model on the M1 with and without thinc-apple-ops. Results for an Intel Mac Mini and AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops makes the M1 about 4.3 times faster.

CPU	BLIS (words/s)	thinc-apple-ops (words/s)	Package power (W)
Mac Mini (M1)	6492	27676	5
MacBook Air Core i5 2020	9790	10983	9
Mac Mini Core i7 Late 2018	16364	14858	31
AMD Ryzen 5900X	22568	N/A	52

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops speeds up training on the M1 by 3.0 times.

CPU	BLIS (iterations/s)	thinc-apple-ops (iterations/s)	Package power (W)
Mac Mini M1 2020	3.34	10.07	5
MacBook Air Core i5 2020	3.10	3.27	10
Mac Mini Core i7 Late 2018	4.71	4.93	32
AMD Ryzen 5900X	6.53	N/A	53
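The speedup factors quoted for the M1 follow directly from the two tables. As a quick check, the snippet below divides the thinc-apple-ops figure by the BLIS figure for each benchmark; the variable and function names are illustrative only.

```python
# Numbers copied from the M1 rows of the two benchmark tables.
prediction_wps = {"BLIS": 6492, "thinc-apple-ops": 27676}   # words per second
training_its = {"BLIS": 3.34, "thinc-apple-ops": 10.07}     # iterations per second


def speedup(results):
    """Ratio of thinc-apple-ops throughput to BLIS throughput."""
    return results["thinc-apple-ops"] / results["BLIS"]


prediction_speedup = speedup(prediction_wps)
training_speedup = speedup(training_its)
print(f"prediction: {prediction_speedup:.1f}x")  # prediction: 4.3x
print(f"training:   {training_speedup:.1f}x")    # training:   3.0x
```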

Release history

0.1.5

Apr 18, 2024

0.1.4

Sep 22, 2023

0.1.3

Dec 16, 2022

0.1.2

Oct 17, 2022

0.1.1

Sep 27, 2022

0.1.0

Jul 19, 2022

0.1.0.dev1 pre-release

Jun 16, 2022

0.1.0.dev0 pre-release

Jun 3, 2022

0.0.8

Sep 27, 2022

0.0.7

May 27, 2022

0.0.6 yanked

May 18, 2022

0.0.5 yanked

Nov 5, 2021

0.0.4 yanked

Oct 5, 2021

0.0.3 yanked

Sep 28, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

thinc_apple_ops-0.1.5.tar.gz (62.9 kB)

Uploaded Apr 18, 2024 Source

Built Distributions

thinc_apple_ops-0.1.5-cp312-cp312-macosx_11_0_arm64.whl (156.5 kB)

Uploaded Apr 18, 2024 CPython 3.12 macOS 11.0+ ARM64

thinc_apple_ops-0.1.5-cp312-cp312-macosx_10_9_x86_64.whl (163.5 kB)

Uploaded Apr 18, 2024 CPython 3.12 macOS 10.9+ x86-64

thinc_apple_ops-0.1.5-cp311-cp311-macosx_11_0_arm64.whl (155.5 kB)

Uploaded Apr 18, 2024 CPython 3.11 macOS 11.0+ ARM64

thinc_apple_ops-0.1.5-cp311-cp311-macosx_10_9_x86_64.whl (162.3 kB)

Uploaded Apr 18, 2024 CPython 3.11 macOS 10.9+ x86-64

thinc_apple_ops-0.1.5-cp310-cp310-macosx_11_0_arm64.whl (155.8 kB)

Uploaded Apr 18, 2024 CPython 3.10 macOS 11.0+ ARM64

thinc_apple_ops-0.1.5-cp310-cp310-macosx_10_9_x86_64.whl (162.7 kB)

Uploaded Apr 18, 2024 CPython 3.10 macOS 10.9+ x86-64

thinc_apple_ops-0.1.5-cp39-cp39-macosx_11_0_arm64.whl (156.6 kB)

Uploaded Apr 18, 2024 CPython 3.9 macOS 11.0+ ARM64

thinc_apple_ops-0.1.5-cp39-cp39-macosx_10_9_x86_64.whl (163.2 kB)

Uploaded Apr 18, 2024 CPython 3.9 macOS 10.9+ x86-64

thinc_apple_ops-0.1.5-cp38-cp38-macosx_11_0_arm64.whl (156.4 kB)

Uploaded Apr 18, 2024 CPython 3.8 macOS 11.0+ ARM64

thinc_apple_ops-0.1.5-cp38-cp38-macosx_10_9_x86_64.whl (162.9 kB)

Uploaded Apr 18, 2024 CPython 3.8 macOS 10.9+ x86-64

thinc_apple_ops-0.1.5-cp37-cp37m-macosx_10_9_x86_64.whl (163.7 kB)

Uploaded Apr 18, 2024 CPython 3.7m macOS 10.9+ x86-64

Hashes for thinc_apple_ops-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`b75fac137fde8131c6c15aea8728fdc60a54ea7ef8c465a47fb5e26f39bbae58`
MD5	`41af4ce3201ccc6b5b70a030cd12d880`
BLAKE2b-256	`fcfd871e21a2fd8e1ee7297ab9d368c61544d9c08f421ce9f79dbf5a61be5561`

Hashes for thinc_apple_ops-0.1.5-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`4a44763801bb274f8c3b87586b929533fe9b76b470eb323a7cf1bd1900c9ca32`
MD5	`81b86a04fe326f54ba10c9cd01ac0fd6`
BLAKE2b-256	`ce14f6d56a9d9c1651949e7045ab950c621cd4c4ccc07da957b7e76a315da9ea`

Hashes for thinc_apple_ops-0.1.5-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`058830c85abff886f0e7ebeb57ff8de3a680e883430a16935ed0f87ea2a95d9e`
MD5	`940ae47b38e7c146b8ac6ae9e49e2a27`
BLAKE2b-256	`557773a9c4602efeb52e763141017419cb67d9622f10fbc4cad624d2412e7e9c`

Hashes for thinc_apple_ops-0.1.5-cp311-cp311-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`b712af5bcc7f037ed97c60b4bf01b2ced21e282c7d76f8ea15868d792ff9be1a`
MD5	`1c3630f59ccb53d9ae82868d916d3e9d`
BLAKE2b-256	`a4c6b74c61aaf45db6ef14d4bef3a64b64601abfa52d4e1f75b87815eb483af1`

Hashes for thinc_apple_ops-0.1.5-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`cdee7373f3f84e7a08edb5fe300c79e81543ef59aad2e004e17d13b246fbdd78`
MD5	`0e00932178dd570cbd8b6c72a6ad4dc0`
BLAKE2b-256	`6d34c9f930009c1ef4999ddeaff7158e2628b030812cdf220c9ae9313a84a1e9`

Hashes for thinc_apple_ops-0.1.5-cp310-cp310-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`f894bf1d239bea1aa51d99deb3543c04d510425d0b853d782964564a33cfd7ed`
MD5	`0c1e5046b364a2c5add8edbaa19161dc`
BLAKE2b-256	`f20c247681d34df62e986148e78991fdf62045dc7b7349f860628669b2cfca7a`

Hashes for thinc_apple_ops-0.1.5-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`7c0e8a10bb33174966006b6f3fc37ad71223040e9b420bd7b94597a4de930fb7`
MD5	`b907150411ee78ed669b850eea6eed1e`
BLAKE2b-256	`dcfbe10da0f3fd62b6f20eca0ceb581f084a650ec1915f45d7e93d4a4ebcf8ca`

Hashes for thinc_apple_ops-0.1.5-cp39-cp39-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`10410d844d7658ba6d6e224d2e97e0b9984d2875ec2e512c6152872f98aaaef7`
MD5	`64ddebc9c60457fd8968aa73d1fb60ab`
BLAKE2b-256	`d9315a73f94b8c534d14b236ce657174b9596605eb60763b1197e5d54da71951`

Hashes for thinc_apple_ops-0.1.5-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`966eee274bbb1db760fa5060ca0c63fd7f5e22b4fd8edba7fbe69f4ca5664908`
MD5	`deba4c5e30bd3610c9251128fe0f3c39`
BLAKE2b-256	`02bf88f8332148d31a8d141176107168acf08426998864075a4a51ce6b7616fd`

Hashes for thinc_apple_ops-0.1.5-cp38-cp38-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`0f516234497b6bc36ccb06a268384988165c54c8b07f55f11e2c1b24c27c39a8`
MD5	`e4460e617de8a491614f912c8f3cec9b`
BLAKE2b-256	`e0fa8145679e283845f9024eafa49fdbbb406ccfdcde5b83b3d3e841140361ac`

Hashes for thinc_apple_ops-0.1.5-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`d147472b6b808ec28060f1325b621dc06603188333ee9a605e4c2dfda7b559bc`
MD5	`c13976f7f181153b7a6a60eb9d9635cc`
BLAKE2b-256	`123c278466e9e70423fc404ebc06da0a8b0997a68caadc3a2338d94e8a5613f7`

Hashes for thinc_apple_ops-0.1.5-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`913fc21d9d8c3b1159fc2c1824ca6ca3bb0cdb0119d4d6cf4d470a00002d810a`
MD5	`2c48f1b85b365dbeaefe97c8d4abd277`
BLAKE2b-256	`c1345bb8e71823c2db34f18aec77c5b60aaf546d1ff996d6b8a31017b62f2886`
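
The digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are hypothetical, the expected value is the SHA256 digest listed for the source distribution, and the file is assumed to have been downloaded to a local path of your choosing.

```python
import hashlib

# SHA256 digest listed above for thinc_apple_ops-0.1.5.tar.gz.
EXPECTED_SHA256 = "b75fac137fde8131c6c15aea8728fdc60a54ea7ef8c465a47fb5e26f39bbae58"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("thinc_apple_ops-0.1.5.tar.gz")` should return True for an intact copy of the source distribution saved under that name in the current directory.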