GPU-accelerated spherical Bessel functions for Apple Silicon using MLX
mlx-bessel
GPU-accelerated spherical Bessel functions j_l(x) and j_l'(x) for Apple Silicon, using MLX.
What it does
Evaluates spherical Bessel functions of the first kind, j_l(x), and their derivatives, j_l'(x), on the Apple GPU via piecewise Chebyshev interpolation. The interpolation table is built once on the CPU (using a hybrid forward-recurrence + scipy strategy), then stored as GPU tensors for fast repeated evaluation.
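For readers unfamiliar with the forward-recurrence part of the build, here is a minimal NumPy sketch of the idea. It is not the package's code: the function name `spherical_jn_forward` is hypothetical, and it relies only on the standard identities j_{l+1}(x) = (2l+1)/x · j_l(x) − j_{l−1}(x) and j_l'(x) = (l/x) · j_l(x) − j_{l+1}(x). Upward recurrence is only trustworthy where x is comfortably larger than l, which is presumably why the build falls back to scipy elsewhere.

```python
import numpy as np

def spherical_jn_forward(l_max, x):
    """Illustrative upward recurrence for j_l(x) and j_l'(x), l = 0..l_max.

    Hypothetical helper, not part of mlx-bessel. Only reliable for x
    comfortably larger than l_max; x must be nonzero.
    """
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    # One extra row (l_max + 1) is needed for the derivative identity.
    j = np.empty((l_max + 2, x.size))
    j[0] = np.sin(x) / x
    j[1] = np.sin(x) / x**2 - np.cos(x) / x
    for l in range(1, l_max + 1):
        # j_{l+1}(x) = (2l + 1)/x * j_l(x) - j_{l-1}(x)
        j[l + 1] = (2 * l + 1) / x * j[l] - j[l - 1]
    ells = np.arange(l_max + 1)[:, None]
    jl = j[:-1]
    # j_l'(x) = (l/x) * j_l(x) - j_{l+1}(x)
    jlp = ells / x * jl - j[1:]
    return jl, jlp

x = np.array([10.0, 25.0, 40.0])
jl, jlp = spherical_jn_forward(3, x)
print(jl.shape, jlp.shape)
```

Both returned arrays have shape (l_max + 1, len(x)), matching the (ell, x) layout used in the quick start.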
Installation
pip install mlx-bessel
Requires macOS with Apple Silicon (M1/M2/M3/M4).
Quick start
import numpy as np
from mlx_bessel import BesselTable
ells = np.arange(0, 2001) # multipole values
table = BesselTable(ells, x_max=5500) # build table once on CPU (one-time cost, grows with the number of ells)
x = np.linspace(1.0, 5000.0, 10000) # evaluation points
jl = table.eval_jl(x) # shape (2001, 10000), on GPU
jl, jlp = table.eval_jl_jlp(x) # j_l and j_l' together
Results are returned as mlx.core.array. Convert to numpy with np.array(jl).
Performance
Benchmarked on Apple M1 Max. Median of 5 runs, warm-up excluded.
| N_ell | N_x | scipy | GPU eval | Speedup (eval) | Speedup (incl. build) |
|---|---|---|---|---|---|
| 100 | 1000 | 0.05 s | 0.001 s | 34x | 0.8x |
| 100 | 5000 | 0.22 s | 0.004 s | 57x | 3.8x |
| 200 | 5000 | 1.20 s | 0.007 s | 174x | 3.9x |
| 200 | 10000 | 2.35 s | 0.013 s | 181x | 7.7x |
| 500 | 5000 | 8.95 s | 0.016 s | 557x | 1.7x |
| 500 | 10000 | 17.82 s | 0.031 s | 567x | 3.4x |
| 525 | 10000 | 19.63 s | 0.034 s | 579x | 3.1x |
The table build is a one-time cost (~0.05 s for 100 ells, ~5 s for 500 ells). Once the table is built, subsequent evaluations at any x-array run entirely on the GPU, with eval-only speedups ranging from 34x for the smallest case to over 500x for the largest.
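To see how the one-time build amortizes over repeated evaluations, the arithmetic can be sketched as below. This assumes the "Speedup (incl. build)" column is computed as scipy time divided by (build time + eval time); that definition is an assumption, and `amortized_speedup` is a hypothetical helper, not part of the package. The inputs are the rounded figures from the 500 x 10000 row and the ~5 s build time quoted above.

```python
def amortized_speedup(t_ref, t_build, t_eval, n_calls=1):
    """Speedup over the reference when one table build is shared by n_calls evaluations.

    Hypothetical helper: assumes the reference (scipy) cost is paid on every
    call, while the table build is paid once.
    """
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return (n_calls * t_ref) / (t_build + n_calls * t_eval)

# Rounded figures from the benchmark table: scipy 17.82 s, GPU eval 0.031 s, build ~5 s.
for n in (1, 10, 100):
    print(n, round(amortized_speedup(17.82, 5.0, 0.031, n), 1))
```

With a single call the build dominates; as the number of evaluations grows, the ratio approaches the eval-only speedup.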
Accuracy
Tested against scipy.special.spherical_jn across l = 0..2000, x = 0.5..5000 (155 sampled ells, 5000 x-points):
| Metric | Value |
|---|---|
| Max absolute error | 6.3e-07 |
| Median absolute error | 1.1e-08 |
| Max relative error (\|j_l\| > 1e-5) | 1.1e-02 |
| Median relative error | 5.1e-05 |
| P99 relative error | 2.4e-03 |
Float32 GPU precision limits relative accuracy near zero-crossings of j_l. For physically relevant values (|j_l| > 1e-5), the relative error is below 1.2%.
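The metrics in the table above could be computed along these lines. This is a sketch of one reasonable way to do it, not the project's actual test code: the function name `error_metrics` and the `floor` parameter are hypothetical, and it assumes the relative-error statistics are all taken over points where the reference magnitude exceeds the floor.

```python
import numpy as np

def error_metrics(approx, ref, floor=1e-5):
    """Absolute and thresholded relative error statistics (hypothetical helper)."""
    approx = np.asarray(approx, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    abs_err = np.abs(approx - ref)
    # Skip points near zero-crossings, where relative error is ill-conditioned.
    mask = np.abs(ref) > floor
    if not mask.any():
        raise ValueError("no reference values exceed the floor")
    rel_err = abs_err[mask] / np.abs(ref[mask])
    return {
        "max_abs": float(abs_err.max()),
        "median_abs": float(np.median(abs_err)),
        "max_rel": float(rel_err.max()),
        "median_rel": float(np.median(rel_err)),
        "p99_rel": float(np.percentile(rel_err, 99)),
    }

# Toy check: round a float64 reference (j_0) through float32 and measure the loss.
x = np.linspace(0.5, 50.0, 1000)
ref = np.sin(x) / x
approx = ref.astype(np.float32).astype(np.float64)
print(error_metrics(approx, ref))
```

The toy example only measures float32 rounding of exact values, so its errors are much smaller than those of the full interpolation pipeline.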
Method
- Piecewise segments: [0, x_max] is divided into segments of width ~80.
- Chebyshev nodes: 64 Chebyshev nodes per segment.
- Hybrid table build (CPU):
  - Forward recurrence for the stable regime (x > 1.5l)
  - scipy for the transition zone (x ~ l, ~14% of node pairs)
  - Zero for the evanescent regime (x << l)
- DCT to coefficients: Discrete cosine transform converts node values to Chebyshev expansion coefficients.
- GPU evaluation: Segment lookup + Chebyshev basis matrix multiply, fully vectorized on GPU.
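The steps above can be sketched on the CPU with plain NumPy for a single function. This is an illustration of the general technique, not the package's MLX implementation: `build_table` and `eval_table` are hypothetical names, the choice of first-kind Chebyshev nodes is an assumption, and the DCT is written as an explicit cosine matrix rather than a fast transform. The defaults mirror the segment width (~80) and node count (64) quoted above.

```python
import numpy as np

def build_table(f, x_max, seg_width=80.0, n_nodes=64):
    """Chebyshev coefficients of f on each segment of [0, x_max] (hypothetical helper)."""
    n_seg = int(np.ceil(x_max / seg_width))
    k = np.arange(n_nodes)
    theta = np.pi * (k + 0.5) / n_nodes
    t = np.cos(theta)                          # first-kind nodes on [-1, 1]
    lo = seg_width * np.arange(n_seg)          # left edge of each segment
    x_nodes = lo[:, None] + 0.5 * seg_width * (t[None, :] + 1.0)
    values = f(x_nodes)                        # shape (n_seg, n_nodes)
    # DCT-II as an explicit matrix: dct[j, k] = cos(j * theta_k) = T_j(t_k)
    dct = np.cos(np.outer(k, theta))
    coeffs = (2.0 / n_nodes) * values @ dct.T  # shape (n_seg, n_nodes)
    coeffs[:, 0] *= 0.5
    return coeffs

def eval_table(coeffs, x, seg_width=80.0):
    """Segment lookup followed by a Chebyshev basis contraction."""
    x = np.asarray(x, dtype=np.float64)
    n_seg, n_nodes = coeffs.shape
    seg = np.clip((x // seg_width).astype(int), 0, n_seg - 1)
    t = np.clip(2.0 * (x - seg * seg_width) / seg_width - 1.0, -1.0, 1.0)
    # basis[i, j] = T_j(t_i) = cos(j * arccos(t_i))
    basis = np.cos(np.outer(np.arccos(t), np.arange(n_nodes)))
    return np.sum(basis * coeffs[seg], axis=1)

j0 = lambda x: np.sinc(x / np.pi)              # j_0(x) = sin(x)/x, safe at x = 0
coeffs = build_table(j0, x_max=400.0)
x = np.linspace(0.5, 399.5, 2001)
print(np.max(np.abs(eval_table(coeffs, x) - j0(x))))
```

For many multipoles the coefficient array would gain a leading ell axis, and the final contraction becomes the basis matrix multiply described in the last bullet.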
Running benchmarks
python -m mlx_bessel.benchmark
Running tests
pip install pytest scipy
pytest tests/ -v
Author
Sheng-Kai Huang (akai@fawstudio.com)
License
MIT