Efficient repeated evaluation of splines at fixed points.
Project description
cached_interpolate
Efficient evaluation of interpolants at fixed points.
When performing a Bayesian analysis with a stochastic sampler however, one sometimes has to evaluate many interpolants (representing the parameterized model) at the same set of points (the data) with the same knot points.
Evaluating interpolants typically requires two stages:
- finding the closest knot of the interpolant to the new point and the distance from that knot.
- evaluating the interpolant at that point.
When we have identical knots and evaluation points but different functions being approximated the first of these stages is done many times unnecessarily. This can be made more efficient by caching the locations of the evaluation points leaving just the evaluation of the interpolation coefficients to be done at each iteration.
A further advantage of this, is that it allows trivially parallising the interpolation using JAX
or cupy
.
This package implements this caching for nearest neighbour, linear, and cubic interpolation.
Installation
Currently this is only installable by downloading the source and installing locally.
$ git clone git@github.com:ColmTalbot/cached_interpolation.git
$ cd cached_interpolation
$ pip install .
Demonstration
We can compare the interpolation to the scipy.interpolate.CubicSpline
implementation.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
from cached_interpolate import CachingInterpolant
x_nodes = np.linspace(0, 1, 10)
y_nodes = np.random.uniform(-1, 1, 10)
evaluation_points = np.sort(np.random.uniform(0, 1, 10000))
interpolant = CachingInterpolant(x=x_nodes, y=y_nodes, kind="cubic")
sp_interpolant = CubicSpline(x=x_nodes, y=y_nodes, bc_type="natural")
figure, axes = plt.subplots(nrows=2, gridspec_kw={'height_ratios': [3, 1]}, sharex=True)
axes[0].plot(evaluation_points, interpolant(evaluation_points))
axes[0].plot(evaluation_points, sp_interpolant(evaluation_points))
axes[0].scatter(x_nodes, y_nodes, color="k")
axes[1].plot(evaluation_points, interpolant(evaluation_points) - sp_interpolant(evaluation_points))
axes[0].set_ylabel("$y$")
axes[1].set_xlabel("$x$")
axes[1].set_ylabel("$\\Delta y$")
axes[1].set_xlim(0, 1)
plt.tight_layout()
plt.show()
plt.close(figure)
I note here that we use the "natural" boundary condition.
This means that first and second derivatives of the spline vanish at the endpoints.
This is different from the default "not-a-knot" boundary condition used in scipy
.
We can now evaluate this interpolant in a loop to demonstrate the performance
of the caching.
At every iteration we change the function being interpolated by randomly setting the
y
argument.
Before the loop, we add an initial call to set up the cache as this will be much
slower than the following iterations.
import time
import numpy as np
from scipy.interpolate import CubicSpline
from cached_interpolate import CachingInterpolant
x_nodes = np.linspace(0, 1, 10)
y_nodes = np.random.uniform(-1, 1, 10)
evaluation_points = np.random.uniform(0, 1, 10000)
interpolant = CachingInterpolant(x=x_nodes, y=y_nodes, kind="cubic")
sp_interpolant = CubicSpline(x=x_nodes, y=y_nodes, bc_type="natural")
interpolant(x=evaluation_points, y=y_nodes)
n_iterations = 1000
start = time.time()
for _ in range(n_iterations):
y_nodes = np.random.uniform(-1, 1, 10)
interpolant(x=evaluation_points, y=y_nodes)
stop = time.time()
print(f"Cached time = {(stop - start):.3f}s for {n_iterations} iterations.")
start = time.time()
for _ in range(n_iterations):
y_nodes = np.random.uniform(-1, 1, 10)
CubicSpline(x=x_nodes, y=y_nodes, bc_type="natural")(evaluation_points)
stop = time.time()
print(f"Scipy time = {(stop - start):.3f}s for {n_iterations} iterations.")
Cached time = 0.187s for 1000 iterations.
Scipy time = 0.450s for 1000 iterations.
We've gained a factor of ~2.5 over the scipy
version without caching.
If this is the dominant cost in a simulation that takes a week to run, this is a big improvement.
If we need to evaluate for a new set of points, we have to tell the interpolant to reset the cache. There are two ways to do this:
- create a new interpolant, this will require reevaluating the interplation coefficients.
- disable the evaluation point caching.
import numpy as np
from scipy.interpolant import CubicSpline
from cached_interpolate import CachingInterpolant
x_nodes = np.linspace(0, 1, 10)
y_nodes = np.random.uniform(-1, 1, 10)
evaluation_points = np.random.uniform(0, 1, 10000)
interpolant = CachingInterpolant(x=x_nodes, y=y_nodes, kind="cubic")
interpolated_values = interpolant(evaluation_points)
new_evaluation_points = np.random.uniform(0, 1, 1000)
interpolant(x=new_evaluation_points, use_cache=False)
Using the code in this way is much slower than scipy
and so not practically very useful.
If you have access to an Nvidia
GPU and are evaluating the spline at ~ O(10^5) or more points you may want to switch
to the cupy
backend.
This uses cupy
just for the evaluation stage, not for computing the interpolation coefficients.
import cupy as cp
import numpy as np
from cached_interpolate import CachingInterpolant
x_nodes = np.linspace(0, 1, 10)
y_nodes = np.random.uniform(-1, 1, 10)
evaluation_points = np.random.uniform(0, 1, 10000)
evaluation_points = cp.asarray(evaluation_points)
interpolant = CachingInterpolant(x=x_nodes, y=y_nodes, backend=cp)
interpolated_values = interpolant(evaluation_points)
We can now repeat our timing test. To make the use of a GPU more realistic we'll increase the number of evaluation points.
import time
import cupy as cp
import numpy as np
from scipy.interpolate import CubicSpline
from cached_interpolate import CachingInterpolant
x_nodes = np.linspace(0, 1, 10)
y_nodes = np.random.uniform(-1, 1, 10)
evaluation_points = np.random.uniform(0, 1, 100000)
cp_evaluation_points = cp.asarray(evaluation_points)
interpolant = CachingInterpolant(x=x_nodes, y=y_nodes, kind="cubic")
cp_interpolant = CachingInterpolant(x=x_nodes, y=y_nodes, kind="cubic", backend=cp)
sp_interpolant = CubicSpline(x=x_nodes, y=y_nodes, bc_type="natural")
interpolant(x=evaluation_points)
cp_interpolant(x=cp_evaluation_points)
n_iterations = 1000
start = time.time()
for _ in range(n_iterations):
y_nodes = np.random.uniform(-1, 1, 10)
interpolant(x=evaluation_points, y=np.random.uniform(-1, 1, 10))
stop = time.time()
print(f"CPU cached time = {(stop - start):.3f}s for {n_iterations} iterations.")
start = time.time()
for _ in range(n_iterations):
y_nodes = np.random.uniform(-1, 1, 10)
CubicSpline(x=x_nodes, y=y_nodes, bc_type="natural")(evaluation_points)
stop = time.time()
print(f"Scipy time = {(stop - start):.3f}s for {n_iterations} iterations.")
start = time.time()
for _ in range(n_iterations):
y_nodes = np.random.uniform(-1, 1, 10)
cp_interpolant(x=cp_evaluation_points, y=np.random.uniform(-1, 1, 10))
stop = time.time()
print(f"GPU cached time = {(stop - start):.3f}s for {n_iterations} iterations.")
CPU cached time = 3.367s for 1000 iterations.
Scipy time = 4.213s for 1000 iterations.
GPU cached time = 0.212s for 1000 iterations.
While there are likely more optimizations that can be made and improved flexibility in the implementation, we can see that the GPU version is well over an order of magnitude faster than either of the CPU versions.
If you have any comments/questions feel free to contact me through the issue tracker or a pull request.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for cached_interpolate-0.3.1-cp312-cp312-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0afc164fc19435e356ca0b99685d632534ddc983adc3c896dfb94d46c9419c22 |
|
MD5 | cd4feb6b34d1777da2036b6d804fff88 |
|
BLAKE2b-256 | 5d40b19f359e567834621ce5beab9e0d5f007ecef608b0ef3a8ea53779359231 |
Hashes for cached_interpolate-0.3.1-cp312-cp312-macosx_10_9_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6d46d3ed6c45f773effb57665c7aab0dc47936af20f9a472cf0ad5f23a865582 |
|
MD5 | 0d6fd3eab9a31c4d7e508458266dee9f |
|
BLAKE2b-256 | fad5d54a53b2c194208933609ff32ef43b19a5978a0b1bb3d65dfdd0d8b8e18c |
Hashes for cached_interpolate-0.3.1-cp311-cp311-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3ed6005e182fdec0abbe68e217b61d8de92affb691a21e9b84438c3f3f24016e |
|
MD5 | 19264906937d36e20a0e6ceafff1c210 |
|
BLAKE2b-256 | b7cc2951c04390051030a297099c844cb241299bead4b6f1005271c5112c8932 |
Hashes for cached_interpolate-0.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6a6b0df2434f844ed7bd636d9b44589f946a7d07258cd0a96356dc7098b94619 |
|
MD5 | da4928f46f489978d09b325e75c63bee |
|
BLAKE2b-256 | cd0e446a90ee48ad7eadf11a7ba3c29b50e888afcec0b9ee636f7ff5f679cabe |
Hashes for cached_interpolate-0.3.1-cp311-cp311-macosx_10_9_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d268a6e80c7ff9354873e399eb017bd4979cc380a157bc587ac271de3653b691 |
|
MD5 | 7fcc4f04af9c84a7afb9ce81beb375c2 |
|
BLAKE2b-256 | 4b3396a8f6aa7a5ff5a2788291757c7c4c232a6598f89bba226d3618949b2584 |
Hashes for cached_interpolate-0.3.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6b082304f363d212e4e208b0cdbb1dfa54481c6f41136b6e1df4d9bb4d56da5e |
|
MD5 | ba7da93bde69261cb41058623510738b |
|
BLAKE2b-256 | 1a5b7042c2e4836be92332dc5b04f373a9ba0a93715a8f4aab2b44ed21d0bd7c |
Hashes for cached_interpolate-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6eeb5767c5a1745073439f4b6f697687b6dd36a413b684b01ab23cceca2317f6 |
|
MD5 | e3e9d11ceac665bd0b5056f586d3101b |
|
BLAKE2b-256 | 1945b191f8d8c26ce13bdab4374790019d42ca437a0ee495e05a0fb0943dc4b4 |
Hashes for cached_interpolate-0.3.1-cp310-cp310-macosx_10_9_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 241be262a02b7ee9203493cc173e28b3ed34e29526684431ca536a73194e77ef |
|
MD5 | efbc65bf79eed2884df3407e4e673cec |
|
BLAKE2b-256 | c0255d465c514f3a43fd378c71e973fed8743030e4db32184581b04038351b5d |