An OpenCL implementation of Kernel Density Estimation
This repository implements Gaussian Kernel Density Estimation using OpenCL to achieve significant performance gains.
The Python interface is based on SciPy's gaussian_kde class,
so it should be easy to replace the CPU implementation, gaussian_kde, with the
OpenCL implementation in this repository, gaussian_kde_ocl.
Example Code
import numpy as np
from kde_ocl import gaussian_kde_ocl
# Generate dummy training data (10000 instances of 2D data)
train = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 10000)
# Generate dummy test data (100 instances of 2D data)
test = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 100)
# Train the KDE model
kde = gaussian_kde_ocl(train)
# Get the pdf of each test point. This is equivalent to kde.pdf(test)
pdf = kde(test)
# Get the logpdf of each test point.
logpdf = kde.logpdf(test)
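For reference, the quantities returned by the pdf and logpdf calls can be sketched in plain NumPy. This is a minimal, unoptimized sketch and not the OpenCL implementation: the function name naive_gaussian_kde is hypothetical, and the bandwidth is assumed to follow Scott's rule (SciPy's default), which this README does not confirm for gaussian_kde_ocl.

```python
import numpy as np


def naive_gaussian_kde(train, test):
    """Reference Gaussian KDE in plain NumPy (hypothetical helper).

    Rows are instances and columns are dimensions, as in gaussian_kde_ocl.
    Returns (pdf, logpdf) for every row of `test`.
    """
    train = np.asarray(train, dtype=np.float64)
    test = np.atleast_2d(np.asarray(test, dtype=np.float64))
    n, d = train.shape

    # ASSUMPTION: Scott's rule bandwidth factor, which is SciPy's default.
    factor = n ** (-1.0 / (d + 4))
    cov = np.atleast_2d(np.cov(train, rowvar=False)) * factor ** 2
    inv_cov = np.linalg.inv(cov)
    # Log of the normalization constant n * sqrt(det(2 * pi * cov)).
    log_norm = np.log(n) + 0.5 * np.linalg.slogdet(2.0 * np.pi * cov)[1]

    # Pairwise differences between test and training points: shape (m, n, d).
    diff = test[:, None, :] - train[None, :, :]
    # Exponent of each Gaussian term: -0.5 * squared Mahalanobis distance.
    exponent = -0.5 * np.einsum("mnd,de,mne->mn", diff, inv_cov, diff)

    # Log-sum-exp keeps the log-density stable when every term is tiny.
    peak = exponent.max(axis=1, keepdims=True)
    logpdf = peak[:, 0] + np.log(np.exp(exponent - peak).sum(axis=1)) - log_norm
    return np.exp(logpdf), logpdf


rng = np.random.default_rng(0)
train = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 2000)
test = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 100)
pdf, logpdf = naive_gaussian_kde(train, test)
```

The sketch allocates an array of shape (test instances, training instances, dimensions), so it is only suitable for small inputs.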
The interface is mostly the same as SciPy's gaussian_kde, but the axis order is changed. For example, SciPy's
gaussian_kde interprets a numpy array of shape (10000, 2) as two instances of 10000 dimensions. In
gaussian_kde_ocl, the same array is interpreted as 10000 instances of 2 dimensions. This change makes it easier to work with
pandas dataframes:
import pandas as pd
import numpy as np
from kde_ocl import gaussian_kde_ocl
# Create pandas dataframe
a = np.random.normal(0, 1, 5000)
b = np.random.normal(3.2, np.sqrt(1.8), 5000)
data = pd.DataFrame({'a': a, 'b': b})
# Train KDE model
kde = gaussian_kde_ocl(data.values)
# Evaluate one point
logpdf = kde.logpdf([1.1, 2.3])
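The two axis conventions differ only by a transpose. The following NumPy-only sketch shows the shapes involved; the variable names are illustrative and no KDE model is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10000 instances of 2 dimensions: the layout gaussian_kde_ocl expects.
rows = rng.normal(size=(10000, 2))

# The same data with one column per instance: the layout SciPy's gaussian_kde expects.
cols = rows.T

print(rows.shape)  # (10000, 2)
print(cols.shape)  # (2, 10000)

# A single 2D evaluation point in each convention.
point_rows = np.atleast_2d([1.1, 2.3])  # shape (1, 2)
point_cols = point_rows.T               # shape (2, 1)
```

Under these conventions, rows would be passed to gaussian_kde_ocl and cols to SciPy's gaussian_kde to describe the same dataset.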
Performance
This is a comparison of gaussian_kde_ocl and SciPy's gaussian_kde on 2D data with the following configuration:
- CPU: Intel i7-6700K.
- GPU: AMD RX 460.
- Python 3.7.3
- Ubuntu 16.04
pdf() method
| Training instances / Test instances | gaussian_kde_ocl.pdf() | gaussian_kde.pdf() | Speedup |
|---|---|---|---|
| 100,000 / 1,000 | 218.6474 ± 1.5901 ms | 1,911.0764 ± 50.8762 ms | 8.74x |
| 1,000 / 10,000,000 | 18.8643 ± 0.07322 s | 237.3429 ± 1.1765 s | 12.58x |
| 100 / 10,000 | 4.4533 ± 0.7297 ms | 18.0684 ± 0.3302 ms | 4.46x |
logpdf() method
| Training instances / Test instances | gaussian_kde_ocl.logpdf() | gaussian_kde.logpdf() | Speedup |
|---|---|---|---|
| 100,000 / 1,000 | 261.1466 ± 6.3932 ms | 6,798.4730 ± 420.2878 ms | 26.03x |
| 1,000 / 10,000,000 | 36.3143 ± 0.02916 s | MemoryError | NA |
| 100 / 10,000 | 8.827 ± 0.7442 ms | 34.1114 ± 1.3060 ms | 3.86x |
Current Limitations
- Only C order (the default) numpy arrays can be used as training/test datasets.
- Only Gaussian kernels are implemented.
- OpenCL device is selected automatically.
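Regarding the C order limitation, an array's memory layout can be checked and converted with standard NumPy calls before it is passed to the library. This is a minimal sketch; the helper name ensure_c_order is hypothetical and not part of kde_ocl.

```python
import numpy as np


def ensure_c_order(array):
    """Return `array` in C (row-major) order, copying only when necessary."""
    return np.ascontiguousarray(array)


c_array = np.zeros((5, 2))            # NumPy creates C order arrays by default
f_array = np.asfortranarray(c_array)  # column-major (Fortran order) copy

print(c_array.flags["C_CONTIGUOUS"])  # True
print(f_array.flags["C_CONTIGUOUS"])  # False

fixed = ensure_c_order(f_array)
print(fixed.flags["C_CONTIGUOUS"])    # True
```

A transposed view such as data.T of a 2D C order array is typically not C ordered, so this conversion is most relevant when data has been transposed from SciPy's layout.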
Dependencies
The library is Python 2/3 compatible. Currently, it is tested on Ubuntu 16.04, but it should be compatible with other operating systems where OpenCL GPU support is available.
Python Dependencies
The project has the following Python dependencies:
cffi numpy six
You can install them with:
pip install cffi numpy six
Rust
The Rust compiler must be installed in the system. Check out https://www.rust-lang.org/tools/install for more information.
The default Rust toolchain is used to compile the library, so make sure to install a Rust toolchain (32 vs 64 bits) compatible with the Python interpreter version (32 vs 64 bits).
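To find out whether the Python interpreter is a 32-bit or 64-bit build, the standard library is enough. This is a small sketch and not part of the package:

```python
import platform
import struct
import sys

# The size of a C pointer in bytes, times 8, gives the interpreter's bitness.
bits = struct.calcsize("P") * 8

# Cross-check: sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2 ** 32

print("Python {} is a {}-bit build on {}".format(
    platform.python_version(), bits, platform.machine()))
```

If Rust was installed with rustup, the default host reported by rustup show (for example, a triple starting with x86_64 for 64 bits) can be compared against this value.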
OpenCL
The GPU drivers that enable OpenCL should be installed.
Installation
Use pip:
pip install kde_ocl
Alternatively, clone the repository and use the setup script:
python setup.py install
Testing
Tests are run using pytest and require scipy to compare gaussian_kde_ocl with SciPy's gaussian_kde. Install them:
pip install pytest scipy
Run the tests with:
pytest
Benchmarks
To run the benchmarks, pytest-benchmark is needed:
pip install pytest-benchmark
Then, execute the tests with benchmarks enabled:
pytest --times
To run only the OpenCL benchmarks:
pytest --times-ocl
To run only the benchmarks for SciPy's gaussian_kde:
pytest --times-scipy
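Outside pytest, a mean ± standard deviation timing similar in form to the figures in the Performance section can be sketched with the standard library (Python 3 only). This is not how the repository's benchmarks are implemented; time_call and workload are hypothetical names, and the workload is a stand-in for a call such as kde.pdf(test).

```python
import statistics
import time


def time_call(func, repeats=10):
    """Return (mean, stdev) of wall-clock seconds over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical stand-in workload; replace with e.g. lambda: kde.pdf(test).
def workload():
    return sum(i * i for i in range(10000))


mean_s, std_s = time_call(workload)
print("{:.4f} ± {:.4f} ms".format(mean_s * 1e3, std_s * 1e3))
```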