An OpenCL implementation of Kernel Density Estimation

Project description

This repository implements Gaussian Kernel Density Estimation using OpenCL to achieve significant performance gains.

The Python interface is based on Scipy's gaussian_kde class, so it should be easy to replace the CPU implementation (gaussian_kde) with the OpenCL implementation in this repository (gaussian_kde_ocl).

Example Code

import numpy as np
from kde_ocl import gaussian_kde_ocl

# Generate dummy training data (10000 instances of 2D data)
train = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 10000)
# Generate dummy test data (100 instances of 2D data)
test = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 100)

# Train the KDE model
kde = gaussian_kde_ocl(train)

# Get the pdf of each test point. This is equivalent to kde.pdf(test)
pdf = kde(test)

# Get the logpdf (log of the pdf) of each test point.
logpdf = kde.logpdf(test)
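As a point of reference, the quantity returned by pdf can be sketched in a few lines of NumPy. This is not the OpenCL implementation in this repository. It is a naive CPU version that assumes Scott's rule for the bandwidth (SciPy's default) and one row per instance, as in the example above. The name gaussian_kde_reference is hypothetical.

```python
import numpy as np

def gaussian_kde_reference(train, test):
    """Naive Gaussian KDE: rows are instances, columns are dimensions."""
    train = np.asarray(train, dtype=np.float64)              # shape (n, d)
    test = np.atleast_2d(np.asarray(test, dtype=np.float64))  # shape (m, d)
    n, d = train.shape

    # Assumption: Scott's rule bandwidth factor, as in SciPy's default.
    factor = n ** (-1.0 / (d + 4))
    cov = np.atleast_2d(np.cov(train, rowvar=False)) * factor ** 2
    inv_cov = np.linalg.inv(cov)

    # Normalization of one Gaussian kernel, times the number of kernels.
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov)) * n

    # Squared Mahalanobis distance between every test and training point.
    diff = test[:, None, :] - train[None, :, :]               # shape (m, n, d)
    maha = np.einsum('mnd,de,mne->mn', diff, inv_cov, diff)

    return np.exp(-0.5 * maha).sum(axis=1) / norm

rng = np.random.default_rng(0)
train = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 500)
test = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 10)
pdf = gaussian_kde_reference(train, test)
```

The intermediate array has shape (test instances, training instances, dimensions), so this sketch is only practical for small inputs.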

The interface is mostly the same as Scipy's gaussian_kde, but the axis order is changed. For example, Scipy's gaussian_kde interprets a numpy array of shape (10000, 2) as two instances of 10000 dimensions. In gaussian_kde_ocl, the same array is interpreted as 10000 instances of 2 dimensions. This change makes it easier to work with pandas dataframes:

import pandas as pd
import numpy as np
from kde_ocl import gaussian_kde_ocl

# Create pandas dataframe 
a = np.random.normal(0, 1, 5000)
b = np.random.normal(3.2, np.sqrt(1.8), 5000)
data = pd.DataFrame({'a': a, 'b': b})

# Train KDE model
kde = gaussian_kde_ocl(data.values)

# Evaluate one point
logpdf = kde.logpdf([1.1, 2.3])
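The axis convention can be checked with plain NumPy. The sketch below only inspects shapes. The variable names are illustrative, and the transposed view is the layout Scipy's gaussian_kde would expect for the same data.

```python
import numpy as np

# gaussian_kde_ocl layout: one row per instance, one column per dimension.
data = np.random.normal(size=(5000, 2))
n_instances, n_dims = data.shape

# Scipy's gaussian_kde expects (dimensions, instances), i.e. the transpose.
scipy_layout = data.T
print(n_instances, n_dims, scipy_layout.shape)
```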

Performance

This is a comparison of gaussian_kde_ocl and Scipy's gaussian_kde using 2D data and the following configuration:

  • CPU: Intel i7-6700K.
  • GPU: AMD RX 460.
  • Python 3.7.3
  • Ubuntu 16.04

pdf() method

Training instances / Test instances | gaussian_kde_ocl.pdf() | gaussian_kde.pdf()      | Speedup
100,000 / 1,000                     | 218.6474 ± 1.5901 ms   | 1,911.0764 ± 50.8762 ms | 8.74x
1,000 / 10,000,000                  | 18.8643 ± 0.07322 s    | 237.3429 ± 1.1765 s     | 12.58x
100 / 10,000                        | 4.4533 ± 0.7297 ms     | 18.0684 ± 0.3302 ms     | 4.46x

logpdf() method

Training instances / Test instances | gaussian_kde_ocl.logpdf() | gaussian_kde.logpdf()     | Speedup
100,000 / 1,000                     | 261.1466 ± 6.3932 ms      | 6,798.4730 ± 420.2878 ms  | 26.03x
1,000 / 10,000,000                  | 36.3143 ± 0.02916 s       | MemoryError               | NA
100 / 10,000                        | 8.827 ± 0.7442 ms         | 34.1114 ± 1.3060 ms       | 3.86x

Current Limitations

  • Only C order (the default) numpy arrays can be used as training/test datasets.
  • Only Gaussian kernels are implemented.
  • OpenCL device is selected automatically.
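Because only C order arrays are accepted, data that arrives in another memory layout (for example a Fortran order array) has to be converted first. A minimal sketch using NumPy's ascontiguousarray follows. The helper name as_c_order is hypothetical and not part of kde_ocl.

```python
import numpy as np

def as_c_order(array):
    """Return `array` in C order, copying only when it is not already so."""
    return np.ascontiguousarray(array)

# A Fortran order array, as some libraries and file loaders produce.
fortran_data = np.asfortranarray(np.random.normal(size=(1000, 2)))
print(fortran_data.flags['C_CONTIGUOUS'])

# Same values, but laid out in C order.
c_data = as_c_order(fortran_data)
print(c_data.flags['C_CONTIGUOUS'])
```

The converted array can then be passed wherever a training or test dataset is expected.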

Dependencies

The library is Python 2/3 compatible. It is currently tested on Ubuntu 16.04, but should be compatible with other operating systems that have OpenCL GPU support.

Python Dependencies

The project has the following Python dependencies:

cffi numpy six

You can install them with:

pip install cffi numpy six

Rust

The Rust compiler must be installed in the system. Check out https://www.rust-lang.org/tools/install for more information.

The default Rust toolchain is used to compile the library, so make sure the installed toolchain (32-bit or 64-bit) matches the Python interpreter (32-bit or 64-bit).
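One way to find out whether the Python interpreter is a 32-bit or 64-bit build is to check its pointer size. The sketch below uses only the standard library, and the variable name python_bits is illustrative. The result can be compared with the default toolchain reported by `rustup show`.

```python
import struct
import sys

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
python_bits = struct.calcsize("P") * 8

print("Python %d.%d is a %d-bit build"
      % (sys.version_info[0], sys.version_info[1], python_bits))
```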

OpenCL

The GPU drivers that enable OpenCL should be installed.

Installation

Use pip:

pip install kde_ocl

Alternatively, clone the repository and use the setup script:

python setup.py install

Testing

Tests are run using pytest and require scipy to compare gaussian_kde_ocl with Scipy's gaussian_kde. Install both:

pip install pytest scipy

Run the tests with:

pytest

Benchmarks

To run the benchmarks, pytest-benchmark is needed:

pip install pytest-benchmark

Then, execute the tests with benchmarks enabled:

pytest --times

To run only the OpenCL benchmarks:

pytest --times-ocl

To run only the Scipy gaussian_kde benchmarks:

pytest --times-scipy

Download files

Source Distribution

kde_ocl-0.1.0.tar.gz (18.8 kB)

Built Distribution

kde_ocl-0.1.0-py2.py3-none-win_amd64.whl (254.5 kB), for Python 2 and Python 3 on Windows x86-64
