Skip to main content

Python function for performing a linear binning that is optimized for sparsity

Project description

Build Status

sparse_linear_binning: linear binning (optimized for sparsity)

Performs a linear binning technique described in Wand and Jones on a regularly-spaced grid in an arbitrary number of dimensions. The asymptotic behavior of this binning technique performs better than so-called simple binning (i.e. as in histograms). Each data point in d-dimensional space must have an associated weight (for equally weighted pointsjust use a weight of 1.0 for each point).

For example, within a (segment of a) 2D grid with corners A, B, C, and D and a 2D data point P with weight wP:

A-----------------------------------B
|        |                          |
|                                   |
|        |                          |
|- - - - P- - - - - - - - - - - - - |
|        |                          |
D-----------------------------------C
  • Assign a weight to corner A of the proportion of area between P and C (times wP)

  • Assign a weight to corner B of the proportion of area between P and D (times wP)

  • Assign a weight to corner C of the proportion of area between P and A (times wP)

  • Assign a weight to corner D of the proportion of area between P and B (times wP)

Note that the under- and overflow bins need to be accounted for when specifying the numbers of grid points in each dimension (grid points act as bin centers). For instance, if you want grid points in steps of 0.1 in a range of [0,1] (i.e. (0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1)), specify the number of grid points to be 11. Internally, the grid points are stored in a high performance, C++-based hash table (sparsepp). This allows for finer binning in some circumstances because the hash table doesn’t allocates memory for grid points with near-zero weight. To accommodate arbitrary numbers of bins along each dimension, an arbitrary precision numeric library (boost multiprecision) may be used internally and will negatively impact performance. If this degradation in performance is unacceptable, consider reducing the number of grid points in such a way that the product of grid points in all dimensions is less than the numeric maximum of “unsigned long long” on your system. For instance, in 12 dimenisons with each dimension having 51 grid points gives a total of 10920525780002579727993102330411079589912583123907903488 potential grid points at which point the arbitrary precision library must take care of all arithmetic related to determining grid points.

Quickstart

  • pip install sparse_linear_binning

or

Example

This constructs one million random 2D points in the unit square with random weights and constructs a grid of 51 by 51 (can be different along different dimensions) linearly binned “bin centers.” The boundaries of the grid of bin centers are specified by extents and can be thought of as the under- and overflow bins.

from sparse_linear_binning import sparse_linear_binning
import numpy as np

# generate one million random 2D points and weights
# (should take less than a second to bin)
n_samples=1000000
D=2

# coordinates, weights, and extents must be of type "double"
sample_coords = np.random.random(size=(n_samples, D))
sample_weights = np.random.random(size=n_samples)
extents = np.tile([0., 1.], D).reshape((D, 2))
# n_bins must be of type "unsigned long"
n_bins = np.full(D, 51, dtype=np.dtype('uint64'))

coords, weights = sparse_linear_binning(sample_coords, sample_weights,
                                        extents, n_bins)

# check that weights on grid match original weights
print(np.allclose(weights.sum(), sample_weights.sum()))

Dependencies

  • numpy

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparse_linear_binning-1.0.1.tar.gz (966.4 kB view details)

Uploaded Source

File details

Details for the file sparse_linear_binning-1.0.1.tar.gz.

File metadata

File hashes

Hashes for sparse_linear_binning-1.0.1.tar.gz
Algorithm Hash digest
SHA256 e8083a3cdc1e9e974b9ce9bf73c614001cbe32255723d692cfa641e475475a9e
MD5 19b2310ed67ad372e9d2b862799608ae
BLAKE2b-256 91f3c75870a21f14dbc0115b24d18195cf0c2a0cc4601930e3237c2b0b96f44c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page