Skip to main content

Python function for performing a linear binning

Project description

Build Status

linear_binning: linear binning

Performs a linear binning technique described in Wand and Jones on a regularly-spaced grid in an arbitrary number of dimensions. The asymptotic behavior of this binning technique performs better than so-called simple binning (i.e. as in histograms). Each data point in d-dimensional space must have an associated weight (for equally weighted points just use a weight of 1.0 for each point).

For example, within a (segment of a) 2D grid with corners A, B, C, and D and a 2D data point P with weight wP:

A-----------------------------------B
|        |                          |
|                                   |
|        |                          |
|- - - - P- - - - - - - - - - - - - |
|        |                          |
D-----------------------------------C
  • Assign a weight to corner A of the proportion of area between P and C (times wP)

  • Assign a weight to corner B of the proportion of area between P and D (times wP)

  • Assign a weight to corner C of the proportion of area between P and A (times wP)

  • Assign a weight to corner D of the proportion of area between P and B (times wP)

Note that the under- and overflow bins need to be accounted for when specifying the numbers of grid points in each dimension (grid points act as bin centers). For instance, if you want grid points in steps of 0.1 in a range of [0,1] (i.e. (0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1)), specify the number of grid points to be 11. Internally, the grid points are stored in a contiguous array. This sets a natural restriction on the number of grid points along each dimension as memory is allocated for each potential grid point even if it is never used. To accommodate arbitrary numbers of dimension, an arbitrary precision numeric library (boost multiprecision) may be used internally and will negatively impact performance. If this degradation in performance is unacceptable, consider reducing the number of dimensions in such a way that the is less than the maximum number of binary digits in an “unsigned long long” on your system.

Quickstart

  • pip install linear_binning

or

Example

This constructs one million random 2D points in the unit square with random weights and constructs a grid of 51 by 51 (can be different along different dimensions) linearly binned “bin centers.” The boundaries of the grid of bin centers are specified by extents and can be thought of as the under- and overflow bins (i.e. these are the coordinates of the first and last bin centers).

from linear_binning import linear_binning
import numpy as np

# generate one million random 2D points and weights
# (should take less than a second to bin)
n_samples=1000000
D=2

# coordinates, weights, and extents must be of type "double"
sample_coords = np.random.random(size=(n_samples, D))
sample_weights = np.random.random(size=n_samples)
extents = np.tile([0., 1.], D).reshape((D, 2))
n_bins = np.full(D, 51)

coords, weights = linear_binning(sample_coords, sample_weights,
                                 extents, n_bins)

# check that weights on grid match original weights
print(np.allclose(weights.sum(), sample_weights.sum()))

Dependencies

  • numpy

1.1.0

Added support for automatically converting Numpy arrays to proper d-types. Added more explicit tests.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

linear_binning-1.1.0.tar.gz (883.2 kB view details)

Uploaded Source

File details

Details for the file linear_binning-1.1.0.tar.gz.

File metadata

File hashes

Hashes for linear_binning-1.1.0.tar.gz
Algorithm Hash digest
SHA256 d32500caf42912ea0d6f52c087158bfadd251e6467746715ee3bdb2fa529d991
MD5 dbcc8cb320b18f0a38ecd99e030d61b2
BLAKE2b-256 0bed46418fd27bf051dccf35da6816b3c04ecfb9f2a91eb4af4afa2332fc7d50

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page