Decision Boundary Sampler (DBS)
DBSampler is a package for sampling points on the decision boundary of classification problems (binary or multiclass). It is theoretically exact and efficient in very high dimensions. The guarantees:
- Returns a sample of points uniformly distributed on the decision boundary.
- The number of points is user-defined: more points for a denser sample, fewer for a faster run.
- The points are guaranteed to come from the edges of the condensed Voronoi Diagram (more below).
Installation
Dependencies:
- Numpy
- Scipy
- Sklearn
DBSampler is available on PyPI:
pip install dbsampler
Usage
import dbsampler
cover = dbsampler.DBS(X=X, y=y, n_points=1000, n_epochs=5, distribution='uniform', metric='euclidean')
Parameters:
X
: numpy array of shape (n_samples, n_features) with the points of every class.
y
: 1-dimensional numpy array with the label of each point. The array must be flattened.
n_points
: number of points sampled from the decision boundary. More points give a denser sample but slow the algorithm. Default is 1000.
n_epochs
: number of epochs to run. The algorithm is iterative but converges very quickly. Default is 5. A proof of an upper bound on the number of necessary iterations is in progress.
distribution
: initial point distribution, which is also the distribution of the points on the decision boundary. Currently only supports 'uniform' (default).
metric
: metric used to compute the nearest neighbours. Currently only supports 'euclidean'.
Returns:
cover
: numpy array (n_points, n_features) of points in the decision boundary.
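As a hedged illustration of the expected input shapes, the sketch below builds a toy binary problem with plain numpy; the `dbsampler.DBS` call is shown commented out, following the signature documented above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian blobs in 5 dimensions as a toy binary classification problem.
X = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(100, 5)),
    rng.normal(loc=+2.0, scale=1.0, size=(100, 5)),
])
# Labels must be a flattened, 1-dimensional array.
y = np.concatenate([np.zeros(100), np.ones(100)]).ravel()

print(X.shape, y.ndim)  # (200, 5) 1

# cover = dbsampler.DBS(X=X, y=y, n_points=1000, n_epochs=5)
# cover would then be a (1000, 5) array of points on the decision boundary.
```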
How does it work?
For an in-depth explanation look at this post or at our paper. The algorithm aims to sample points uniformly from the edges of Voronoi Cells belonging to points of different classes. The union of these edges is the decision boundary that maximizes the distance between classes.
It starts by building an initial uniform sample of the space containing n_points
points. It then iteratively "pushes" each point onto the bisecting hyperplane between its closest neighbours of different classes (the hyperplane orthogonal to the segment joining them, passing through its midpoint).
Sketch of the proof of convergence. At each of the n_epochs
iterations:
- If both nearest neighbours have adjacent Voronoi Cells then, after the projection, the point lies on the decision boundary (by construction).
- Otherwise, there must exist a point from class A (or not A) that becomes the new nearest neighbour (by definition of Voronoi Cells).
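The "push" step above can be sketched in plain numpy. This is a minimal illustration of a single projection, not the package's actual implementation: projecting a point onto the bisecting hyperplane of its two cross-class neighbours leaves it equidistant from both.

```python
import numpy as np

def push_to_bisector(p, a, b):
    """Project p onto the hyperplane equidistant from a and b.

    The bisecting hyperplane has normal n = b - a and passes
    through the midpoint m = (a + b) / 2.
    """
    n = b - a
    m = (a + b) / 2.0
    return p - (np.dot(p - m, n) / np.dot(n, n)) * n

rng = np.random.default_rng(1)
p = rng.normal(size=10)  # a sample point
a = rng.normal(size=10)  # nearest neighbour from class A
b = rng.normal(size=10)  # nearest neighbour from another class

q = push_to_bisector(p, a, b)

# After the projection the point is equidistant from both neighbours,
# i.e. it lies on the Voronoi edge separating them.
print(np.isclose(np.linalg.norm(q - a), np.linalg.norm(q - b)))  # True
```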
Performance
The bottleneck of the algorithm is the calculation of an orthogonal hyperplane for each point at each iteration. For low dimensions (<200) we use the null space of a matrix. For higher dimensions we approximate it using a QR decomposition. The average time complexity of the algorithm running k epochs with n points in dimension d is .
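As a hedged sketch, both recipes for obtaining a basis of the hyperplane orthogonal to a direction v (for instance, the segment between two cross-class neighbours) can be written with standard numpy/scipy calls; the package may organize this differently:

```python
import numpy as np
from scipy.linalg import null_space

d = 50
rng = np.random.default_rng(2)
v = rng.normal(size=d)  # direction the hyperplane must be orthogonal to

# Low dimensions: exact orthonormal basis via the null space of the 1 x d matrix v.
basis_ns = null_space(v.reshape(1, -1))  # shape (d, d - 1)

# High dimensions: QR decomposition of [v | random columns]; the columns of Q
# after the first are orthonormal and orthogonal to v.
M = np.column_stack([v, rng.normal(size=(d, d - 1))])
Q, _ = np.linalg.qr(M)
basis_qr = Q[:, 1:]  # shape (d, d - 1)

# Both bases are orthogonal to v.
print(np.allclose(basis_ns.T @ v, 0), np.allclose(basis_qr.T @ v, 0))  # True True
```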
Citation
If you use DBSampler, or parts of the algorithm, in your work, please consider citing:
@inproceedings{petri2020on,
title={On The Topological Expressive Power of Neural Networks},
author={Giovanni Petri and Ant{\'o}nio Leit{\~a}o},
booktitle={NeurIPS 2020 Workshop on Topological Data Analysis and Beyond},
year={2020},
url={https://openreview.net/forum?id=I44kJPuvqPD}
}
In the paper above you can find the pseudocode of the algorithm along with the proof of convergence. A complete paper about the method is coming soon.