Decision Boundary Sampler (DBS)

DBSampler is a package for sampling points on the decision boundary of classification problems (binary or multiclass). It is theoretically exact and efficient in very high dimensions. The guarantees:

  • Returns a sample of points uniformly distributed on the decision boundary.
  • The number of points is user-defined: more points for a denser sample, fewer for a faster run.
  • The points are guaranteed to come from the edges of the condensed Voronoi Diagram (more below).

Installation

Pre-built packages for macOS, Windows and Linux are available on PyPI and can be installed with:

pip install dbsampler

On uncommon architectures, you may need to install Cargo first, before running pip install dbsampler.

Compilation from source

To compile from source you will need to install Rust/Cargo, plus maturin for the Python bindings. Maturin is best used within a Python virtual environment:

# activate your desired virtual environment first, then:
pip install maturin
git clone https://github.com/antonio-leitao/dbsampler.git
cd dbsampler
# build and install the package:
maturin develop --release

Usage

import dbsampler
# X: numpy array of shape (samples, features); y: 1-dimensional numpy array of labels
cover = dbsampler.dbs(data=X, labels=y, n_points=1000, n_epochs=5, sparse=True, parallel=True)

Parameters:

  • data: numpy array of shape (samples, features) with the points of every class.
  • labels: 1-dimensional numpy array with the label of each point. The array must be flattened.
  • n_points: the number of points sampled from the decision boundary. More points give a denser sample but slow the algorithm down. Default is 1000.
  • n_epochs: the number of projection iterations applied to the sampled points (see "How does it work?").
  • sparse: boolean (default True), whether to remove points that lie on the same Voronoi Edge.
  • parallel: boolean (default True), whether to run the computation in parallel.

Returns:

  • cover: numpy array of shape (n_points, n_features) with points on the decision boundary. With sparse=True the number of rows can be smaller than n_points.
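As a minimal sketch of how the inputs can be prepared, the snippet below builds a toy two-class dataset and passes it to dbsampler.dbs with the arguments shown above. The helper make_two_blobs is hypothetical and only for illustration, and the dtypes used here (float features, integer labels) are an assumption rather than a documented requirement. The call is skipped if the package is not installed.

```python
import numpy as np


def make_two_blobs(n_per_class=200, n_features=2, seed=0):
    """Toy binary dataset: two Gaussian blobs (illustrative helper, not part of dbsampler)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, n_features))
    b = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, n_features))
    X = np.vstack([a, b])  # shape (samples, features)
    y = np.concatenate([
        np.zeros(n_per_class, dtype=int),
        np.ones(n_per_class, dtype=int),
    ])
    return X, y


X, y = make_two_blobs()
y = np.ravel(y)  # labels must be a flat, 1-dimensional array

try:
    import dbsampler
except ImportError:
    dbsampler = None  # package not installed: skip the call below

if dbsampler is not None:
    cover = dbsampler.dbs(data=X, labels=y, n_points=1000, n_epochs=5,
                          sparse=True, parallel=True)
    print(cover.shape)
```

Since sparse=True is passed here, the printed number of rows may be lower than n_points.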

Sparse

Passing the sparse flag removes the cover points that fall on the same Voronoi Edge, keeping the first one. This can drastically reduce the number of points while maintaining a uniform and complete cover of the decision boundary. Below is an example of 5000 sampled points (left) and the same points with sparse=True (right).
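In code, the filtering idea can be pictured roughly as follows. This is a pure-Python sketch, not the package's Rust implementation. It assumes that a Voronoi Edge can be identified by the pair of data points (one per class) nearest to a cover point. The names edge_key and sparsify are hypothetical.

```python
import math


def edge_key(p, data, labels):
    """Identify the edge under p by its two nearest data points of different classes."""
    indices = range(len(data))
    # Nearest data point overall.
    j = min(indices, key=lambda i: math.dist(p, data[i]))
    # Nearest data point whose class differs from that of j.
    k = min((i for i in indices if labels[i] != labels[j]),
            key=lambda i: math.dist(p, data[i]))
    # Unordered pair, so ties between j and k give the same key.
    return frozenset((j, k))


def sparsify(cover, data, labels):
    """Keep only the first cover point found on each edge."""
    seen = set()
    kept = []
    for p in cover:
        key = edge_key(p, data, labels)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


# Four data points, two per class; the boundary is the line x = 1.
data = [[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]]
labels = [0, 0, 1, 1]
cover = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.9]]
sparse_cover = sparsify(cover, data, labels)
print(sparse_cover)  # the first two points share an edge, so the second is dropped
```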

Performance

DBSampler is written in Rust, and pre-built binaries are provided for Windows, macOS and most Linux distributions. It achieves very high performance thanks to effective parallelization and BLAS support. It currently computes a cover of 5,000 points from 10,000 data points in 500 dimensions in less than 10 seconds.

More improvements are planned, targeting situations where the number of samples times the number of dimensions exceeds 1 billion, which is where the current implementation starts to slow down.

How does it work?

For an in-depth explanation, see our paper. The algorithm aims to sample points uniformly from the edges of Voronoi Cells belonging to points of different classes. The union of these edges is the decision boundary that maximizes the distance between classes.

It starts by building an initial uniform sample of the space containing n_points points. It then iteratively "pushes" each point onto the hyperplane orthogonal to the segment between its closest neighbors of different classes, that is, the hyperplane equidistant from both.
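The "push" can be written as an orthogonal projection, reading the target hyperplane as the perpendicular bisector of the two neighbors. This reading is an interpretation that matches the requirement that a projected point ends up on the boundary. The symbols p, a, b, n and m are introduced here for illustration: p is the sampled point, a and b are its nearest neighbors of different classes.

```latex
\begin{aligned}
n  &= b - a, \qquad m = \tfrac{1}{2}(a + b), \\
p' &= p - \frac{(p - m)\cdot n}{\lVert n \rVert^{2}}\, n .
\end{aligned}
```

By construction $(p' - m)\cdot n = 0$, and therefore $\lVert p' - a \rVert = \lVert p' - b \rVert$: the projected point is equidistant from both neighbors.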

Sketch of the proof of convergence. At each iteration (there are n_epochs of them):

  1. If both nearest neighbours have adjacent Voronoi Cells, then after projection the point is on the decision boundary (by construction).
  2. Otherwise, there must exist a point from class A (or not A) that is the new nearest neighbour (by definition of Voronoi Cells).
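The iteration can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the package's implementation. The initial sample is drawn uniformly from the bounding box of the data, the neighbor search is brute force, and the names project_to_bisector and sample_boundary are hypothetical.

```python
import math
import random


def project_to_bisector(p, a, b):
    """Orthogonal projection of p onto the hyperplane equidistant from a and b."""
    normal = [bi - ai for ai, bi in zip(a, b)]
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    # Offset of p from the hyperplane, measured along `normal`.
    t = sum((pi - mi) * ni for pi, mi, ni in zip(p, mid, normal))
    t /= sum(ni * ni for ni in normal)  # assumes a != b
    return [pi - t * ni for pi, ni in zip(p, normal)]


def sample_boundary(data, labels, n_points=100, n_epochs=5, seed=0):
    rng = random.Random(seed)
    dims = range(len(data[0]))
    lo = [min(x[d] for x in data) for d in dims]
    hi = [max(x[d] for x in data) for d in dims]
    # Assumption: the initial uniform sample is drawn from the data's bounding box.
    cover = [[rng.uniform(lo[d], hi[d]) for d in dims] for _ in range(n_points)]
    indices = range(len(data))
    for _ in range(n_epochs):
        for i, p in enumerate(cover):
            # Nearest data point overall, then nearest point of any other class.
            j = min(indices, key=lambda q: math.dist(p, data[q]))
            k = min((q for q in indices if labels[q] != labels[j]),
                    key=lambda q: math.dist(p, data[q]))
            cover[i] = project_to_bisector(p, data[j], data[k])
    return cover


# Two classes on either side of the line x = 1.
data = [[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]]
labels = [0, 0, 1, 1]
cover = sample_boundary(data, labels, n_points=20, n_epochs=3)
```

In this toy dataset every sampled point ends up on the line x = 1 (up to rounding), which is the boundary between the two classes.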

Citing

If you use DBSampler or parts of the algorithm in your work, please consider citing:

@inproceedings{petri2020on,
               title={On The Topological Expressive Power of Neural Networks},
               author={Giovanni Petri and Ant{\'o}nio Leit{\~a}o},
               booktitle={NeurIPS 2020 Workshop on Topological Data Analysis and Beyond},
               year={2020},
               url={https://openreview.net/forum?id=I44kJPuvqPD}
}

In the paper above you can find the pseudocode of the algorithm along with the proof of convergence. A complete paper about the method is coming soon.
