A Python package for out-of-core similarity search and dimensionality reduction

SIMBSIG = SIMilarity Batched Search Integrated Gpu-based

SIMBSIG is a GPU-accelerated software tool for neighborhood queries, KMeans and PCA that mimics the sklearn API.

The batchwise data loading and GPU usage follow the principle of [1]. KMeans follows the mini-batch k-means algorithm described by Sculley [2], and PCA follows the method of Halko et al. [3]. The API largely matches sklearn [4,5], so code written for sklearn can be reused by importing from SIMBSIG instead of sklearn. Additional features and arguments for scaling have been added; for example, all input data can be passed either as an array-like or as an h5py file handle [6].
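To illustrate the batchwise principle, here is a minimal NumPy sketch of a brute-force neighbor search that only looks at one slice of the training data at a time and merges a running top-k after each batch. It is not SIMBSIG's implementation: the function name batched_kneighbors and its parameters are hypothetical, it runs on the CPU, and it assumes Euclidean distance.

```python
import numpy as np


def batched_kneighbors(X_train, X_query, n_neighbors=3, batch_size=2):
    """Hypothetical helper: k-nearest-neighbor search over X_train in batches."""
    X_query = np.asarray(X_query, dtype=float)
    n_queries = X_query.shape[0]

    # Running candidates per query; empty before the first batch is seen.
    best_dist = np.empty((n_queries, 0))
    best_idx = np.empty((n_queries, 0), dtype=int)

    for start in range(0, len(X_train), batch_size):
        # Only this slice is materialized. Slicing also works on array-likes
        # that load lazily from disk, such as h5py datasets.
        batch = np.asarray(X_train[start:start + batch_size], dtype=float)

        # Euclidean distances between every query and every row of the batch.
        diff = X_query[:, None, :] - batch[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        idx = np.arange(start, start + batch.shape[0])

        # Merge the batch with the running candidates and keep the k smallest.
        cand_dist = np.hstack([best_dist, dist])
        cand_idx = np.hstack([best_idx, np.tile(idx, (n_queries, 1))])
        order = np.argsort(cand_dist, axis=1, kind="stable")[:, :n_neighbors]
        best_dist = np.take_along_axis(cand_dist, order, axis=1)
        best_idx = np.take_along_axis(cand_idx, order, axis=1)

    return best_dist, best_idx


X = [[0, 1], [1, 2], [2, 3], [3, 4]]
dist, idx = batched_kneighbors(X, [[0.9, 1.9]], n_neighbors=3, batch_size=2)
print(idx)  # [[1 0 2]]
```

Because each batch is reduced to at most n_neighbors candidates per query before the next one is read, peak memory depends on batch_size rather than on the size of the full dataset.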

Eljas Röllin, Michael Adamer, Lucie Bourguignon, Karsten M. Borgwardt

Installation

SIMBSIG is a PyPI package which can be installed via pip:

pip install simbsig

You can also clone the repository and install it locally via Poetry by executing

poetry install

in the repository directory.

Example

>>> X = [[0,1], [1,2], [2,3], [3,4]]
>>> y = [0, 0, 1, 1]
>>> from simbsig import KNeighborsClassifier
>>> knn_classifier = KNeighborsClassifier(n_neighbors=3)
>>> knn_classifier.fit(X, y)
KNeighborsClassifier(...)
>>> print(knn_classifier.predict([[0.9, 1.9]]))
[0]
>>> print(knn_classifier.predict_proba([[0.9, 1.9]]))
[[0.666... 0.333...]]
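The probabilities above can be checked by hand. The following NumPy sketch redoes the vote for the query point [0.9, 1.9], assuming Euclidean distance and uniformly weighted neighbors (the sklearn defaults, which SIMBSIG is assumed to share here). It does not call SIMBSIG itself.

```python
import numpy as np

X = np.array([[0, 1], [1, 2], [2, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 1, 1])
query = np.array([0.9, 1.9])

# Euclidean distance from the query to every training point.
dist = np.linalg.norm(X - query, axis=1)

# Indices of the 3 closest training points.
nearest = np.argsort(dist)[:3]

# Fraction of those neighbors belonging to each class (uniform weights).
classes = np.unique(y)
proba = np.array([(y[nearest] == c).mean() for c in classes])
prediction = classes[np.argmax(proba)]

print(nearest)     # [1 0 2]
print(proba)       # two of three neighbors are class 0, one is class 1
print(prediction)  # 0
```

Two of the three nearest points carry label 0, which gives the 2/3 and 1/3 shown in the example output.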

Tutorials

Tutorial notebooks with toy examples can be found in the tutorials directory of the repository.

Documentation

The documentation can be found here.

Overview of implemented algorithms

Class                      SIMBSIG           sklearn
NearestNeighbors           fit               fit
                           kneighbors        kneighbors
                           radius_neighbors  radius_neighbors
KNeighborsClassifier       fit               fit
                           predict           predict
                           predict_proba     predict_proba
KNeighborsRegressor        fit               fit
                           predict           predict
RadiusNeighborsClassifier  fit               fit
                           predict           predict
                           predict_proba     predict_proba
RadiusNeighborsRegressor   fit               fit
                           predict           predict
KMeans                     fit               fit
                           predict           predict
                           fit_predict       fit_predict
PCA                        fit               fit
                           transform         transform
                           fit_transform     fit_transform

Contact

This code is developed and maintained by members of the Department of Biosystems Science and Engineering at ETH Zurich. It is available from the GitHub repo of the Machine Learning and Computational Biology Lab of Prof. Dr. Karsten Borgwardt.

References:

[1] Gutiérrez, P. D., Lastra, M., Bacardit, J., Benítez, J. M., & Herrera, F. (2016). GPU-SME-kNN: Scalable and memory efficient kNN and lazy learning using GPUs. Information Sciences, 373, 165-182.

[2] Sculley, D. (2010, April). Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web (pp. 1177-1178).

[3] Halko, N., Martinsson, P. G., Shkolnisky, Y., & Tygert, M. (2011). An algorithm for the principal component analysis of large data sets. SIAM Journal on Scientific Computing, 33(5), 2580-2594.

[4] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.

[5] Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.

[6] Collette, A., Kluyver, T., Caswell, T. A., Tocknell, J., Kieffer, J., Scopatz, A., ... & Hole, L. (2021). h5py/h5py: 3.1.0. Zenodo.
