Skip to main content

Sparse Computation algorithm for sparsifying similarity matrices.

Project description

Travis build status

Sparse Computation

sparsecomputation is a python package to construct a sparse matrix of pairwise similarities (similarity matrix). Sparse similarity matrices generated with sparse computation provide a substantial speedup without loss in accuracy for supervised machine learning algorithms that rely on pairwise similarity matrices / kernel matrices, such as k-nearest neighbors, kernel-based support vector machines, and supervised normalized cut.

Similarity matrices have O(n^2) entries for a dataset with n objects (observations). Computing all pairwise similarities is computationally intractable for large datasets. Sparse computation overcomes this computational burden by identifying all pairs of objects that are close to each other in space. These pairs are identified by projecting the objects onto a low-dimensional space and using closeness in the low-dimensional space as a proxy for closeness in the original space. Sparse computation uses grids to determine efficiently close objects in the low-dimensional space.

Sparse computation takes as input a two-dimensional numpy array of size n x d. Each row represents one object with d features. Spase computation returns a list of 2-tuples, where each tuple are the indices of the objects between which a pairwise similarity should be computed.

Installation

You can use pip to install the package:

pip instal sparsecomputation

Minimal Example

To find the relevant pairwise similarities with Sparse Computation, use the following code:

# load sample dataset
import sklearn.datasets
data, _ = sklearn.datasets.load_iris(return_X_y=True)

# load Sparse Computation
from sparsecomputation import ApproximatePCA, SparseComputation

# Project data to a low-dimensional space of dimension 3
# with a fast, scalable version of PCA.
apca = ApproximatePCA(dimLow=3)
# Select pair if objects are closer than ~distance~ (1 / resolution).
# Controls sparsity (small distance - highly sparse matrix)
sc = SparseComputation(apca, distance=0.05)

# Input: n x d matrix - data
# Output: list of relevant pairs (undirected)
pairs = sc.select_pairs(data)
# Out: [(101, 142), (1, 9), (1, 34), (1, 37), ...]

Relevant Papers

For more details on the techniques read the following papers. Please cite these works if you use this implementation in academic work:

  • Dorit S. Hochbaum, Philipp Baumann (2016). Sparse computation for large-scale data mining. IEEE Transactions on Big Data, 2(2), 151-174.
  • Philipp Baumann, Dorit S. Hochbaum and Quico Spaen (2017). High-Performance Geometric Algorithms for Sparse Computation in Big Data Analytics. 2017 IEEE International Conference on Big Data, Boston MA.
  • Philipp Baumann, Dorit S. Hochbaum, and Quico Spaen (2016). Sparse-Reduced Computation. Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods.

Credits

Original Matlab implementation by Philipp Baumann. Original Python implementation by Titouan Jehl and Quico Spaen. Updated implementation with the new block shifting algorithm by Quico Spaen and Philipp Baumann.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparsecomputation-2019.6.3.tar.gz (27.8 kB view details)

Uploaded Source

Built Distribution

sparsecomputation-2019.6.3-py3-none-any.whl (8.9 kB view details)

Uploaded Python 3

File details

Details for the file sparsecomputation-2019.6.3.tar.gz.

File metadata

  • Download URL: sparsecomputation-2019.6.3.tar.gz
  • Upload date:
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.5.2

File hashes

Hashes for sparsecomputation-2019.6.3.tar.gz
Algorithm Hash digest
SHA256 2d1a3934a144590352273b16956175a18a67a7926d1c8c818281e6be9ace3ffe
MD5 fc32984a717a97a528ca23c282d128ef
BLAKE2b-256 c8220e459a1a078484a47bb18ea710ac0826312ab4c2557bee86f16e9ff3d2ed

See more details on using hashes here.

File details

Details for the file sparsecomputation-2019.6.3-py3-none-any.whl.

File metadata

  • Download URL: sparsecomputation-2019.6.3-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.5.2

File hashes

Hashes for sparsecomputation-2019.6.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e028a4348121666de391c595a6dfd948b2d46f5c52a799d6c5350006a84285aa
MD5 4ada635c24ed95081db0a1725e27c45b
BLAKE2b-256 105ba389f00450e15191d67e9aaa78d15f46538f039dccd6087584c7236b8365

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page