Implementation of the KSU compression algorithm https://www.cs.bgu.ac.il/~karyeh/compression-arxiv.pdf

KSU Compression Algorithm Implementation

Algorithm 1 from "Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions"

Installation

  • With pip: pip install ksu
  • From source:
    • git clone --depth=1 https://github.com/nimroha/ksu_classifier.git
    • cd ksu_classifier
    • python setup.py install

Usage

Command Line

This package provides two command-line tools, e-net and ksu:

  • e-net constructs an epsilon net for a given epsilon
  • ksu runs the full algorithm
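
To make the notion of an epsilon net concrete, here is a minimal pure-Python sketch of a greedy construction. It is an illustration under stated assumptions, not the package's implementation: the function name greedy_epsilon_net and the sample points are hypothetical, and Euclidean distance (math.dist, Python 3.8+) is used as the default metric.

```python
import math


def greedy_epsilon_net(points, epsilon, metric=math.dist):
    """Return the indices of a greedily built epsilon-net over `points`.

    Net points are pairwise at least `epsilon` apart, and every other
    point lies within `epsilon` of some net point.
    """
    net = []
    for i, p in enumerate(points):
        # Keep p only if no point already in the net is closer than epsilon.
        if all(metric(p, points[j]) >= epsilon for j in net):
            net.append(i)
    return net


# Hypothetical sample data: two tight clusters and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.1), (3.0, 3.0)]
net = greedy_epsilon_net(points, epsilon=0.5)
```

A larger epsilon yields a smaller net, which is why the choice of epsilon controls how aggressively the data is compressed.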

Both accept the -h flag, which lists the available arguments, and both can save the result to disk in numpy's .npz format.
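
A saved .npz file can be inspected with numpy. The sketch below lists the arrays stored in an archive; since the array names written by the tools are not documented here, it discovers them through the archive's files attribute. The helper list_npz_arrays and the stand-in file with keys X and Y are hypothetical and exist only so the example runs on its own.

```python
import os
import tempfile

import numpy as np


def list_npz_arrays(path):
    """Return {array name: shape} for every array stored in a .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Stand-in archive with hypothetical array names; the real tools may
# use different names, which is why the helper discovers them.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.npz")
    np.savez(path, X=np.zeros((3, 2)), Y=np.array([0, 1, 0]))
    shapes = list_npz_arrays(path)
```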


Code

This package provides a class KSU(Xs, Ys, metric, [gram, prune, logLevel, n_jobs])

Xs and Ys are the data points and their respective labels as numpy arrays

metric is either a callable to compute the metric or a string that names one of our provided metrics (print KSU.METRICS.keys() for the full list)
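
As an illustration of a callable metric, the sketch below defines a weighted Manhattan distance. It assumes the callable receives two points and returns a non-negative number; the exact calling convention KSU expects is not stated here. The name weighted_l1 and the weights are hypothetical.

```python
# Hypothetical per-coordinate weights, chosen only for this illustration.
WEIGHTS = (1.0, 2.0)


def weighted_l1(a, b):
    """Weighted Manhattan distance between two equal-length points."""
    return sum(w * abs(x - y) for w, x, y in zip(WEIGHTS, a, b))


d = weighted_l1((0.0, 0.0), (1.0, 1.0))
```

Whatever callable is used should be symmetric and return zero for identical points, as any metric must.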

gram (optional, default=None) a precomputed Gram matrix; it is calculated if not provided.
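
The following sketch shows one way such a matrix could be precomputed, assuming that gram here means the n-by-n matrix of pairwise distances between the data points (an assumption, not confirmed by this description). It is written in pure Python for clarity; since Xs and Ys are numpy arrays, the result would presumably need converting to a numpy array before being passed to KSU. The name pairwise_distances is hypothetical.

```python
import math


def pairwise_distances(points, metric=math.dist):
    """Return the symmetric n-by-n matrix of distances between points."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # A metric is symmetric, so each distance is computed once
            # and mirrored across the diagonal.
            gram[i][j] = gram[j][i] = metric(points[i], points[j])
    return gram


gram = pairwise_distances([(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)])
```

Precomputing the matrix is worthwhile when the metric is expensive and the same data is compressed several times with different settings.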

prune (optional, default=False) a boolean indicating whether to prune the compressed set (Algorithm 2 from "Near-optimal sample compression for nearest neighbors")

logLevel (optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' to get more information)

n_jobs (optional, default=1) an integer defining how many CPUs to use (scipy logic); pass -1 to use all. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used, so n_jobs = -2 uses all CPUs but one.
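
The n_jobs rule can be written out as a small function. This is a sketch of the arithmetic described above, not code from the package; the name effective_workers is hypothetical, and rejecting n_jobs = 0 is an assumption because that case is not covered by the description.

```python
import os


def effective_workers(n_jobs, n_cpus=None):
    """Translate an n_jobs value into a concrete number of CPUs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption: zero is treated as invalid.
        raise ValueError("n_jobs must be a nonzero integer")
    if n_jobs < 0:
        # -1 means all CPUs, -2 all but one, and so on (never below 1).
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```

On an 8-CPU machine, effective_workers(-1, 8) gives 8 and effective_workers(-2, 8) gives 7.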


KSU provides a method compressData([delta, minCompress, maxCompress, greedy, stride, logLevel, numProcs])

This selects the subset with the lowest estimated error, at confidence 1 - delta. It accepts the following arguments:

delta (optional, default=0.1) confidence parameter; the error upper bound holds with probability 1 - delta

minCompress (optional, default=0.05) minimal compression ratio

maxCompress (optional, default=0.1) maximum compression ratio

greedy (optional, default=True) whether to use the greedy or the hierarchical strategy for net construction

stride (optional, default=200) how many gammas to skip between each iteration (since similar gammas will produce similar nets)
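
To illustrate what stride does, the sketch below assumes the candidate gammas are the sorted distinct pairwise distances in the data and keeps every stride-th one. Both that assumption and the name candidate_gammas are hypothetical; the package may enumerate gammas differently.

```python
def candidate_gammas(gram, stride):
    """Return every `stride`-th distinct pairwise distance, ascending."""
    n = len(gram)
    # Collect each distance once from the upper triangle of the matrix.
    distances = sorted({gram[i][j] for i in range(n) for j in range(i + 1, n)})
    return distances[::stride]


# Pairwise distances between the points 0, 1, 3 and 7 on a line.
gram = [
    [0, 1, 3, 7],
    [1, 0, 2, 6],
    [3, 2, 0, 4],
    [7, 6, 4, 0],
]
gammas = candidate_gammas(gram, stride=2)
```

A larger stride means fewer nets are built and evaluated, trading some resolution in the search for speed.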

logLevel (optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' to get more information)

numProcs (optional, default=1) number of processes to use


You can then run getClassifier(), which returns a 1-NN classifier (based on sklearn's K-NN) fitted to the compressed data.
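
For readers unfamiliar with the 1-NN rule, the sketch below shows the prediction step in pure Python: a query takes the label of its nearest stored point. It is a stand-in to illustrate the rule, not the sklearn-based classifier the package returns; the name predict_1nn and the sample data are hypothetical.

```python
import math


def predict_1nn(train_xs, train_ys, query, metric=math.dist):
    """Return the label of the training point nearest to `query`."""
    nearest = min(range(len(train_xs)), key=lambda i: metric(train_xs[i], query))
    return train_ys[nearest]


# Hypothetical compressed set: one representative per class.
compressed_xs = [(0.0, 0.0), (5.0, 5.0)]
compressed_ys = ["a", "b"]
label = predict_1nn(compressed_xs, compressed_ys, (1.0, 1.0))
```

Because prediction scans the stored points, a smaller compressed set translates directly into faster classification.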

Or, run getCompressedSet() to get the compressed data as a tuple of numpy arrays (compressedXs, compressedYs).
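
Put together, the calls described above might be chained as in the following sketch. The import path of KSU is not stated here, so the helper takes the class as an argument rather than guessing it; the name compress_and_fit is hypothetical, and arguments are passed positionally in the documented order.

```python
def compress_and_fit(ksu_cls, Xs, Ys, metric="l2", delta=0.1):
    """Compress (Xs, Ys) and return the classifier and compressed set.

    `ksu_cls` is expected to be the package's KSU class.
    """
    ksu = ksu_cls(Xs, Ys, metric)
    # Select the subset with the lowest estimated error at confidence 1 - delta.
    ksu.compressData(delta)
    classifier = ksu.getClassifier()
    compressed_xs, compressed_ys = ksu.getCompressedSet()
    return classifier, compressed_xs, compressed_ys
```

With the package installed, this would be called as compress_and_fit(KSU, Xs, Ys), where Xs and Ys are numpy arrays of data points and labels.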


See scripts/ for example usage

Built-in metrics

['chebyshev', 'yule', 'sokalmichener', 'canberra', 'EarthMover', 'rogerstanimoto', 'matching', 'dice', 'EditDistance', 'braycurtis', 'russellrao', 'cosine', 'cityblock', 'l1', 'manhattan', 'sqeuclidean', 'jaccard', 'seuclidean', 'sokalsneath', 'kulsinski', 'minkowski', 'mahalanobis', 'euclidean', 'l2', 'hamming', 'correlation', 'wminkowski']
