# KSU Compression Algorithm Implementation

Implementation of the KSU compression algorithm: Algorithm 1 from [Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions](https://www.cs.bgu.ac.il/~karyeh/compression-arxiv.pdf).
## Installation

With pip:

```sh
pip install ksu
```

From source:

```sh
git clone --depth=1 https://github.com/nimroha/ksu_classifier.git
cd ksu_classifier
python setup.py install
```
## Usage

### Command Line

This package provides two command line tools: `e-net` and `ksu`:

- `e-net` constructs an epsilon net for a given epsilon
- `ksu` runs the full algorithm

Both accept the `-h` flag to describe their arguments, and both can save the result to disk in numpy's `.npz` format.
### Code

This package provides a class `KSU(Xs, Ys, metric, [gram, prune, logLevel, n_jobs])`:

- `Xs` and `Ys` are the data points and their respective labels, as numpy arrays
- `metric` is either a callable that computes the metric or a string naming one of the provided metrics (print `KSU.METRICS.keys()` for the full list)
- `gram` (optional, default=None) a precomputed gramian matrix; it will be computed if not provided
- `prune` (optional, default=False) a boolean indicating whether to prune the compressed set (Algorithm 2 from Near-optimal sample compression for nearest neighbors)
- `logLevel` (optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' for more information)
- `n_jobs` (optional, default=1) an integer defining how many CPUs to use (scipy logic); pass -1 to use all. For `n_jobs` below -1, `(n_cpus + 1 + n_jobs)` CPUs are used, so for `n_jobs = -2` all CPUs but one are used.
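As an illustration of the constructor's inputs, here is a minimal sketch that builds a toy dataset and a precomputed gramian with scipy. The commented-out import path and constructor call are assumptions about the package layout, shown only to indicate where these arrays would be passed:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Toy dataset: 2-D points with binary labels.
Xs = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
Ys = np.array([0, 0, 1, 1])

# Precomputing the gramian (pairwise distance matrix) lets KSU skip
# that step; here we use the euclidean metric as an example.
gram = cdist(Xs, Xs, metric='euclidean')

# Hypothetical usage (import path is an assumption -- check the package):
# from ksu.KSU import KSU
# ksu = KSU(Xs, Ys, 'euclidean', gram=gram)
```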
`KSU` provides a method `compressData([delta, minCompress, maxCompress, greedy, stride, logLevel, numProcs])`, which selects the subset with the lowest estimated error with confidence `1 - delta`. It can take the following arguments:

- `delta` (optional, default=0.1) confidence for the error upper bound
- `minCompress` (optional, default=0.05) minimal compression ratio
- `maxCompress` (optional, default=0.1) maximum compression ratio
- `greedy` (optional, default=True) whether to use the greedy or the hierarchical strategy for net construction
- `stride` (optional, default=200) how many gammas to skip between iterations (since similar gammas produce similar nets)
- `logLevel` (optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' for more information)
- `numProcs` (optional, default=1) number of processes to use
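One plausible way to picture the `stride` parameter (an illustrative sketch only, not the package's actual implementation; `candidate_gammas` is a hypothetical helper) is thinning a sorted list of candidate scale values drawn from the pairwise distances:

```python
import numpy as np

def candidate_gammas(gram, stride=200):
    """Illustrative sketch: take every `stride`-th value from the sorted
    pairwise distances as a candidate scale gamma.
    (Hypothetical helper -- not part of the ksu package.)"""
    # Unique off-diagonal distances, sorted ascending.
    upper = gram[np.triu_indices_from(gram, k=1)]
    gammas = np.sort(np.unique(upper))
    # A larger stride means fewer candidate nets to build.
    return gammas[::stride]

rng = np.random.default_rng(0)
Xs = rng.random((50, 3))
gram = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=-1)
print(len(candidate_gammas(gram, stride=10)))
```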
You can then run `getClassifier()`, which returns a 1-NN classifier (based on sklearn's K-NN) fitted to the compressed data. Alternatively, run `getCompressedSet()` to get the compressed data as a tuple of numpy arrays `(compressedXs, compressedYs)`.

See `scripts/` for example usage.
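For context, fitting a 1-NN classifier to a compressed set with sklearn looks like the following sketch. The arrays here are hand-made stand-ins for `(compressedXs, compressedYs)`, not output of the package:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the tuple returned by getCompressedSet().
compressedXs = np.array([[0.0, 0.0], [1.0, 1.0]])
compressedYs = np.array([0, 1])

# getClassifier() is described as essentially this: sklearn's K-NN
# with k=1, fitted to the compressed data.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(compressedXs, compressedYs)

print(clf.predict([[0.2, 0.1], [0.8, 0.9]]))  # → [0 1]
```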
## Built-in metrics
['chebyshev', 'yule', 'sokalmichener', 'canberra', 'EarthMover', 'rogerstanimoto', 'matching', 'dice', 'EditDistance', 'braycurtis', 'russellrao', 'cosine', 'cityblock', 'l1', 'manhattan', 'sqeuclidean', 'jaccard', 'seuclidean', 'sokalsneath', 'kulsinski', 'minkowski', 'mahalanobis', 'euclidean', 'l2', 'hamming', 'correlation', 'wminkowski']
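Most of these names map to standard scipy/sklearn distance metrics (`'l1'` and `'manhattan'` are sklearn-style aliases for `'cityblock'`). A small demonstration using scipy directly, independent of this package:

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[0, 0, 1], [1, 1, 1]], dtype=float)
b = np.array([[1, 0, 1]], dtype=float)

# cityblock (L1) distance: sum of absolute coordinate differences.
d = cdist(a, b, metric='cityblock')
print(d)  # both rows of `a` are at L1 distance 1 from `b`
```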