Implementation of the KSU compression algorithm https://www.cs.bgu.ac.il/~karyeh/compression-arxiv.pdf
Project description
KSU Compression Algorithm Implementation
Algortihm 1 from Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions
Installation
- With pip:
pip install ksu
- From source:
git clone --depth=1 https://github.com/nimroha/ksu_classifier.git
cd ksu_classifier
python setup.py install
Usage
Command Line
This package provides two command line tools: e-net
and ksu
:
e-net
constructs an epsilon net for a given epsilonksu
runs the full algorithm
Both provide the -h flag to specify the arguments, and both can save the result to the disk in numpy's .npz format
Code
This package provides a class KSU(Xs, Ys, metric, [gram, prune, logLevel, n_jobs])
Xs
and Ys
are the data points and their respective labels as numpy arrays
metric
is either a callable to compute the metric or a string that names one of our provided metrics (print KSU.METRICS.keys()
for the full list)
gram
(optional, default=None) a precomputed gramian matrix, will be calculated if not provided.
prune
(optional, default=False) a boolean indicating whether to prune the compressed set or not (Algorithm 2 from Near-optimal sample compression for nearest neighbors)
logLevel
(optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' to get more information)
n_jobs
(optional, default=1) an integer defining how many cpus to use (scipy logic), pass -1 to use all. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
KSU
provides a method compressData([delta, minCompress, maxCompress, greedy, stride, logLevel, numProcs])
Which selects the subset with the lowest estimated error with confidence 1 - delta
. Can take arguments:
delta
(optional, default=0.1) confidence for error upper bound
minCompress
(optional, default=0.05) minimal compression ratio
maxCompress
(optional, default=0.1) maximum compression ratio
greedy
(optional, default=True) whether to use greedy or hierarichal strategy for net construction
stride
(optional, default=200) how many gammas to skip between each iteration (since similar gammas will produce similar nets)
logLevel
(optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' to get more information)
numProcs
(optional, default=1) number of processes to use
You can then run getClassifier()
which returns a 1-NN Classifer (based on sklearn's K-NN) fitted to the compressed data.
Or, run getCompressedSet()
to get the compressed data as a tuple of numpy arrays (compressedXs, compressedYs)
.
See scripts/
for example usage
Built-in metrics
['chebyshev', 'yule', 'sokalmichener', 'canberra', 'EarthMover', 'rogerstanimoto', 'matching', 'dice', 'EditDistance', 'braycurtis', 'russellrao', 'cosine', 'cityblock', 'l1', 'manhattan', 'sqeuclidean', 'jaccard', 'seuclidean', 'sokalsneath', 'kulsinski', 'minkowski', 'mahalanobis', 'euclidean', 'l2', 'hamming', 'correlation', 'wminkowski']
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file ksu-0.5.1.tar.gz
.
File metadata
- Download URL: ksu-0.5.1.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.6.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4fe2a871e0935e7d308651253d24678b348122ba89129da195b35e06d0807cea |
|
MD5 | 46964f68f5aca31abeb4b8c1d13e91b7 |
|
BLAKE2b-256 | 4cd13dd1dd4b1049fbf95a9b138633959eb9592bb30bc2cc66c0a99056282749 |
File details
Details for the file ksu-0.5.1-py2.py3-none-any.whl
.
File metadata
- Download URL: ksu-0.5.1-py2.py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.6.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6db3a7e29dbaaadeacbe214486054443d6a068c99ffa02474e4be625334b3133 |
|
MD5 | 4a5094316bd706940e3e1fef955db7c3 |
|
BLAKE2b-256 | 7b8ebbb3abc78774cb23d9d5ccda43a953abfc8c7681e685d35c2937b5dc2d54 |