A Library with Fast Sketching Primitives
Fast Sketches
While implementing a paper, we needed a new library for count-min sketches and count-sketches because the existing solutions were inadequate: C++ libraries are difficult to use, and Python packages tend to be slow. We therefore wrote a Python package that provides bindings to a C++ library for computing the predictions of count-min sketches and count-sketches. The code is located in fast_sketches.cpp.
Installation
To install the package, run
pip install fast_sketches
Usage
[numpy array of long longs] cm_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)
takes as input
- nhashes: The number of hash functions used in the sketch
- np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
- width: The width of the sketch, or the number of entries in each row of the sketch
- seed: A random seed.
It then outputs the frequencies that a count-min sketch with those parameters would predict, where the prediction at index i corresponds to the key at index i of np_input. For example, if np_input is [3, 4, 5], the output might also be [3, 4, 5], but it could not be [4, 3, 5]: a count-min sketch never underestimates a frequency, so the key with frequency 4 cannot receive a prediction of 3.
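To make the behaviour described above concrete, here is a minimal pure-NumPy sketch of how count-min predictions can be computed. It is an illustration under stated assumptions, not the package's C++ code: the name cm_sketch_preds_reference is hypothetical, and a seeded random bucket assignment stands in for the hash functions, so its outputs will generally not match the package's for the same seed.

```python
import numpy as np


def cm_sketch_preds_reference(nhashes, np_input, width, seed):
    """Illustrative count-min predictions (hypothetical helper, not the package API)."""
    freqs = np.asarray(np_input, dtype=np.int64)
    rng = np.random.default_rng(seed)
    # Start from the largest possible value and keep the minimum over all rows.
    preds = np.full(freqs.shape, np.iinfo(np.int64).max, dtype=np.int64)
    for _ in range(nhashes):
        # Assumption: a random bucket per key stands in for one hash function.
        buckets = rng.integers(0, width, size=freqs.size)
        # Each counter accumulates the frequencies of all keys hashed to it.
        counters = np.zeros(width, dtype=np.int64)
        np.add.at(counters, buckets, freqs)
        # A key's estimate in this row is the value of the counter it landed in.
        preds = np.minimum(preds, counters[buckets])
    return preds


freqs = np.array([3, 4, 5], dtype=np.int64)
preds = cm_sketch_preds_reference(nhashes=3, np_input=freqs, width=2, seed=0)
# Collisions can only add to a counter, so no prediction is below the true frequency.
assert np.all(preds >= freqs)
```

Because every counter is a sum of non-negative frequencies, taking the minimum over rows gives the tightest overestimate available, which is why an output such as [4, 3, 5] is impossible for the input [3, 4, 5].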
[numpy array of doubles] count_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)
performs the same function for the Count-Sketch, with parameters:
- nhashes: The number of hash functions used in the sketch
- np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
- width: The width of the sketch, or the number of entries in each row of the sketch
- seed: A random seed.
Note one key difference, however: the output is an array of doubles, because the count-sketch takes the median of its per-row estimates, and the median of an even number of integers can be a half-integer.
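The same idea can be sketched for the count-sketch. As before, this is a hedged illustration rather than the package's implementation: count_sketch_preds_reference is a hypothetical name, and seeded random buckets and signs stand in for the hash functions.

```python
import numpy as np


def count_sketch_preds_reference(nhashes, np_input, width, seed):
    """Illustrative count-sketch predictions (hypothetical helper, not the package API)."""
    freqs = np.asarray(np_input, dtype=np.int64)
    rng = np.random.default_rng(seed)
    estimates = np.empty((nhashes, freqs.size), dtype=np.int64)
    for row in range(nhashes):
        # Assumption: random buckets and random +/-1 signs stand in for the hashes.
        buckets = rng.integers(0, width, size=freqs.size)
        signs = rng.choice(np.array([-1, 1], dtype=np.int64), size=freqs.size)
        # Each counter holds the signed sum of the keys hashed to it.
        counters = np.zeros(width, dtype=np.int64)
        np.add.at(counters, buckets, signs * freqs)
        # Multiplying by the key's own sign recovers its frequency plus signed noise.
        estimates[row] = signs * counters[buckets]
    # The median over rows is returned as float64, matching the doubles output.
    return np.median(estimates, axis=0)


freqs = np.array([3, 4, 5], dtype=np.int64)
preds = count_sketch_preds_reference(nhashes=4, np_input=freqs, width=2, seed=0)
assert preds.dtype == np.float64
# With an even number of rows, every prediction is an integer or a half-integer.
assert np.array_equal(2 * preds, np.round(2 * preds))
```

Unlike the count-min sketch, the signed noise can push an estimate below the true frequency, so these predictions are not guaranteed to be overestimates.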
Download files
Built Distributions
Hashes for quick_sketches-0.0.2-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 00a090667f3d5d851f7183d6032ecdb08812f9ac6252639bb9970883c957d10e
MD5 | 0169890e12f8a593584d4803e102a63f
BLAKE2b-256 | 0d4b659c72ed1bb067a3ab2e93b0b8a0c8da1a6be83beeb6d69a8b35426ce4e0
Hashes for quick_sketches-0.0.2-cp38-cp38-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 43aac9f533e68ccc90678251f72af58301d787d3332d76924b84b42bbe7ee714
MD5 | d07b48e7f07c58e6e72ecf4564c63e19
BLAKE2b-256 | 82938db7b5dd1627a035c0ba505a9c183bf0f74591d8c78937fd3f01019b5913
Hashes for quick_sketches-0.0.2-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | df4ead1fd23b60df0d8f6317b4924a32193b1f01bb3c92cd4cb834ac2aad3cb6
MD5 | ffc0daf9bd4950a54d70e663c4f18896
BLAKE2b-256 | ae6497bdf21b218a076cc74f5963ad7f5858c1be5e3224a36838350c65177f27
Hashes for quick_sketches-0.0.2-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | e1f027258be2e3179aab35867377c9469a801517abff1d36ca9f9c20ce4935b7
MD5 | f49ccb8fb61726b44f1ec0394736c753
BLAKE2b-256 | 2a216fc1e20b89a52e5606b6043b77aae29fbe60286f2113ecbfb03ff3ad0925
Hashes for quick_sketches-0.0.2-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 83ad41d895e530869a00d9b154a59b7258d619c94c3b1081f98d81f391e73606
MD5 | 6eb3264d0efd406183669691bad0e25e
BLAKE2b-256 | 30b05e5e1fb9a03720762b2ed9a1133d12de52e17beb16405e93b6832e2c02a4