A Library with Fast Sketching Primitives
Project description
Quick Sketches
While implementing a paper, we needed a new library for Count-Min Sketches and Count-Sketches because the existing options fell short: C++ libraries are difficult to use, and Python packages tend to be slow. We therefore wrote a Python package with bindings to a C++ library that computes the frequency predictions of the count-min sketch and the count sketch. In short, you get the speed of C++ for both sketches while writing only Python!
Installation
To install the package, run
pip install quick_sketches
Usage
Consider the following example, which computes the count-min sketch predictions for three keys that occur three, four, and five times respectively. The sketch has five levels (one per hash function), and each level contains 100 counters.
import quick_sketches as m
import numpy as np
a = np.array([3, 4, 5])
m.cm_sketch_preds(5, a, 100, 1) # usually [3, 4, 5], but sometimes not!
[numpy array of long longs] cm_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)
takes as input
- nhashes: The number of hash functions used in the sketch
- np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
- width: The width of the sketch, or the number of entries in each row of the sketch
- seed: A random seed.
It then outputs the frequencies that a count-min sketch with those parameters would predict, where the prediction at index i corresponds to the key at index i of np_input. For example, if np_input were [3, 4, 5], the output might also be [3, 4, 5], but it could not be [4, 3, 5]: a count-min sketch never underestimates, so the key with true frequency 4 cannot receive a prediction of 3.
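To make the never-underestimate property concrete, here is a minimal pure-Python sketch of how count-min predictions can be computed. It is an illustration under stated assumptions, not the package's C++ implementation: the name cm_sketch_preds_reference is hypothetical, and each hash function is simulated by drawing one random bucket per key, so its outputs will not match cm_sketch_preds for a given seed.

```python
import random


def cm_sketch_preds_reference(nhashes, freqs, width, seed):
    """Illustrative count-min predictions (hypothetical helper, not the library's code)."""
    rng = random.Random(seed)
    n = len(freqs)
    # Assumption: a random bucket per (row, key) stands in for a real hash function.
    buckets = [[rng.randrange(width) for _ in range(n)] for _ in range(nhashes)]

    # Build the sketch: every key adds its frequency to one counter in each row.
    table = [[0] * width for _ in range(nhashes)]
    for row in range(nhashes):
        for key, freq in enumerate(freqs):
            table[row][buckets[row][key]] += freq

    # The prediction for a key is the smallest counter it touches across all rows.
    return [
        min(table[row][buckets[row][key]] for row in range(nhashes))
        for key in range(n)
    ]


freqs = [3, 4, 5]
preds = cm_sketch_preds_reference(5, freqs, 100, 1)
# Collisions only ever add to a counter, so no prediction falls below the truth.
print(all(p >= f for p, f in zip(preds, freqs)))
```

With width set to 1, every key lands in the same counter, so each prediction equals the total of all frequencies; this is the worst case of overestimation.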
[numpy array of doubles] count_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)
performs the same function for the Count-Sketch, with parameters:
- nhashes: The number of hash functions used in the sketch
- np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
- width: The width of the sketch, or the number of entries in each row of the sketch
- seed: A random seed.
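Note one key difference, however: the output is an array of doubles, because the count-sketch takes the median of its per-row estimates, and a median over an even number of rows averages the two middle values, which sometimes leads to half-integer outputs!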
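The median step can be illustrated with a small pure-Python sketch. As before, this is an assumption-laden illustration rather than the package's implementation: count_sketch_preds_reference is a hypothetical name, and the bucket and sign hash functions are simulated with random draws per key, so results will differ from count_sketch_preds for the same seed.

```python
import random
import statistics


def count_sketch_preds_reference(nhashes, freqs, width, seed):
    """Illustrative count-sketch predictions (hypothetical helper, not the library's code)."""
    rng = random.Random(seed)
    n = len(freqs)
    # Assumption: random buckets and random +/-1 signs stand in for real hash functions.
    buckets = [[rng.randrange(width) for _ in range(n)] for _ in range(nhashes)]
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(nhashes)]

    # Build the sketch: every key adds its signed frequency to one counter per row.
    table = [[0] * width for _ in range(nhashes)]
    for row in range(nhashes):
        for key, freq in enumerate(freqs):
            table[row][buckets[row][key]] += signs[row][key] * freq

    preds = []
    for key in range(n):
        # Each row yields one estimate: the key's counter multiplied by the key's sign.
        estimates = [
            signs[row][key] * table[row][buckets[row][key]]
            for row in range(nhashes)
        ]
        # With an even number of rows the median averages the two middle estimates.
        preds.append(float(statistics.median(estimates)))
    return preds


print(count_sketch_preds_reference(4, [3, 4, 5], 100, 1))
```

Unlike the count-min sketch, collisions here can push an estimate in either direction, so a prediction may fall below the true frequency as well as above it.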
Built Distributions
Hashes for quick_sketches-1.0.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 8081176e8756b03985a03bfe7d895c9021324f42aa27a50824e73a05a113bf5d
MD5 | 4e1d46485d0303d6a71866be5dccf40d
BLAKE2b-256 | a9a38903d8b728c263499d01d92ea917243ec2d72bf0d0a3208be39ef48786d5
Hashes for quick_sketches-1.0.0-cp38-cp38-win32.whl
Algorithm | Hash digest
---|---
SHA256 | aaf3ff963b0395f41a8433453585460384b7270401f87b4af42bd3ff93fd423c
MD5 | 30ad5b46bcc19f5224c090b713fe6d1a
BLAKE2b-256 | 612ad0da51c3da616d5c2e78a2f5648efca7075631e7aef79f6b2892702fb45c
Hashes for quick_sketches-1.0.0-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b69c72a1515025baa2c93db5721573eaf5ccd7ad3be49c5110b19530758a8b84
MD5 | 4e27591349766e69fb07969280803c65
BLAKE2b-256 | 53d4d1a3620e5ecaba1c30ec0c49328153356227a1d1006c15a0bd8515821f19
Hashes for quick_sketches-1.0.0-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 1e9d57be730c453c03d55d3594459f3993e49f4feb1830f2c78a584dbb165d4a
MD5 | ad171eddb02b270cc87f21fb987ef2d8
BLAKE2b-256 | e7df5874dca85b16b00dc00dd4aa7405ca3eae3143bddce132e02dbc01485368
Hashes for quick_sketches-1.0.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | a4db07d4ba4f0407922464065205cd0abc01b79519ca6baba375fe0f404afe2a
MD5 | ac848a783e17159acc1d6805aec0169a
BLAKE2b-256 | ff27328aeb28713f5e7f9d213d5c315eaaa91e0a48250e8710085e645640fdaa