A Library with Fast Sketching Primitives
Quick Sketches
While implementing a paper, we needed a new library for count-min sketches and count sketches because existing solutions were inadequate: C++ libraries are difficult to use, and Python packages tend to be slow. We therefore wrote a Python package that provides bindings to a C++ library for computing the predictions (estimated frequencies) of count-min sketches and count sketches. In short, you get the speed of C++ for these predictions while writing only Python!
Installation
To install the package, run
pip install quick_sketches
Usage
The following example computes count-min sketch predictions for three keys that occur three, four, and five times, respectively. The sketch uses five hash functions (rows), and each row contains 100 counters.
import quick_sketches as m
import numpy as np
a = np.array([3, 4, 5])
m.cm_sketch_preds(5, a, 100, 1) # usually [3, 4, 5], but sometimes not!
[numpy array of long longs] cm_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)
takes as input
- nhashes: The number of hash functions used in the sketch
- np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
- width: The width of the sketch, or the number of entries in each row of the sketch
- seed: A random seed.
It returns the frequencies that a count-min sketch with those parameters would predict, where the prediction at index i corresponds to the key at index i of np_input. For example, if np_input is [3, 4, 5], the output might also be [3, 4, 5], but it could not be [4, 3, 5]: a count-min sketch never underestimates a frequency, so the second prediction cannot fall below 4.
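To make the meaning of these predictions concrete, here is a minimal pure-Python sketch of how count-min estimates can be computed from a list of frequencies. It is an illustration only, not the quick_sketches implementation: the function name reference_cm_preds is hypothetical, random bucket assignments stand in for real hash functions, and the outputs will not match cm_sketch_preds for the same seed.

```python
import random

def reference_cm_preds(nhashes, freqs, width, seed):
    """Illustrative count-min estimates (hypothetical helper, not the library's code)."""
    rng = random.Random(seed)
    n = len(freqs)
    # Assumption: one random bucket per (row, key) stands in for a hash function.
    buckets = [[rng.randrange(width) for _ in range(n)] for _ in range(nhashes)]
    # Each row adds up the frequencies of all keys that land in the same bucket.
    table = [[0] * width for _ in range(nhashes)]
    for r in range(nhashes):
        for k, f in enumerate(freqs):
            table[r][buckets[r][k]] += f
    # The estimate for a key is the smallest counter it touches across all rows.
    return [min(table[r][buckets[r][k]] for r in range(nhashes)) for k in range(n)]

true_freqs = [3, 4, 5]
preds = reference_cm_preds(5, true_freqs, 100, 1)
# With non-negative frequencies, every prediction is at least the true count.
print(all(p >= t for p, t in zip(preds, true_freqs)))
```

Because collisions can only add to a counter, the minimum over rows never drops below the true frequency, which is why an output such as [4, 3, 5] is impossible for the input [3, 4, 5].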
[numpy array of doubles] count_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)
performs the same function for the Count-Sketch, with parameters:
- nhashes: The number of hash functions used in the sketch
- np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
- width: The width of the sketch, or the number of entries in each row of the sketch
- seed: A random seed.
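Note one key difference, however: the output is an array of doubles, because the count sketch takes the median of its per-row estimates, which can produce half-integer values (typically when nhashes is even and the median is the average of the two middle estimates)!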
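The median step can be illustrated with a similar pure-Python sketch. As before, this is a hedged illustration rather than the library's code: reference_count_sketch_preds is a hypothetical name, random buckets and random signs stand in for the hash functions, and the results will not match count_sketch_preds for the same seed.

```python
import random
import statistics

def reference_count_sketch_preds(nhashes, freqs, width, seed):
    """Illustrative count-sketch estimates (hypothetical helper, not the library's code)."""
    rng = random.Random(seed)
    n = len(freqs)
    # Assumption: random buckets and random +/-1 signs stand in for hash functions.
    buckets = [[rng.randrange(width) for _ in range(n)] for _ in range(nhashes)]
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(nhashes)]
    # Each row accumulates signed frequencies per bucket.
    table = [[0] * width for _ in range(nhashes)]
    for r in range(nhashes):
        for k, f in enumerate(freqs):
            table[r][buckets[r][k]] += signs[r][k] * f
    preds = []
    for k in range(n):
        # Undo the key's own sign in each row, then take the median over rows.
        row_estimates = [signs[r][k] * table[r][buckets[r][k]] for r in range(nhashes)]
        preds.append(float(statistics.median(row_estimates)))
    return preds

preds = reference_count_sketch_preds(4, [3, 4, 5], 100, 1)
# With an even number of rows the median averages the two middle values,
# so half-integers can appear, e.g. the median of 3 and 4:
half = statistics.median([3, 4])  # 3.5
```

Unlike the count-min sketch, these estimates can fall below the true frequency, because colliding keys may contribute with a negative sign.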