A Library with Fast Sketching Primitives

Quick Sketches

While implementing a paper, we needed a library for count-min sketches and count sketches, and the existing options fell short: C++ libraries are difficult to use, and Python packages tend to be slow. We therefore wrote a Python package with bindings to a C++ library that computes the predictions (estimated frequencies) of the count-min sketch and the count sketch. In short, you get the speed of C++ for these sketches while writing only Python!

Installation

To install the package, run

pip install quick_sketches

Usage

Consider the following example, which computes count-min sketch predictions for three keys: one that occurs three times, one that occurs four times, and one that occurs five times. The sketch has five levels (one hash function per level), and each level contains 100 counters.

import quick_sketches as m
import numpy as np

a = np.array([3, 4, 5], dtype=np.int64)  # frequencies of the three keys, as long longs
m.cm_sketch_preds(5, a, 100, 1)  # nhashes=5, width=100, seed=1; usually [3, 4, 5], but sometimes not!

[numpy array of long longs] cm_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)

takes as input

  1. nhashes: The number of hash functions used in the sketch
  2. np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
  3. width: The width of the sketch, or the number of entries in each row of the sketch
  4. seed: A random seed.

Then, it outputs the frequencies that a count-min sketch with those parameters would predict, where the prediction at index i corresponds to the key at index i of np_input. For example, if np_input is [3, 4, 5], the output might also be [3, 4, 5], but it could never be [4, 3, 5]: a count-min sketch never underestimates, so the prediction for the second key cannot fall below its true frequency of 4.
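To make the "never underestimates" property concrete, here is a minimal pure-NumPy sketch of what a count-min estimate computes. It is not the package's C++ implementation: the name cm_sketch_preds_reference is hypothetical, and seeded random bucket assignments stand in for real hash functions, so its outputs will generally not match cm_sketch_preds for the same seed. It also shows one way to build the frequency array from a raw stream of keys with np.unique.

```python
import numpy as np

def cm_sketch_preds_reference(nhashes, freqs, width, seed):
    """Hypothetical stand-in for a count-min estimate (assumes nhashes >= 1)."""
    freqs = np.asarray(freqs, dtype=np.int64)
    rng = np.random.default_rng(seed)
    preds = None
    for _ in range(nhashes):
        # Stand-in for a hash function: a random bucket per key in this row.
        buckets = rng.integers(0, width, size=freqs.size)
        # Each counter accumulates the frequencies of all keys hashed to it.
        counters = np.zeros(width, dtype=np.int64)
        np.add.at(counters, buckets, freqs)
        # A key's row estimate is its counter; keep the minimum over rows.
        row_est = counters[buckets]
        preds = row_est if preds is None else np.minimum(preds, row_est)
    return preds

# Build per-key frequencies from a raw stream of keys.
stream = np.array([7, 7, 7, 9, 9, 9, 9, 2, 2, 2, 2, 2])
keys, freqs = np.unique(stream, return_counts=True)

preds = cm_sketch_preds_reference(5, freqs, 100, 1)
print(keys, freqs, preds)
```

Because every counter is a sum of non-negative frequencies that includes the key's own, each prediction is at least the true frequency. Collisions can only inflate an estimate, and taking the minimum across rows limits that inflation.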

[numpy array of doubles] count_sketch_preds(int nhashes, [numpy array of long longs] np_input, ll width, int seed)

performs the same function for the Count-Sketch, with parameters:

  1. nhashes: The number of hash functions used in the sketch
  2. np_input: The numpy array containing the frequencies of each key (note that order doesn't matter)
  3. width: The width of the sketch, or the number of entries in each row of the sketch
  4. seed: A random seed.

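Note one key difference, however: the output is an array of doubles, because the count sketch takes a median across its rows, which sometimes leads to half-integer outputs! For instance, the median of an even number of integer estimates can fall halfway between two integers.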
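The half-integer behaviour can be illustrated with a similar pure-NumPy sketch of a count-sketch estimate. As before, this is an assumption-laden illustration rather than the package's code: count_sketch_preds_reference is a hypothetical name, seeded random buckets and signs stand in for the hash functions, and np.median is assumed as the median rule (it averages the two middle values when the number of rows is even).

```python
import numpy as np

def count_sketch_preds_reference(nhashes, freqs, width, seed):
    """Hypothetical stand-in for a count-sketch estimate (assumes nhashes >= 1)."""
    freqs = np.asarray(freqs, dtype=np.int64)
    rng = np.random.default_rng(seed)
    rows = np.empty((nhashes, freqs.size), dtype=np.int64)
    for r in range(nhashes):
        # Stand-ins for the bucket hash and the +/-1 sign hash of this row.
        buckets = rng.integers(0, width, size=freqs.size)
        signs = rng.choice(np.array([-1, 1]), size=freqs.size)
        # Each counter accumulates signed frequencies of the keys hashed to it.
        counters = np.zeros(width, dtype=np.int64)
        np.add.at(counters, buckets, signs * freqs)
        # Multiplying by the key's own sign recovers its contribution.
        rows[r] = signs * counters[buckets]
    # The median over rows is a float and may be a half-integer.
    return np.median(rows, axis=0)

cs_preds = count_sketch_preds_reference(4, np.array([3, 4, 5]), 100, 1)
print(cs_preds.dtype, cs_preds)

# The median of an even number of integers can land between them.
print(np.median(np.array([3, 4])))
```

Unlike the count-min sketch, colliding keys can carry opposite signs, so a count-sketch estimate may fall below the true frequency as well as above it.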
