A wrapper for the C++ Datasketches library
# Python Wrapper for Datasketches
## Installation
The release files do not include the required Python binding library ([pybind11](https://github.com/pybind/pybind11)). If building from a release package, you must ensure that the pybind11 directory points to a local copy of pybind11.
An official pypi build is eventually planned but not yet available.
If you instead want to take a (possibly ill-advised) gamble on the current state of the master branch being usable, you can run: `pip install git+https://github.com/apache/incubator-datasketches-cpp.git`
## Developer Instructions
### Building
When cloning the source repository, you should include the pybind11 submodule with the `--recursive` option to the clone command:

```
git clone --recursive https://github.com/apache/incubator-datasketches-cpp.git
cd incubator-datasketches-cpp
python -m pip install --upgrade pip setuptools wheel numpy
python setup.py build
```

If you cloned without `--recursive`, you can add the submodule post-checkout using `git submodule update --init --recursive`.
### Installing
Assuming you have already checked out the library and any dependent submodules, install by replacing the last line of the build command with `python setup.py install`.
### Unit tests
The Python tests are run with tox. To ensure you have all the needed packages, run the following from the package base directory:

```
python -m pip install --upgrade pip setuptools wheel numpy tox
tox
```
## Usage
Having installed the library, loading the Datasketches library in Python is simple: `import datasketches`.
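As a minimal sketch of what usage might look like after the import, the snippet below estimates a median with `kll_floats_sketch`. The method names `update` and `get_quantile` are assumptions based on the C++ API that the wrapper mirrors, and the helper `estimate_median` is hypothetical; the import is guarded so the snippet degrades gracefully when the library is not installed.

```python
# Assumption: the wrapper is importable as `datasketches`.
try:
    import datasketches
except ImportError:
    datasketches = None


def estimate_median(values, k=200):
    """Return an approximate median of `values` via a KLL sketch.

    Returns None when the datasketches library is not available.
    `update` and `get_quantile` are assumed to match the C++ method names.
    """
    if datasketches is None:
        return None
    sketch = datasketches.kll_floats_sketch(k)
    for value in values:
        sketch.update(float(value))
    return sketch.get_quantile(0.5)


print(estimate_median(range(1000)))
```

The result is an approximation, so it should land near the true median rather than exactly on it.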
## Available Sketch Classes
- KLL
    - `kll_ints_sketch`
    - `kll_floats_sketch`
- Frequent Items
    - `frequent_strings_sketch`
    - Error types are `frequent_items_error_type.{NO_FALSE_NEGATIVES | NO_FALSE_POSITIVES}`
- Theta
    - `update_theta_sketch`
    - `compact_theta_sketch` (cannot be instantiated directly)
    - `theta_union`
    - `theta_intersection`
    - `theta_a_not_b`
- HLL
    - `hll_sketch`
    - `hll_union`
    - Target HLL types are `tgt_hll_type.{HLL_4 | HLL_6 | HLL_8}`
- CPC
    - `cpc_sketch`
    - `cpc_union`
- VarOpt Sampling
    - `var_opt_sketch`
    - `var_opt_union`
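One way to confirm that an installed build exposes the names listed above is to probe the module with `hasattr`. The helper `missing_names` below is hypothetical, and the demonstration runs against a stand-in object so it works without the library; with the real package you would pass the imported `datasketches` module instead.

```python
from types import SimpleNamespace

# Names copied from the list above (sketch classes, unions, and enum types).
LISTED_NAMES = [
    "kll_ints_sketch", "kll_floats_sketch",
    "frequent_strings_sketch", "frequent_items_error_type",
    "update_theta_sketch", "compact_theta_sketch",
    "theta_union", "theta_intersection", "theta_a_not_b",
    "hll_sketch", "hll_union", "tgt_hll_type",
    "cpc_sketch", "cpc_union",
    "var_opt_sketch", "var_opt_union",
]


def missing_names(module, names=LISTED_NAMES):
    """Return the listed names that `module` does not expose as attributes."""
    return [name for name in names if not hasattr(module, name)]


# Stand-in for the real module, exposing only two of the listed names.
stub = SimpleNamespace(kll_ints_sketch=object, hll_sketch=object)
print(missing_names(stub))
```

An empty list from `missing_names(datasketches)` would suggest the installed version matches this README.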
## Known Differences from C++
The Python API largely mirrors the C++ API, with a few minor exceptions. The primary known difference is that Python on modern platforms has no native unsigned integer type and no numeric types narrower than 64 bits. As a result, you may not be able to produce sketches from Python that are identical to those produced with Java or C++. Loading sketches that were serialized from another language will work as expected.
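The width and signedness issue can be illustrated with the standard library alone. This is only a sketch of the general problem, not the conversion the wrapper actually performs; the helpers `as_signed_64` and `as_float32` are hypothetical names introduced for illustration.

```python
import struct


def as_signed_64(value):
    """Reinterpret the low 64 bits of `value` as a signed two's-complement integer."""
    return struct.unpack("<q", struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF))[0]


def as_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


largest_unsigned = 2**64 - 1
print(as_signed_64(largest_unsigned))  # -1: same bits, different value when read as signed
print(as_float32(0.1) == 0.1)          # False: 0.1 is not exactly representable in 32 bits
print(as_float32(0.5) == 0.5)          # True: 0.5 survives the narrowing unchanged
```

Because the same bit pattern or the same decimal literal can map to different values at different widths, inputs fed from Python may not match what a C++ or Java program would feed to the same sketch.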
We have also removed reliance on a builder class for theta sketches, because Python constructors accept named arguments rather than only positional ones.
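To show why named arguments make a builder unnecessary, here is a small comparison using purely hypothetical classes. `ToySketch`, `ToySketchBuilder`, and the parameters `lg_k` and `seed` with their default values are illustrative assumptions, not the wrapper's actual constructor signature.

```python
class ToySketch:
    """Hypothetical sketch configured directly through keyword arguments."""

    def __init__(self, lg_k=12, seed=9001):
        self.lg_k = lg_k
        self.seed = seed


class ToySketchBuilder:
    """Hypothetical builder in the style used when named arguments are unavailable."""

    def __init__(self):
        self._lg_k = 12
        self._seed = 9001

    def set_lg_k(self, lg_k):
        self._lg_k = lg_k
        return self  # returning self allows chained calls

    def set_seed(self, seed):
        self._seed = seed
        return self

    def build(self):
        return ToySketch(lg_k=self._lg_k, seed=self._seed)


# Both routes override one setting and keep the other default.
via_builder = ToySketchBuilder().set_lg_k(14).build()
via_kwargs = ToySketch(lg_k=14)
print(via_builder.lg_k == via_kwargs.lg_k, via_builder.seed == via_kwargs.seed)  # True True
```

The keyword form expresses the same configuration in a single call, which is the motivation for dropping the builder.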