A GPU implementation of HyperLogLog
Project description
PyCudaHLL
This is a GPU-accelerated implementation of HyperLogLog built on the CuPy library. It was created for the class "Algorithmic Techniques for Taming Big Data" in the department of Computing and Data Science at Boston University.
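To make the underlying algorithm concrete, here is a minimal CPU-only sketch of HyperLogLog in plain Python. It is not this library's GPU code: the SHA-1-based 64-bit hash, the class name `HyperLogLog`, and the restriction to 7 <= p <= 16 are assumptions made for illustration. Each item is hashed, the top p bits select one of m = 2^p registers, each register keeps the largest "rank" (position of the first 1-bit) seen in the remaining bits, and the estimate is a bias-corrected harmonic mean of 2^(-register). For small cardinalities it switches to linear counting. These are the standard constants and thresholds from the original HyperLogLog formulation.

```python
import hashlib
import math


def hash64(item: str) -> int:
    """Map a string to a 64-bit unsigned integer (first 8 bytes of its SHA-1 digest)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HyperLogLog:
    """Minimal CPU HyperLogLog with m = 2**p registers over 64-bit hashes (illustrative only)."""

    def __init__(self, p: int = 14) -> None:
        if not 7 <= p <= 16:
            raise ValueError("this sketch supports 7 <= p <= 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        x = hash64(item)
        width = 64 - self.p
        idx = x >> width                # top p bits select a register
        rest = x & ((1 << width) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest` (width + 1 if rest == 0).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def card(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias-correction constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: linear counting on the empty registers.
            return m * math.log(m / zeros)
        return raw


# Usage: duplicates do not change the registers, so only distinct items count.
hll = HyperLogLog(p=14)
for i in range(100_000):
    hll.add(f"item-{i}")
print(round(hll.card()))  # an estimate close to 100000
```

Because adding an item only ever raises a register to a maximum, the same registers can be filled in any order or in parallel, which is what makes the algorithm a good fit for a GPU.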
Using the Code
To use this code, you can either get the library from PyPI or build it from source.
Get from PyPI (Recommended)
- Install using pip:
pip install pycudahll
- In your code, import the library:
from pycudahll.CudaHLL import CudaHLL
Building from Source
- Clone the repository
- Install dependencies:
poetry install
- In your code, import the library:
from pycudahll.CudaHLL import CudaHLL
- See test.py for examples. (Note: test.py is most likely in a broken state, but should give you an idea of how to use the library.)
API
The main class of the library is CudaHLL. It can be imported in your code with:
from pycudahll.CudaHLL import CudaHLL
The CudaHLL module also provides a helper function, hashDataGPUHLL, which hashes input data for use with the main class:
from pycudahll.CudaHLL import hashDataGPUHLL
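HyperLogLog operates on hashed values rather than raw items: equal items must map to equal fixed-width integers, and distinct items should be spread uniformly. The sketch below illustrates that preprocessing step in plain Python. It is a hypothetical stand-in, not the implementation of hashDataGPUHLL, whose hash function and output type are not described here; the BLAKE2b hash and the name `hash_items` are assumptions.

```python
import hashlib
from array import array
from typing import Iterable


def hash_items(items: Iterable[str]) -> array:
    """Hypothetical stand-in for a hashing helper: one unsigned 64-bit hash per item.

    Returns a contiguous array('Q') buffer, the kind of flat numeric array
    that can be copied to GPU memory in a single transfer.
    """
    out = array("Q")
    for item in items:
        # An 8-byte BLAKE2b digest gives a uniformly distributed 64-bit value.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        out.append(int.from_bytes(digest, "little"))
    return out


hashes = hash_items(["to", "be", "or", "not", "to", "be"])
print(len(hashes), len(set(hashes)))  # 6 hashes, 4 of them distinct
```

Hashing once up front keeps the per-item work on the GPU down to cheap bit operations on integers.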
A short example of how to use the library is as follows:
from pycudahll.CudaHLL import CudaHLL, hashDataGPUHLL
with open('data.csv', 'r') as file:
    data = file.read().split(',')  # list of comma-separated items
hashedData = hashDataGPUHLL(data)  # hash the items before adding them
threads = 64
p = 14
cudaDevice = 0 # optional
roundThreads = True # optional
hll = CudaHLL(p, threads, cudaDevice, roundThreads)
hll.add(hashedData)
print(hll.card()) # print unrounded cardinality estimate
print(len(hll)) # print rounded cardinality estimate
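The parameter p controls the trade-off between memory and accuracy. Assuming p has its usual HyperLogLog meaning (m = 2^p registers), the relative standard error of the estimate is about 1.04/sqrt(m), so p = 14 gives m = 16384 and an error of 1.04/128, roughly 0.81%. The helper below is a hypothetical calculation for choosing p, not part of the library.

```python
import math


def hll_standard_error(p: int) -> float:
    """Approximate relative standard error of HyperLogLog with m = 2**p registers."""
    m = 2 ** p
    return 1.04 / math.sqrt(m)


for p in (10, 12, 14, 16):
    # Each extra bit of p doubles the registers and shrinks the error by about sqrt(2).
    print(f"p={p:2d}  m={2 ** p:6d}  error={hll_standard_error(p):.3%}")
```

Because the error shrinks only with the square root of m, going from p to p + 2 quadruples the number of registers in order to halve the error.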
Test Data
The test data is the text of Shakespeare's plays, obtained from https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt. The original text is in t8.shakespeare.txt and the modified text is in shakespeare.csv.
- Total number of items: 899300
- Exact cardinality: 34065
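To check an estimate against these figures, the exact counts can be recomputed directly. The sketch below assumes shakespeare.csv is a single comma-separated list read the same way as in the API example; whether the published totals were produced exactly this way is an assumption, and `exact_counts` and `relative_error` are hypothetical helpers. With p = 14, one standard error is roughly 0.81% of 34065, about 277, so estimates within a few hundred of the exact cardinality are expected.

```python
from typing import Tuple


def exact_counts(text: str) -> Tuple[int, int]:
    """Return (total items, distinct items) for comma-separated text."""
    items = text.split(",")
    return len(items), len(set(items))


def relative_error(estimate: float, exact: int) -> float:
    """Absolute error of an estimate as a fraction of the exact value."""
    return abs(estimate - exact) / exact


# Hypothetical usage against the test file and an existing CudaHLL instance `hll`:
# with open("shakespeare.csv", "r") as file:
#     total, distinct = exact_counts(file.read())
# print(total, distinct, relative_error(hll.card(), distinct))

print(exact_counts("to,be,or,not,to,be"))  # (6, 4)
```

Computing the exact answer needs memory proportional to the number of distinct items, which is exactly the cost HyperLogLog avoids; it is only practical here because the test set is small.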
File details
Details for the file pycudahll-0.1.0.tar.gz.
File metadata
- Download URL: pycudahll-0.1.0.tar.gz
- Upload date:
- Size: 37.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.11.0 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5806b9d6557a7b816f07f1750dd21d10f816c1bddefaf990cd3f5ffe0642ffd2
MD5 | 4d9862ac16c35ca4955b312be59c8753
BLAKE2b-256 | 0701d2f78fdd3ddd61fc9d0c111a8f6d091a1d04ebf50257c7fe783cd2653946
File details
Details for the file pycudahll-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pycudahll-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.11.0 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6a367b7c5ae2071907fda59f07014baf63039b541b97fdf6a9bd22a5f9d11456
MD5 | 127cf33dec363f2a9282a24a345a867c
BLAKE2b-256 | 77b9522efd40eae19e4d183af618dcfc23dbf59b5dcd57a821b9d090b19d0aeb