
A GPU implementation of HyperLogLog


PyCudaHLL

This is a GPU-accelerated implementation of HyperLogLog using the CuPy library. It was created for the class "Algorithmic Techniques for Taming Big Data" at the department of Computing and Data Science at Boston University.
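HyperLogLog estimates the number of distinct items by hashing each item, using the top p bits of the hash to choose one of m = 2^p registers, and storing in each register the largest "rank" (position of the first 1-bit) seen in the remaining bits. A harmonic mean over the registers then yields the estimate. For readers new to the algorithm, here is a minimal single-threaded CPU sketch in plain Python. It is not PyCudaHLL's code: the name `hll_estimate`, the bit layout (register index taken from the high bits), and the small-range correction follow the standard formulation by Flajolet et al. and are assumptions about, not a description of, this library.

```python
import math


def hll_estimate(hashes, p=14):
    """Estimate the number of distinct 64-bit hash values with HyperLogLog.

    Plain-Python CPU reference sketch (hypothetical helper), not PyCudaHLL's
    implementation. Expects well-mixed 64-bit integer hashes as input.
    """
    if p < 7:
        raise ValueError("this sketch's alpha constant assumes p >= 7")
    m = 1 << p                        # number of registers
    width = 64 - p                    # bits left after the register index
    registers = [0] * m
    for h in hashes:
        h &= (1 << 64) - 1            # treat input as an unsigned 64-bit value
        idx = h >> width              # top p bits choose the register
        rest = h & ((1 << width) - 1) # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`;
        # when rest == 0 this gives width + 1
        rank = width - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank
    # bias-correction constant, valid for m >= 128
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # small-range correction: linear counting over empty registers
        return m * math.log(m / zeros)
    return raw
```

Feeding the same hashes twice leaves every register unchanged, which is why duplicates do not inflate the estimate; a CPU version like this can also serve as a baseline when checking GPU results on identical hashed input.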

Using the Code

To use this code, you can either get the library from PyPI or build it from source.

Get from PyPI (Recommended)

  • Install using pip: pip install pycudahll
  • In your code, import the library: from pycudahll.CudaHLL import CudaHLL

Building from Source

  • Clone the repository
  • Install dependencies: poetry install
  • In your code, import the library: from pycudahll.CudaHLL import CudaHLL
  • See test.py for examples. (Note: test.py is most likely in a broken state, but should give you an idea of how to use the library.)

API

The main class of the library is CudaHLL. It can be imported in your code with:

from pycudahll.CudaHLL import CudaHLL

The module also provides a helper function, hashDataGPUHLL, that hashes data for use with the main class:

from pycudahll.CudaHLL import hashDataGPUHLL
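HyperLogLog requires every item to be mapped to a well-mixed, fixed-width hash before it reaches the sketch, which is the job of a helper like hashDataGPUHLL. The README does not say which hash function or output width hashDataGPUHLL uses, so the CPU sketch below is only an illustration: `hash_items_64` is a hypothetical name, and BLAKE2b truncated to 64 bits is an assumed choice whose outputs will not match the library's.

```python
import hashlib


def hash_items_64(items):
    """Hash each string to an unsigned 64-bit integer.

    Hypothetical stand-in for a hashing helper: uses BLAKE2b with an
    8-byte digest, which is not necessarily what hashDataGPUHLL uses.
    """
    hashes = []
    for item in items:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        hashes.append(int.from_bytes(digest, "big"))  # 0 <= value < 2**64
    return hashes
```

The property that matters is determinism: identical items always produce the same hash, so duplicates land in the same register with the same rank and are not counted twice.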

A short example of how to use the library is as follows:

from pycudahll.CudaHLL import CudaHLL, hashDataGPUHLL

# Read the comma-separated items from the input file
with open('data.csv', 'r') as file:
    data = file.read().split(',')

# Hash the items before adding them to the sketch
hashedData = hashDataGPUHLL(data)

threads = 64
p = 14                # HyperLogLog precision parameter
cudaDevice = 0        # optional
roundThreads = True   # optional
hll = CudaHLL(p, threads, cudaDevice, roundThreads)

hll.add(hashedData)
print(hll.card())  # print unrounded cardinality estimate
print(len(hll))    # print rounded cardinality estimate
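The precision p sets the number of registers, m = 2^p, and with it the expected accuracy. Assuming CudaHLL follows the standard HyperLogLog analysis, the relative standard error is about 1.04/√m; for the p = 14 used above that is 1.04/128 ≈ 0.81%. The small helper below (`hll_relative_error` is a hypothetical name, not part of pycudahll) does that arithmetic.

```python
import math


def hll_relative_error(p):
    """Approximate relative standard error of HyperLogLog with 2**p registers."""
    m = 1 << p  # number of registers
    return 1.04 / math.sqrt(m)


p = 14
exact = 34065  # exact cardinality of the README's Shakespeare test data
rse = hll_relative_error(p)
print(f"p={p}: {1 << p} registers, relative std. error ~ {rse:.2%}")
print(f"one standard error at n={exact}: ~{rse * exact:.0f} items")
```

For the test data's exact cardinality of 34065 (see Test Data), one standard error is roughly ±277 distinct items, so an estimate within a few hundred of the true count is in line with expectations. Raising p by one doubles the register count and shrinks the error by a factor of √2, at the cost of twice the register memory.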

Test Data

The test data is the text of Shakespeare's plays, obtained from https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt. The original text is in t8.shakespeare.txt, and the modified text is in shakespeare.csv.

  • Total number of items: 899300
  • Exact cardinality (distinct items): 34065
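Both figures can be recomputed directly, since an exact distinct count only needs a set. The sketch below reads the file the same way as the usage example (splitting the whole text on commas). `exact_counts` is a hypothetical helper, UTF-8 encoding is an assumption, and whether it reproduces 899300 and 34065 exactly depends on how shakespeare.csv was produced, which the README does not describe.

```python
def exact_counts(path):
    """Return (total items, distinct items) for a comma-separated text file.

    Hypothetical helper; parses the file like the README's usage example.
    """
    with open(path, "r", encoding="utf-8") as f:
        items = f.read().split(",")
    return len(items), len(set(items))
```

Calling `exact_counts("shakespeare.csv")` returns a `(total, distinct)` pair whose second element can be compared against `hll.card()` to measure the estimator's actual error on this data.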



Download files


Source Distribution

pycudahll-0.1.0.tar.gz (37.5 kB)

Built Distribution

pycudahll-0.1.0-py3-none-any.whl (37.9 kB)

File details

Details for the file pycudahll-0.1.0.tar.gz.

File metadata

  • Download URL: pycudahll-0.1.0.tar.gz
  • Size: 37.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.11.0 Windows/10

File hashes

Hashes for pycudahll-0.1.0.tar.gz
  • SHA256: 5806b9d6557a7b816f07f1750dd21d10f816c1bddefaf990cd3f5ffe0642ffd2
  • MD5: 4d9862ac16c35ca4955b312be59c8753
  • BLAKE2b-256: 0701d2f78fdd3ddd61fc9d0c111a8f6d091a1d04ebf50257c7fe783cd2653946


File details

Details for the file pycudahll-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pycudahll-0.1.0-py3-none-any.whl
  • Size: 37.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.11.0 Windows/10

File hashes

Hashes for pycudahll-0.1.0-py3-none-any.whl
  • SHA256: 6a367b7c5ae2071907fda59f07014baf63039b541b97fdf6a9bd22a5f9d11456
  • MD5: 127cf33dec363f2a9282a24a345a867c
  • BLAKE2b-256: 77b9522efd40eae19e4d183af618dcfc23dbf59b5dcd57a821b9d090b19d0aeb

