A class for managing on-disk hash trees.
Disk Hash Tree Python Package
An implementation for storing and searching through a large set of hashes.
What's this for?
This project was originally developed for MLC@home as a solution for storing and testing membership of a large number of hashes in a memory-cheap, fast and persistent data structure. It relies on the filesystem's own optimisations to do the hard work of storing hashes and checking whether a hash is in the set.
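As a rough illustration of the idea, here is a minimal sketch of a set that lives entirely in the filesystem. It is not the package's actual on-disk layout, which isn't described here. The names `path_for`, `add` and `contains`, the `width` parameter and the use of SHA-256 hex digests are all assumptions made for this example; it also assumes every digest is non-empty and has the same length.

```python
import hashlib
import tempfile
from pathlib import Path


def path_for(root: Path, digest: str, width: int = 4) -> Path:
    """Map a digest to a nested path, one level per `width` characters.

    Hypothetical layout for illustration only. Assumes all digests are
    non-empty and the same length, so a leaf file never collides with a
    directory.
    """
    parts = [digest[i:i + width] for i in range(0, len(digest), width)]
    return root.joinpath(*parts)


def add(root: Path, digest: str) -> None:
    """Record a digest by creating an empty file at its nested path."""
    path = path_for(root, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


def contains(root: Path, digest: str) -> bool:
    """Membership is a single filesystem lookup; nothing is loaded into memory."""
    return path_for(root, digest).is_file()


# Usage: store one SHA-256 digest in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    demo_digest = hashlib.sha256(b"hello").hexdigest()
    add(demo_root, demo_digest)
    print(contains(demo_root, demo_digest))  # True
    print(contains(demo_root, hashlib.sha256(b"other").hexdigest()))  # False
```

The set survives restarts because it is just files and directories, and memory use stays flat no matter how many digests are stored.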
Why make this?
Other than pickling and managing a set() object on-disk with a custom script, I couldn't find any other Python solution for implementing a quick, persistent set()-like object that could support big data.
At the time of making this, I am studying Advanced Computer Science at Western Sydney University and was tasked with this as an extracurricular activity, so why not turn it into something a little bit bigger?
Getting started
This package can be run standalone or imported into any Python script.
Installing
pip install diskhashtree
Importing and quickstart
from diskhashtree import DiskHashTree
dht = DiskHashTree('./mydht/')
dht.add('aaaaaa')
dht.add('zzzzzz')
print(dht.contains('aaaaaa'))
print(dht.pop())
dht.discard('aaaaaa')
dht.discard('zzzzzz')
print(dht.is_empty())
Running standalone
DiskHashTree can be run straight from the command line with no additional overhead compared to running it natively in Python. All the information is in the help output:
diskhashtree -h
The maths
It was only after I finished this project and started showing it off that I realised this package is in fact an implementation of a radix tree on a file system. I had no idea radix trees existed until that point. You can check out the operations and complexity on Wikipedia.
To be precise, this structure is not exactly a radix tree, but it is very close to one.
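For comparison, here is a minimal in-memory radix tree sketch. It is a generic textbook-style version and not how diskhashtree itself is written; `RadixNode`, `insert` and `contains` are names made up for this example. Each edge carries a string label, and an edge is split when a new key only shares part of it.

```python
import os


class RadixNode:
    """A node whose outgoing edges are labelled with strings, not single characters."""

    def __init__(self):
        self.children = {}     # edge label -> RadixNode
        self.terminal = False  # True if a stored key ends at this node


def insert(root: RadixNode, key: str) -> None:
    node = root
    while True:
        for label, child in list(node.children.items()):
            common = os.path.commonprefix([label, key])
            if not common:
                continue
            if common != label:
                # Partial match: split the edge at the shared prefix.
                middle = RadixNode()
                middle.children[label[len(common):]] = child
                del node.children[label]
                node.children[common] = middle
                child = middle
            # Descend along the (possibly new) edge.
            node, key = child, key[len(common):]
            break
        else:
            # No edge shares a prefix with what is left of the key.
            if key:
                leaf = RadixNode()
                leaf.terminal = True
                node.children[key] = leaf
            else:
                node.terminal = True
            return
        if not key:
            node.terminal = True
            return


def contains(root: RadixNode, key: str) -> bool:
    node = root
    while key:
        for label, child in node.children.items():
            if key.startswith(label):
                node, key = child, key[len(label):]
                break
        else:
            return False
    return node.terminal


tree = RadixNode()
for k in ("abc", "abd", "ab"):
    insert(tree, k)
print(contains(tree, "abd"))  # True
print(contains(tree, "a"))    # False
```

A lookup follows at most one edge per character of the key, so its cost depends on the key length rather than on how many keys are stored.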
Hashes for diskhashtree-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 760af943cdb3435318bb7f2e92f5dc82a6aec6491d45f0105e45a65bc43fcef6
MD5 | 297a17e02fb2410f63b8ba525aee5999
BLAKE2b-256 | a272e64fea1cecb7d1e2cc75a5591d0d90457a1569e85e068a954d06c71bdd39