An auto-indexed Python List data structure with constant time .index() method
HashedList
A list with O(1) time complexity for the .index() method, instead of O(N).
Description
A data structure that behaves almost exactly like a Python list, except that:
- The .index(value) method runs in O(1) time (good)
- The data structure uses roughly twice the memory, because of the internal index (not great, but acceptable)
- Items must be unique; a DuplicateValueError is raised if a duplicate item is provided
Main use case:
You have a huge list of unique items, and:
- You update the list (remove, insert, set a value, etc.) from time to time
- You look up the index of a specific item from time to time
A regular list certainly works here, but every index lookup costs O(N). Alternatively, you can maintain a dictionary of item => index alongside the list, but then you carry the burden of updating that dictionary every time the list changes.
HashedList will make the work easy for you.
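To see why the manual alternative is a burden, here is a minimal sketch of keeping a value => index dict alongside a list by hand (an illustration only, not hashed-list's actual implementation; every mutating operation has to re-sync part of the dict):

```python
# A list plus a hand-maintained value -> index dict (illustration only).
items = ["a", "b", "c", "d"]
index_of = {value: i for i, value in enumerate(items)}

def insert(position, value):
    """Insert a value, then re-sync the indices of every item after it."""
    items.insert(position, value)
    for i in range(position, len(items)):
        index_of[items[i]] = i

insert(1, "x")
print(index_of["x"])  # O(1) lookup => 1
print(index_of["d"])  # entries after the insertion point were shifted => 4
```

Every remove, insert, or slice assignment needs similar bookkeeping; HashedList does this for you behind the normal list API.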
Installation
pip install hashed-list
Usage
Simply instantiate it from an iterable, and use it as you normally would a Python list
from hashed_list import HashedList
# From list
hashed_list_1 = HashedList(["a", "b", "c"])
# From generator
hashed_list_2 = HashedList(x for x in range(100) if x % 2 == 0)
# From sequence
hashed_list_3 = HashedList(range(1, 100, 2))
# Exceptions
from hashed_list import DuplicateValueError
try:
    hashed_list_4 = HashedList([1, 1, 2])
except DuplicateValueError as e:
    print(e)
# Use it like a normal Python list
hashed_list_1[0] # => "a"
len(hashed_list_3) == len(list(range(1, 100, 2))) # => True
hashed_list_1.index("c") # => 2
# The same for .extend(), .append(), .insert(), etc ...
Simple benchmark
On a large list, HashedList.index() should be more than a thousand times faster than list.index(), but you can try it yourself by copy-pasting the code below into your Python shell:
import time
from random import randint
from hashed_list import HashedList
list_size = 99999999
print("Testing list.index()")
very_huge_list = list(range(list_size))
t0 = time.time()
_ = very_huge_list.index(randint(0, list_size))
d1 = time.time() - t0
print(f"list.index took {d1} seconds")
print("Testing HashedList.index()")
very_huge_hashed_list = HashedList(range(list_size))
t0 = time.time()
_ = very_huge_hashed_list.index(randint(0, list_size))
d2 = time.time() - t0
print(f"HashedList.index took {d2} seconds")
print(f"HashedList.index is {d1 / d2:.0f} times faster than list.index!")
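If you want to see the same asymptotic gap without installing anything, here is a self-contained sketch that uses a plain dict in place of the library's internal index (an analogy for the technique, not hashed-list itself; the size is kept smaller so it runs quickly anywhere):

```python
import time
from random import randint

size = 1_000_000
data = list(range(size))
# A plain dict standing in for HashedList's internal value -> index map
lookup = {v: i for i, v in enumerate(data)}

target = randint(0, size - 1)

t0 = time.time()
linear = data.index(target)  # O(N): scans the list from the start
linear_time = time.time() - t0

t0 = time.time()
hashed = lookup[target]      # O(1): single hash-table lookup
hashed_time = time.time() - t0

assert linear == hashed
print(f"O(N) scan:   {linear_time:.6f}s")
print(f"O(1) lookup: {hashed_time:.6f}s")
```

The exact timings depend on your machine and on where the random target happens to sit in the list, but the dict lookup does not grow with the list size while the scan does.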
Caveats
- HashedList consumes about twice as much memory as a plain Python list
- HashedList takes more time to initialize, due to hashing
Given these caveats, use HashedList only when you know that .index() is going to be called a lot.
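The memory caveat is easy to get a feel for with sys.getsizeof: the auxiliary value => index dict adds container overhead on top of the list it mirrors (a rough illustration, not the library's exact footprint; figures vary by Python version):

```python
import sys

values = list(range(100_000))
# An auxiliary index like the one HashedList maintains internally
index_map = {v: i for i, v in enumerate(values)}

list_bytes = sys.getsizeof(values)    # container only, not the elements
dict_bytes = sys.getsizeof(index_map)
print(f"list container: {list_bytes:,} bytes")
print(f"index dict:     {dict_bytes:,} bytes")
```

Note that sys.getsizeof measures only the containers themselves; the int objects are shared between the two structures, so the dict is pure extra cost.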
Development
To start developing for and contributing to this repository:
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install the package from source
python -m pip install -e .
# Run all tests
python -m unittest discover tests/
# Now you can start developing. Please see src/ and tests/
# folders for source code and tests respectively
Hashes for hashed_list-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 98fca20bfbdd8349d56fb6694a849b09e5ddaff2353acf73af864b8f8a8809b5
MD5 | 6ea22d87e8b15fe20b08d6c3ba20ff44
BLAKE2b-256 | 5803e04beff650e6f2b41f1c7137589b07aac3ce55ca03f365f2005810909bad