Skip to main content

An auto-indexed Python List data structure with constant time .index() method

Project description

PyPI version Downloads

HashedList

A List with O(1) time complexity for .index() method, instead of O(N).

Description

HashedList is a data structure that is pretty much a Python list, except that:

  1. The method .index(value) is O(1) (Good)
  2. It uses twice more memory due to indexing (Not good but still okay)
  3. It takes more time than list during initialization due to hashing of each item
  4. Items must be unique. It will raise DuplicateValueError if duplicate item is provided

Main use case:

You have a huge list of unique, ordered items that:

  1. You may update the list (remove, insert, set value etc) from time to time
  2. You may get the index of a specific item in the list very often

In this case, using just a regular list definitely works but will cost you O(N) each time you get the index of a specific item. Or, you can maintain along the list a dictionary of item => index, but that will cost you the burden of updating the dictionary everytime the list is updated.

HashedList will make the work easy for you.

Installation

pip install hashed-list

Usage

Simply instantiate it from an iterable, and use it as you normally would a Python list

from hashed_list import HashedList

# From list
hashed_list_1 = HashedList(["a", "b", "c"])
# From generator
hashed_list_2 = HashedList(x for x in range(100) if x % 2 == 0)
# From sequence
hashed_list_3 = HashedList(range(1, 100, 2))

# Exceptions
from hashed_list import DuplicateValueError
try:
    hashed_list_4 = HashedList([1, 1, 2])
except DuplicateValueError as e:
    print(e)

# Use it like a normal Python list
hashed_list_1[0]  # => "a"
len(hashed_list_3) == len(list(range(1, 100, 2)))  # => True
hashed_list_1.index("c")  # => 2
# The same for .extend(), .append(), .insert(), etc ...

Simple benchmark

On a large list, HashedList.index() should be tens of times faster than list.index() but you can try it yourself by copy-pasting the code below on your Python shell

from random import randint
import time

from hashed_list import HashedList

list_size = 999_999
random_values = [randint(list_size // 2, list_size) for _ in range(1000)]

print("Testing list.index()")
t0 = time.time()
very_huge_list = list(range(list_size))
[very_huge_list.index(random_value) for random_value in random_values]
d1 = time.time() - t0
del very_huge_list  # Clear up unused memory
print(f"list.index() took {d1} seconds for {len(random_values)} calls")
# => list.index took 7.381884813308716 seconds for 1000 calls

print("Testing HashedList.index()")
t0 = time.time()
very_huge_hashed_list = HashedList(range(list_size))
# _ = very_huge_hashed_list.index(random_value)
[very_huge_hashed_list.index(random_value) for random_value in random_values]
d2 = time.time() - t0
del very_huge_hashed_list  # Clear up unused memory
print(f"HashList.index() took {d2} seconds for {len(random_values)} calls")
# HashList.index took 0.17798161506652832 seconds for 1000 calls

# Result
print(f"HashList.index() is {d1 // d2} times faster than list.index()!")
# => HashList.index() is 41.0 times faster than list.index()!

Caveats

  1. HashedList consumes 2 times more memory than Python list
  2. HashedList consumes more time during initialization due to hashing

Given these caveats, use HashedArray only when you know that .index is going to be used a lot.

Development

To start developing for and contributing to this repository:

# Start a virtual environment
python -m venv .venv
# Install the package from source
python -m install -e .
# Run all tests
python -m unittest discover tests/
# Now you can start developing. Please see src/ and tests/ 
# folders for source code and tests respectively

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hashed-list-1.0.5.tar.gz (4.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hashed_list-1.0.5-py3-none-any.whl (5.6 kB view details)

Uploaded Python 3

File details

Details for the file hashed-list-1.0.5.tar.gz.

File metadata

  • Download URL: hashed-list-1.0.5.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for hashed-list-1.0.5.tar.gz
Algorithm Hash digest
SHA256 9e0538a5832e3eba0700e888427883999a9156e62a0b427551dd6ae921b433d7
MD5 d96c85dd18e33c3cd6557fb3df05b3ee
BLAKE2b-256 3f3ceb2dbc18000eb949c19d6463d098307812cfe56e26e9fa0b2eb4cee1b60e

See more details on using hashes here.

File details

Details for the file hashed_list-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: hashed_list-1.0.5-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for hashed_list-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 da87a02bb1ea5ce96c5417e3ab1b5464cbb17854342be4969eda1c348c7647e0
MD5 a5a9893fd61f5ba3515fa1c6ca0f35a4
BLAKE2b-256 4bcc44468717f8a03f60163c229899e6e9fc0858a7bd50f4e99b23f15e416ccd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page