Skip to main content

Hashed Trie data structure for faster memory efficient and fast keyword indexing and lookup

Project description

Hash trie

Introduction

Hash trie is an enhanced Trie data structure stores keyword indexes for faster keyword lookup. The storage is done in such a way that it does efficient memory usage and faster lookup.

Usage

Importing and instantiating

from pyhtrie.htrie import Trie
trie = Trie()

Indexing

Syntax : trie.index_text(data, keyword_to_index)

data : Data can be any data to index keyword_to_index : Keyword associated with the data indexed

trie.index_text('car','car')
trie.index_text('cat','cat')
trie.index_text('cast','cast')
trie.index_text('Batman', 'batman')
trie.index_text('Bash', 'bash')

Prefix match

Syntax : trie.prefix_match(prefix_text_to_search)

The function returns all data matching the prefix text

prefix_text_to_search : Will be the prefix text on which the search will be performed. If there are no nodes matching the route of the prefix text then it will return an empty array

trie.prefix_match('ca')

Pattern match

Syntax : trie.pattern_match(pattern_text_to_search)

This function returns all data associated with the text pattern. When compared to the prefix_match function, this function need not necessarily require the pattern_text_to_search to match with the prefix of the keyword to be searched for. The pattern_text_to_search can appear anywhere within the keyword

pattern_text_to_search : Is the text pattern to be searched.

trie.pattern_match('at')

Storage details

The keywords to be indexed are stored in a Trie data structure as can be seen in the image below. However for faster indexing of pattern match within keywords, each node in the trie is added to the hash-map with the letter corresponding to the node being the key.

Trie

So if you need to search for any words beginning with 'ca', then prefix_match('ca') will return [cat, car, cast] as in a regular trie. However, if you want to search for any words with 'at' in them, you can use the function pattern_match('as') that will return [cat, batman]. The prefix match is done with the help of the hash-map. When a prefix match of 'at' is done, the hash-map is checked for the letter 'a' in 'at'. This returns a list of three a nodes [a,a,a] each pointing to a correspinding 'a' node in the tries as shown in the figure above. Then each of them are traversed to see which of them have the next node as 't' of 'at'. The result will be the 't' of 'cat' and the 't' preceding 'm' in 'batman'. Then the algorithm performs a collection operation from each of these 't's returned to retrieve keywords stored under those t's. The result, as can be inferred from the figure above, will be [cat, batman]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyhtrie-1.0.0.tar.gz (2.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyhtrie-1.0.0-py2-none-any.whl (4.3 kB view details)

Uploaded Python 2

File details

Details for the file pyhtrie-1.0.0.tar.gz.

File metadata

  • Download URL: pyhtrie-1.0.0.tar.gz
  • Upload date:
  • Size: 2.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/2.7.15

File hashes

Hashes for pyhtrie-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b848df369f3a8be380b478ccd9c535bb22ada67a084ce6b704b3cd45fbc152a9
MD5 3e4ffa23fd25217b211b19323c3f1af7
BLAKE2b-256 05d73343fd865ca866b9fc5195c84bbd92e41070a2af4dad5efea392a091896e

See more details on using hashes here.

File details

Details for the file pyhtrie-1.0.0-py2-none-any.whl.

File metadata

  • Download URL: pyhtrie-1.0.0-py2-none-any.whl
  • Upload date:
  • Size: 4.3 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/2.7.15

File hashes

Hashes for pyhtrie-1.0.0-py2-none-any.whl
Algorithm Hash digest
SHA256 490b0b634d978b408e088db90af23cda8af3af03aae8d84eb4060791f8e4f248
MD5 1ac115ceec062eace6e8e8e28934047b
BLAKE2b-256 cfd01b085697e7f5bbc81908b2a9ed6326f06ab5f1c0391f9742dcc0aeba7735

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page