Skip to main content

WordTrie: a simple trie (prefix tree) for word and phrase matching

Project description

WordTrie

Example

Create a new trie:

from wordtrie import WordTrie
trie = WordTrie()

Words and their values are added to the trie with the add() method:

trie.add("She", 1)
trie.add("sea", 2)

Exact matches in the trie are found with the match() method:

print(trie.match("She"))
# 1
print(trie.match("sells"))
# None
print(trie.match("She sells"))
# None

All matches in a stream of words can be found with the search() method:

print(trie.search("She sells sea shells by the sea shore."))
# [1, 2, 2]

Phrases can be added too and will be split into a list of words:

trie.add("sea shells", 3)
# same as trie.add(["sea", "shells"], 3)

Matching is greedy and will match the maximal length phrase:

print(trie.match("sea shells"))
# 3
print(trie.search("She sells sea shells by the sea shore."))
# [1, 3, 2]

In addition to the values, you can return the trie nodes that were matched with return_nodes=True:

print(trie.search("She sells sea shells by the sea shore.", return_nodes=True))
# [(['She'], 1), (['sea', 'shells'], 3), (['sea'], 2)]

The trie can be written to a JSON file with:

trie.to_json("sea.json")
# {
#   "She": {
#     "#": 1
#   },
#   "sea": {
#     "#": 2,
#     "shells": {
#       "#": 3
#     }
#   }
# }

Or restored from a JSON file with:

trie.from_json("sea.json")
print(trie.match("sea"))
# 2

The reserved key # is used to store the value in the JSON structure. You can still add a word that starts with # to the trie, and it will be protected with an additional prepended #:

trie.add("#She", 4)
trie.to_json("sea.json")
# {
#    "##She": {
# ...
print(trie.match("#She"))
# 4

When a node does not yet exist in the trie, the value specified in the add() method is used as the initial value. If the node already exists, then an aggregator function can be called to modify the value based on the old and new values. The default aggregator is to replace the old value with the new value. However, a custom aggregator can be defined as a function with signature aggregator(old, new) and passed to the add() call:

def sum_aggregator(old, new):
    return old + new
trie.add("She", 100, aggregator=sum_aggregator)
print(trie.search("She sells sea shells by the sea shore."))
# [101, 3, 2]

Testing

Run the example above as a basic regression test with:

# grep "^    " README.md | sed 's/    //' | python

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wordtrie-0.0.4.zip (8.8 kB view details)

Uploaded Source

Built Distribution

wordtrie-0.0.4-py3-none-any.whl (5.0 kB view details)

Uploaded Python 3

File details

Details for the file wordtrie-0.0.4.zip.

File metadata

  • Download URL: wordtrie-0.0.4.zip
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.10

File hashes

Hashes for wordtrie-0.0.4.zip
Algorithm Hash digest
SHA256 252538279b8a552f223234e5946be585794a62eb0fceac5d3790df2c3b39c43d
MD5 e3656b4ed451b03dd57b5e1cdab4895c
BLAKE2b-256 5ad20dfe2f2ce432d0726eec1dfafa9a3fed93859752006692ff7aa6aaf54730

See more details on using hashes here.

File details

Details for the file wordtrie-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: wordtrie-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for wordtrie-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e2d0dbae051bcafe12f42f6e176f17ca27a3b12c23f0c3af9eb5766407f3947e
MD5 d5013cd0bf480ade190f77813212e6e5
BLAKE2b-256 48b7a7305c755b56f558d6fa7466f0742bf5f0a7be0c0fcae629fb2d14b237b7

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page