Skip to main content

Radix trie for fast prefix search.

Project description

PyPruningRadixTrie

GitHub CI PyPI version

Python Port of Pruning Radix Trie by Wolf Garbe.

Changes compared to original version

  • Removed parameter to disable pruning behavior.
    • See test/non_pruning_radix_trie.py for a non-pruning version that you can use to see the speed improvement.
  • Added outline for more generic InputProvider and providers that read CSV or JSON as examples

What and Why

A Trie is a tree data structure that is commonly used for searching terms that start with a given prefix.
It starts with an empty string at the base of the trie, the root node.
Adding a new entry to the trie creates a new branch. This branch shares already present characters with existing nodes and creates new nodes when it's prefix diverges from the present entries.

# trie containing flower & flowchart (1 char = 1 node)

'' - f - l - o - w - e - r
                 |
                 c - h - a - r - t

A RadixTrie is the space optimized version of a Trie.
It combines the nodes with only one sub-node into one, containing more than one character.

# radix trie containing flower & flowchart

'' - flow - er
      |
     chart

The prefix Pruning references the algorithm to query the RadixTrie.
In order for the pruning to work, the nodes in RadixTrie are stored in a sorted manner.
This structure allows cutting off unpromising branches during querying the trie which makes the algorithm way faster compared to a non-pruning trie.

Usage

Get from PyPI:

pip install pypruningradixtrie

Create the PRT:

# empty trie
trie = PruningRadixTrie()

# fill with data from CSV file on creation
trie = PruningRadixTrie('./test_data.csv', CSVInputProvider(',', lambda x: float(x[1])))

Add entries:
CSV:

# fill with data from CSV file, score is at position 1, term at position 0
fill_trie_from_file(trie, './test_data.csv', CSVInputProvider(',', lambda x: float(x[1]), 0))

JSON:

# define a functon to calculate the score out of a JSON entry
def score_fun(json_entry: Dict[str, Any]) -> float:
  return json_entry["pages"] * json_entry["year"] / 10.0

# "title" = key for term to insert into PRT
fill_trie_from_file(trie, './test_data.json', JSONInputProvider("title", score_fun))

Single Entry:

# insert single entry
insert_term(trie, term="flower", score=20)

Use the PRT:

# get the top 10 entries that start with 'flower'
trie.get_top_k_for_prefix('flower', 10)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pypruningradixtrie-2.1.0.tar.gz (13.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pypruningradixtrie-2.1.0-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file pypruningradixtrie-2.1.0.tar.gz.

File metadata

  • Download URL: pypruningradixtrie-2.1.0.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for pypruningradixtrie-2.1.0.tar.gz
Algorithm Hash digest
SHA256 8e76907a7e7af0ada9b7c813e4c84836c4caa2e254f7dbcb1922da1a433f5318
MD5 b4bd667bdd4debf6132c491fc2a91a68
BLAKE2b-256 792679ed1d963b7da545fd275b9dbace6e00c0916c0a9cba1dea37c845c58af1

See more details on using hashes here.

File details

Details for the file pypruningradixtrie-2.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for pypruningradixtrie-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4167122d0c780c4f68000c56385ec38d7445f3f3f7d3f473e315abdb032a3ebe
MD5 270c360cf5059bab88a86c316616a402
BLAKE2b-256 122cc3b4159dd1cf6fcb1aa02b0291a34fa64a1ae556dccc49ada085fb589bf6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page