This package provides tries (prefix trees) with frequencies implementation based on dictionaries.
Project description
Tries with frequencies
This Python package has functions for creation and manipulation of Tries (Prefix trees) with frequencies.
The package provides Machine Learning (ML) functionalities, not "just" a Trie data structure.
This Python implementation closely follows the Mathematica implementation [AAp2].
Installation
From GitHub:
pip install -e git+https://github.com/antononcube/Python-packages.git#egg=TriesWithFrequencies-antononcube\&subdirectory=TriesWithFrequencies
From PyPI:
python3 -m pip install TriesWithFrequencies
Setup
from TriesWithFrequencies import *
Creation examples
In this section we show a few ways to create tries with frequencies.
Consider a trie (prefix tree) created over a list of words:
tr = trie_create_by_split( ["bar", "bark", "bars", "balm", "cert", "cell"] )
trie_form(tr)
TRIEROOT => 6.0
├─b => 4.0
│ └─a => 4.0
│ ├─r => 3.0
│ │ └─k => 1.0
│ │ └─s => 1.0
│ └─l => 1.0
│ └─m => 1.0
└─c => 2.0
└─e => 2.0
├─r => 1.0
│ └─t => 1.0
└─l => 1.0
└─l => 1.0
Here we convert the trie with frequencies above into a trie with probabilities:
ptr = trie_node_probabilities( tr )
trie_form(ptr)
TRIEROOT => 1.0
├─b => 0.6666666666666666
│ └─a => 1.0
│ ├─r => 0.75
│ │ ├─k => 0.3333333333333333
│ │ └─s => 0.3333333333333333
│ └─l => 0.25
│ └─m => 1.0
└─c => 0.3333333333333333
└─e => 1.0
├─r => 0.5
│ └─t => 1.0
└─l => 0.5
└─l => 1.0
Shrinking
Here we shrink the trie with probabilities above:
trie_form(trie_shrink(ptr))
TRIEROOT => 1.0
└─ba => 1.0
└─r => 0.75
└─k => 0.3333333333333333
└─s => 0.3333333333333333
└─lm => 1.0
└─ce => 1.0
└─rt => 1.0
└─ll => 1.0
Here we shrink the frequencies trie using a separator:
trie_form(trie_shrink(tr, sep="~"))
TRIEROOT => 6.0
└─b~a => 4.0
└─r => 3.0
└─k => 1.0
└─s => 1.0
└─l~m => 1.0
└─c~e => 2.0
└─r~t => 1.0
└─l~l => 1.0
Retrieval and sub-tries
Here we retrieve a sub-trie with a key:
trie_form(trie_sub_trie(tr, list("bar")))
r => 3.0
└─k => 1.0
└─s => 1.0
Classification
Create a trie:
words = [*(["bar"] * 6), *(["bark"] * 3), *(["bare"] * 2), *(["cam"] * 3), "came", *(["camelia"] * 4)]
tr = trie_create_by_split(words)
tr = trie_node_probabilities(tr)
Show node counts:
trie_node_counts(tr)
{'total': 13, 'internal': 10, 'leaves': 3}
Show the trie form:
trie_form(tr)
TRIEROOT => 1.0
├─b => 0.5789473684210527
│ └─a => 1.0
│ └─r => 1.0
│ ├─k => 0.2727272727272727
│ └─e => 0.18181818181818182
└─c => 0.42105263157894735
└─a => 1.0
└─m => 1.0
└─e => 0.625
└─l => 0.8
└─i => 1.0
└─a => 1.0
Classify with the letters of the word "cam":
trie_classify(tr, list("cam"), prop="Probabilities")
{'a': 0.5, 'm': 0.375, 'e': 0.12499999999999997}
References
Articles
[AA1] Anton Antonov, "Tries with frequencies for data mining", (2013), MathematicaForPrediction at WordPress.
[AA2] Anton Antonov, "Removal of sub-trees in tries", (2013), MathematicaForPrediction at WordPress.
[AA3] Anton Antonov, "Tries with frequencies in Java" (2017), MathematicaForPrediction at WordPress. GitHub Markdown.
[RAC1] Tib, "Day 10: My 10 commandments for Raku performances", (2020), Raku Advent Calendar.
[WK1] Wikipedia entry, Trie.
Packages
[AAp1] Anton Antonov, Tries with frequencies Mathematica Version 9.0 package, (2013), MathematicaForPrediction at GitHub.
[AAp2] Anton Antonov, Tries with frequencies Mathematica package, (2013-2018), MathematicaForPrediction at GitHub.
[AAp3] Anton Antonov, Tries with frequencies in Java, (2017), MathematicaForPrediction at GitHub.
[AAp4] Anton Antonov, Java tries with frequencies Mathematica package, (2017), MathematicaForPrediction at GitHub.
[AAp5] Anton Antonov, Java tries with frequencies Mathematica unit tests, (2017), MathematicaForPrediction at GitHub.
[AAp6] Anton Antonov, ML::TriesWithFrequencies Raku package, (2021), GitHub/antononcube.
Videos
[AAv1] Anton Antonov, "Prefix Trees with Frequencies for Data Analysis and Machine Learning", (2017), Wolfram Technology Conference 2017, Wolfram channel at YouTube.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file TriesWithFrequencies-0.0.1.tar.gz
.
File metadata
- Download URL: TriesWithFrequencies-0.0.1.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | cbd5d02f15e4d8f5de8587fe0a545189c5a9f14a1746209b1489aa423775821e |
|
MD5 | 86388eb1889437c03037b87994e62f09 |
|
BLAKE2b-256 | 537987af6abf1589969621ae0bb59c75b533e4cdb5258e8e38ef52be490c5252 |
File details
Details for the file TriesWithFrequencies-0.0.1-py3-none-any.whl
.
File metadata
- Download URL: TriesWithFrequencies-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 16c2afb5eabb1cd11b0b86441bd5f2b05b9fe66d40fece74bc3085fb145dee52 |
|
MD5 | bbe3caaae5364eb044a1fdb6edefb7f7 |
|
BLAKE2b-256 | 0eda8db30718ef9bf75c7378c78c38a761a73fc5faea3095d308ea9e7fd7dab4 |