Skip to main content

This package provides tries (prefix trees) with frequencies implementation based on dictionaries.

Project description

Tries with frequencies

This Python package has functions for creation and manipulation of Tries (Prefix trees) with frequencies.

The package provides Machine Learning (ML) functionalities, not "just" a Trie data structure.

This Python implementation closely follows the Mathematica implementation [AAp2].


Installation

From GitHub:

pip install -e git+https://github.com/antononcube/Python-packages.git#egg=TriesWithFrequencies-antononcube\&subdirectory=TriesWithFrequencies

From PyPI:

python3 -m pip install TriesWithFrequencies

Setup

from TriesWithFrequencies import *

Creation examples

In this section we show a few ways to create tries with frequencies.

Consider a trie (prefix tree) created over a list of words:

tr = trie_create_by_split( ["bar", "bark", "bars", "balm", "cert", "cell"] )
trie_form(tr)
TRIEROOT => 6.0
├─b => 4.0
│ └─a => 4.0
│   ├─r => 3.0
│   │ └─k => 1.0
│   │ └─s => 1.0
│   └─l => 1.0
│     └─m => 1.0
└─c => 2.0
  └─e => 2.0
    ├─r => 1.0
    │ └─t => 1.0
    └─l => 1.0
      └─l => 1.0

Here we convert the trie with frequencies above into a trie with probabilities:

ptr = trie_node_probabilities( tr )
trie_form(ptr)
TRIEROOT => 1.0
├─b => 0.6666666666666666
│ └─a => 1.0
│   ├─r => 0.75
│   │ ├─k => 0.3333333333333333
│   │ └─s => 0.3333333333333333
│   └─l => 0.25
│     └─m => 1.0
└─c => 0.3333333333333333
  └─e => 1.0
    ├─r => 0.5
    │ └─t => 1.0
    └─l => 0.5
      └─l => 1.0

Shrinking

Here we shrink the trie with probabilities above:

trie_form(trie_shrink(ptr))
TRIEROOT => 1.0
└─ba => 1.0
  └─r => 0.75
    └─k => 0.3333333333333333
    └─s => 0.3333333333333333
  └─lm => 1.0
└─ce => 1.0
  └─rt => 1.0
  └─ll => 1.0

Here we shrink the frequencies trie using a separator:

trie_form(trie_shrink(tr, sep="~"))
TRIEROOT => 6.0
└─b~a => 4.0
  └─r => 3.0
    └─k => 1.0
    └─s => 1.0
  └─l~m => 1.0
└─c~e => 2.0
  └─r~t => 1.0
  └─l~l => 1.0

Retrieval and sub-tries

Here we retrieve a sub-trie with a key:

trie_form(trie_sub_trie(tr, list("bar")))
r => 3.0
└─k => 1.0
└─s => 1.0

Classification

Create a trie:

words = [*(["bar"] * 6), *(["bark"] * 3), *(["bare"] * 2), *(["cam"] * 3), "came", *(["camelia"] * 4)]
tr = trie_create_by_split(words)
tr = trie_node_probabilities(tr)

Show node counts:

trie_node_counts(tr)
{'total': 13, 'internal': 10, 'leaves': 3}

Show the trie form:

trie_form(tr)
TRIEROOT => 1.0
├─b => 0.5789473684210527
│ └─a => 1.0
│   └─r => 1.0
│     ├─k => 0.2727272727272727
│     └─e => 0.18181818181818182
└─c => 0.42105263157894735
  └─a => 1.0
    └─m => 1.0
      └─e => 0.625
        └─l => 0.8
          └─i => 1.0
            └─a => 1.0

Classify with the letters of the word "cam":

trie_classify(tr, list("cam"), prop="Probabilities")
{'a': 0.5, 'm': 0.375, 'e': 0.12499999999999997}

References

Articles

[AA1] Anton Antonov, "Tries with frequencies for data mining", (2013), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, "Removal of sub-trees in tries", (2013), MathematicaForPrediction at WordPress.

[AA3] Anton Antonov, "Tries with frequencies in Java" (2017), MathematicaForPrediction at WordPress. GitHub Markdown.

[RAC1] Tib, "Day 10: My 10 commandments for Raku performances", (2020), Raku Advent Calendar.

[WK1] Wikipedia entry, Trie.

Packages

[AAp1] Anton Antonov, Tries with frequencies Mathematica Version 9.0 package, (2013), MathematicaForPrediction at GitHub.

[AAp2] Anton Antonov, Tries with frequencies Mathematica package, (2013-2018), MathematicaForPrediction at GitHub.

[AAp3] Anton Antonov, Tries with frequencies in Java, (2017), MathematicaForPrediction at GitHub.

[AAp4] Anton Antonov, Java tries with frequencies Mathematica package, (2017), MathematicaForPrediction at GitHub.

[AAp5] Anton Antonov, Java tries with frequencies Mathematica unit tests, (2017), MathematicaForPrediction at GitHub.

[AAp6] Anton Antonov, ML::TriesWithFrequencies Raku package, (2021), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, "Prefix Trees with Frequencies for Data Analysis and Machine Learning", (2017), Wolfram Technology Conference 2017, Wolfram channel at YouTube.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

TriesWithFrequencies-0.0.1.tar.gz (10.2 kB view details)

Uploaded Source

Built Distribution

TriesWithFrequencies-0.0.1-py3-none-any.whl (8.7 kB view details)

Uploaded Python 3

File details

Details for the file TriesWithFrequencies-0.0.1.tar.gz.

File metadata

  • Download URL: TriesWithFrequencies-0.0.1.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for TriesWithFrequencies-0.0.1.tar.gz
Algorithm Hash digest
SHA256 cbd5d02f15e4d8f5de8587fe0a545189c5a9f14a1746209b1489aa423775821e
MD5 86388eb1889437c03037b87994e62f09
BLAKE2b-256 537987af6abf1589969621ae0bb59c75b533e4cdb5258e8e38ef52be490c5252

See more details on using hashes here.

File details

Details for the file TriesWithFrequencies-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for TriesWithFrequencies-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 16c2afb5eabb1cd11b0b86441bd5f2b05b9fe66d40fece74bc3085fb145dee52
MD5 bbe3caaae5364eb044a1fdb6edefb7f7
BLAKE2b-256 0eda8db30718ef9bf75c7378c78c38a761a73fc5faea3095d308ea9e7fd7dab4

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page