Skip to main content

A trie structure supporting fuzzy string searches.

Project description

vtrie

Trie structure supporting approximate string matching (substitutions only) for Python (2.x and 3.x).

Installation

pip install vtrie

Features

  • It is similar to a dict() in general usage, and supports much of the dict() interface:

    • __len__(): number of items in the trie

    • __contains()__

    • __getitem()__

    • __setitem()__

    • __delitem()__

    • __sizeof()__: size of the trie in bytes

    • __repr()__: dict-like output, showing contents of the trie

    • has_key()

    • setdefault()

    • get()

    • pop()

    • popitem()

    • keys()

    • values()

    • items()

    • iterkeys()

    • itervalues()

    • iteritems()

  • Trie-specific methods:

    • num_nodes(): number of nodes

    • longest_prefix(k): find longest key matching the beginning of k, returning (key, value) pair as a 2-tuple. None is returned if no match.

    • suffixes(k): iterate over all (suffix, value) pairs as 2-tuples, that have k as a prefix.

    • neighbors(key = k, maxhd = n): iterate over all (Hamming distance, key, value) triples, as 3 tuples, where key and k differ by at least 1, but maximally n characters. Note, one can only search for neighbors of existing keys.

    • pairs(keylen = l, maxhd = n): iterate over ALL (Hamming distance, key1, value1, key2, value2) 5-tuples where key1 and key2 differ by at least 1, but maximally n characters. Note, pairs() returns a dirty iterator, meaning that nodes in the trie are modified while the iterator is running. An exception will be thrown when iterating with more than one dirty iterator.

  • Pickling

Usage

Create a trie:

>>> from vtrie import Trie
>>> t = Trie()

Add strings to the trie. Currently, only ascii strings are supported:

>>> t[b"Hello"] = 123
>>> t[b"World"] = {"my": "dict"}
>>> t[b"foo"] = None

Check if “Hello” is in the trie:

>>> b"Hello" in t
True

Show all inserted strings sharing the same prefix:

>>> t[b"foo"] = 0
>>> t[b"foobar"] = 1
>>> t[b"fooo"] = 2
>>> t[b"hello"] = 3
>>> list(t.suffixes(b"fo"))
[('o', 0), ('obar', 1), ('oo', 2)]

Search for keys that differ by less than a given number of substitutions from the provided key. The results are tuples with first the Hamming distance between the given key and the found key, then the found key and its value:

>>> t[b"hello world"] = 0
>>> t[b"*ello world"] = "a"
>>> t[b"*ell* world"] = "b"
>>> t[b"*ell* w*rld"] = "c"
>>> t[b"hell* w*rl*"] = "d"
>>> list(t.neighbors(b"hello world", maxhd = 1))
[(1, '*ello world', 'a')]
>>> list(t.neighbors(b"hello world", maxhd = 2))
[(1, '*ello world', 'a'), (2, '*ell* world', 'b')]
>>> print("\n".join(map(str,list(t.neighbors(b"hello world", 3)))))
(3, 'hell* w*rl*', 'd')
(1, '*ello world', 'a')
(2, '*ell* world', 'b')
(3, '*ell* w*rld', 'c')

Search for all keys of a certain length that are within a certain Hamming of each other. The results are tuples with first the Hamming distance between the found keys, then the first key and its value, and then the second key and its value:

>>> print("\n".join(map(str,list(t.pairs(keylen = 11, maxhd = 2)))))
(1, 'hello world', 0, '*ello world', 'a')
(2, 'hello world', 0, '*ell* world', 'b')
(2, 'hell* w*rl*', 'd', '*ell* w*rld', 'c')
(1, '*ello world', 'a', '*ell* world', 'b')
(2, '*ello world', 'a', '*ell* w*rld', 'c')
(1, '*ell* world', 'b', '*ell* w*rld', 'c')

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vtrie-0.0.2.tar.gz (34.3 kB view details)

Uploaded Source

File details

Details for the file vtrie-0.0.2.tar.gz.

File metadata

  • Download URL: vtrie-0.0.2.tar.gz
  • Upload date:
  • Size: 34.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for vtrie-0.0.2.tar.gz
Algorithm Hash digest
SHA256 f0346f597d042763339111029c30a390bc5a6f4d451ee019ae98386d2ae8717f
MD5 8b6ac4035d4434f0f5725bf0fd24ac2c
BLAKE2b-256 fa313910e8f74dea24aff0df840f3d4e1aee96eb186a12ae377a089be07fede3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page