Skip to main content

A simple binary trie implementation

Project description

Binary Trie

Python binary trie implementation, helping with XOR distances.

Author: Guillaume Michel

Purpose

IPFS and libp2p are built upon the Kademlia DHT, which uses the XOR distance as a distance metric between keys. As explained in this blogpost, the XOR distance is tricky to understand and represent. This distance metric is odd for it is non linear. For instance, $2 \oplus 3 = 1$ but $3 \oplus 4 = 7$. Ordering $N_8$ by XOR distance to 2 gives the following result: $[2,3,0,1,6,7,4,5]$

One popular representation for XOR distance is the Binary Trie. A binary brie is a simple Trie with keyspace $\lbrace0,1\rbrace^n$ with $n$ being the size of the keyspace. The perfect visual representation of the XOR distance would be a N-dimensional binary trie. The example below displays a binary trie containing the keys $[2,3,4,6,7,9,11,13]$.

Alt text

Each node in the Trie can be associated with some metadata for convenience.

Usage

Install

pip install binary-trie

Import

from binary_trie import Trie, bytes_to_bitstring, bitstring_to_bytes, int_to_bitstring, bitstring_to_int

Creating an empty trie

trie = Trie()

Adding keys

The add(key) method takes a bitstring as input. The bitstring can either be provided directly, or be computer from an int or bytes using the int_to_bitstring() and bytes_to_bitstring() functions. Note that the l parameter represent the bit length of the binary representation. It is important that all keys share the same bit length. The bit length can be omitted in bytes_to_bistring() when working with bit lengths multiple of 8.

trie.add("0010")
trie.add(4*"0")
trie.add(int_to_bitstring(3, l=4))
trie.add(bytes_to_bitstring(b'\x0e', l=4))

The add function returns True on success, and False if the key was already in the trie (not inserted).

Finding keys

The find(key) method returns True if the provided key is in the Trie, False otherwise.

trie.contains("0010") # True
trie.contains("0100") # False

Key depth

The depth(key) method returns the depth of the provided key, if it is in the Trie and -1 otherwise. The depth of a trie node is defined as the number of its direct ancestors, up to the root of the trie.

trie.depth("0")    # 1
trie.depth("0010") # 3
trie.depth("0111") # 4
trie.depth("1101") # 2
trie.depth("11")   # -1

Finding the closest keys to a target

The n_closest_keys(key, n) method returns the n closest keys to the provided key in the trie, according to the XOR distance. The keys are sorted according to the distance to the target key. Note that only leaves of the trie will be returned, not intermediary nodes.

trie.n_closest_keys("0001", 1) # ["0000"]
trie.n_closest_keys("0010", 3) # ["0010", "0011", "0000"]

Prefix matching

The prefix_match_keys(prefix) will return the list of keys of all leaves of the Trie matching the provided prefix.

trie.prefix_match_keys("00")   # ["0000", "0010", "0011"]
trie.prefix_match_keys("1111") # []

Attaching metadata to Trie nodes

It is possible to attach an object as metadata to a trie leaf node.

class MyObj(object):
    def __init__(self, key, name):
        self.key = key
        self.name = name

trie = Trie(MyObj)

obj = MyObj(int_to_bitstring(10, 4), "Node 10")
trie.add(obj.key, obj)

trie.find(obj.key).name        # "Node 10"
trie.n_closest(obj.key, 1)     # [obj]
trie.prefix_match(obj.key[:2]) # [obj]

Note that the find(key) method is similar to the contains(key) method, but returns the associated metadata if any, instead of returning a bool. The n_closest(key, n) method is similar to the n_closest_keys(key, n) method, but returns the list of metadata associated with the closest keys, instead of the list of keys.

Predicates

It is possible to assign some boolean variables to metadata objects, and make use of them using predicate in n_closest(), n_closest_keys(), prefix_match() and prefix_match_keys() methods to filter the results.

class MyObj(object):
    def __init__(self, key, name, some_bool):
        self.key = key
        self.name = name
        self.some_bool = some_bool

trie = Trie(MyObj)

nodeIDs = [0, 1, 2] # ["0000", "0001", "0010"]
for i in nodeIDs:
    obj = MyObj(int_to_bitstring(i, 4), "Node "+str(i), i % 2 == 0)
    trie.add(obj.key, obj)

trie.n_closest_keys("0001", 2, predicate=lambda n: n.some_bool) # ["0000", "0010"] 
trie.prefix_match_keys("000", predicate=lambda n: n.some_bool)  # ["0000"]

# Note that the key "0001" matched both requests, but wasn't taken into
# consideration as it doesn't satisfy the predicate

Helpers

There are 4 helpers functions to facilitate the use of this implementation with keys being not only bitstring, but also bytes or int. These helper functions help translate bytes and int to bitstring and reciprocally.

def bytes_to_bitstring(data: bytes, l: int=8*len(data)) -> bytes:
    ...
bytes_to_bitstring(b'\xff\x00') # "1111111100000000"
bytes_to_bitstring(b'\xf3',l=4) # "0011"

def bitstring_to_bytes(bs: str) -> bytes:
    ...
bitstring_to_bytes("1111111100000000") # b'\xff\x00'
bitstring_to_bytes("0011")             # b'\x03'

def int_to_bitstring(i, l: int) -> bytes:
    ...
int_to_bitstring(6, 4)  # "0110"
int_to_bitstring(6, 16) # "0000000000000110"

def bitstring_to_int(bs: str) -> int:
    ...
bitstring_to_int("0110")             # 6
bitstring_to_int("0000000000000110") # 6

Encoding

The following functions help uniquely encode a specific binary prefix, that are bistrings of arbitrary size. The bitstrings can be encoded in int or in unsigned varint. The mapping goes as follow:

bitstring       code        varint
""          ->  0       ->  00000000 (0x00)
"0"         ->  1       ->  00000001 (0x01)
"1"         ->  2       ->  00000010 (0x02)
"00"        ->  3       ->  00000011 (0x03)
...
"0000000"   ->  127     ->  01111111 (0x7f)
"0000001"   ->  128     ->  10000000 00000001 (0x8001)
...
def encode_bitstring(bitstring: str) -> int:
    ...
encode_bitstring("1")       # 2
encode_bitstring("010")     # 9
encode_bitstring("0000001") # 128

def decode_bitstring(code: int) -> str:
    ...
decode_bitstring(2)   # "1"
decode_bitstring(9)   # "010"
decode_bitstring(128) # "0000001"

def bitstring_to_varint(bitstring: str) -> bytes:
    ...
bitstring_to_varint("1")       # b'\x02'
bitstring_to_varint("010")     # b'\x09'
bitstring_to_varint("0000001") # b'\x80\x01'

def varint_to_bitstring(varint_bytes: bytes) -> str:
    ...
varint_to_bitstring(b'\x02')     # "1"
varint_to_bitstring(b'\x09')     # "010"
varint_to_bitstring(b'\x80\x01') # "0000001"

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

binary_trie-1.0.3.tar.gz (12.6 kB view details)

Uploaded Source

Built Distribution

binary_trie-1.0.3-py3-none-any.whl (11.1 kB view details)

Uploaded Python 3

File details

Details for the file binary_trie-1.0.3.tar.gz.

File metadata

  • Download URL: binary_trie-1.0.3.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.1.dev0+g94f810c.d20240510 CPython/3.12.4

File hashes

Hashes for binary_trie-1.0.3.tar.gz
Algorithm Hash digest
SHA256 6d364e0043d3a3fd8af0daf86097fd2f4c4120869b1ebc2d3bcec490003f86df
MD5 323b386bc3709c95af7d058f2e931455
BLAKE2b-256 311e7fedd4ac22bd5430d6fd6d595f313ab00b0d05845807c8f278d99bd1adc0

See more details on using hashes here.

File details

Details for the file binary_trie-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: binary_trie-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 11.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.1.dev0+g94f810c.d20240510 CPython/3.12.4

File hashes

Hashes for binary_trie-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a263ef7821642b78cb9853ad57ca4e7b50c5614af4783eddea786c338be11334
MD5 8a63d6f1a0a864c1e8f910b488d1b6f6
BLAKE2b-256 76a596d072d0a0bb15f890d36afbe5442ac3e0102ece07b8e9e5980c22091cbc

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page