Skip to main content

Compact, read-only nested dictionary backed by succinct data structures

Project description

compact-tree

Tests License: MIT

Compact, read-only nested dictionary backed by succinct data structures.

CompactTree stores a nested Python dict using a LOUDS-encoded trie with DAWG-style key/value deduplication, enabling low-memory random access and efficient serialization.

Features

  • Memory-efficient: Uses succinct data structures (LOUDS trie + DAWG-style deduplication)
  • Fast lookups: O(1) rank/select operations via Poppy bit vectors
  • Serializable: Save and load from disk with efficient binary format
  • Read-only: Optimized for lookup-heavy workloads
  • Storage-agnostic: Works with local files and remote storage via fsspec

Installation

pip install compact-tree

Or install from source:

git clone https://github.com/andrey-savov/compact-tree.git
cd compact-tree
pip install -e .

Quick Start

from compact_tree import CompactTree

# Build from a nested dict
tree = CompactTree.from_dict({
    "a": {
        "x": "1",
        "y": "2"
    },
    "b": "3"
})

# Access like a normal dict
print(tree["a"]["x"])   # "1"
print(tree["b"])        # "3"
print("a" in tree)      # True

# Serialize to file
tree.serialize("tree.ctree")

# Load from file
loaded_tree = CompactTree("tree.ctree")

# Convert back to plain dict
plain_dict = loaded_tree.to_dict()

How It Works

LOUDS (Level-Order Unary Degree Sequence)

LOUDS is a succinct tree representation that encodes an ordered tree into a single bit string using roughly 2n bits for n nodes (close to the information-theoretic minimum).

Encoding rule: Traverse the tree in breadth-first (level) order. For each node, write d 1-bits (where d is the node's number of children) followed by a single 0-bit.

Example for a root with children A (2 kids) and B (0 kids):

root  ->  1 1 0      (2 children, then 0-terminator)
A     ->  1 1 0      (2 children)
B     ->  0          (leaf)

Navigation relies on rank and select queries:

Operation Description
first_child(v) Find the (v-1)-th 0, check next position
next_sibling(v) Find the 1-bit for node v, check next position

DAWG-Style Deduplication

  • Keys are collected, sorted, and deduplicated into a global vocabulary
  • Values (leaves) are similarly deduplicated into a value table
  • Edge labels store integer IDs rather than raw strings
  • Same key/value appearing multiple times is stored only once

Architecture

CompactTree
  |
  +-- louds    : _LOUDS        bit-vector tree topology (Poppy rank/select)
  +-- elbl     : memoryview    edge labels  (uint32 key ids, 4 bytes per node)
  +-- vcol     : memoryview    value column (uint32: value id or 0xFFFFFFFF for internal nodes)
  +-- _keys_buf: memoryview    length-prefixed UTF-8 key strings
  +-- val      : memoryview    length-prefixed UTF-8 value strings

Binary Format (v2)

Magic   : 5 bytes   "CTree"
Version : 8 bytes   uint64 LE (always 2)
Header  : 5 x 8 bytes  lengths of: keys, values, louds, vcol, elbl
Payload : keys_bytes | val_bytes | louds_bytes | vcol_bytes | elbl_bytes

Dependencies

  • bitarray — Mutable bit arrays
  • succinct (Poppy) — Rank/select in O(1)
  • fsspec — Filesystem abstraction for local and remote storage

Testing

pytest test_compact_tree.py

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Based on LOUDS (Level-Order Unary Degree Sequence) tree representation
  • Uses Poppy rank/select implementation from the succinct library
  • Inspired by DAWG (Directed Acyclic Word Graph) compression techniques

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

savov_compact_tree-0.2.0.tar.gz (13.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

savov_compact_tree-0.2.0-py3-none-any.whl (7.9 kB view details)

Uploaded Python 3

File details

Details for the file savov_compact_tree-0.2.0.tar.gz.

File metadata

  • Download URL: savov_compact_tree-0.2.0.tar.gz
  • Upload date:
  • Size: 13.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for savov_compact_tree-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3e5afca50842f7061b62de437c1f74f0a78f0d000e02dcce160c0502f0cd48a0
MD5 7e521d89f8a77ba7c4a16a0eb3d77294
BLAKE2b-256 ec42b2eaf487e532a0ae139d08dcf0071ee25afc688f37491d74f0cccdeeef17

See more details on using hashes here.

Provenance

The following attestation bundles were made for savov_compact_tree-0.2.0.tar.gz:

Publisher: publish.yml on andrey-savov/compact-tree

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file savov_compact_tree-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for savov_compact_tree-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dd2ca0547085845208ebc88ee617e2755bc6541c23cc9686e350c9a26794cc7c
MD5 926f5c08ea1259a7d4894f0fa9938de8
BLAKE2b-256 301bae7eb47eef7233a0b6e73ee52cec18debc11c1cd3114db74acd6e841e82f

See more details on using hashes here.

Provenance

The following attestation bundles were made for savov_compact_tree-0.2.0-py3-none-any.whl:

Publisher: publish.yml on andrey-savov/compact-tree

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page