Compact, read-only nested dictionary backed by succinct data structures
Project description
compact-tree
Compact, read-only nested dictionary backed by succinct data structures.
CompactTree stores a nested Python dict using a LOUDS-encoded trie with DAWG-style key/value deduplication, enabling low-memory random access and efficient serialization.
Features
- Memory-efficient: Uses succinct data structures (LOUDS trie + DAWG-style deduplication)
- Fast lookups: O(1) rank/select operations via Poppy bit vectors
- Serializable: Save and load from disk with efficient binary format
- Read-only: Optimized for lookup-heavy workloads
- Storage-agnostic: Works with local files and remote storage via
fsspec
Installation
pip install compact-tree
Or install from source:
git clone https://github.com/andrey-savov/compact-tree.git
cd compact-tree
pip install -e .
Quick Start
from compact_tree import CompactTree
# Build from a nested dict
tree = CompactTree.from_dict({
"a": {
"x": "1",
"y": "2"
},
"b": "3"
})
# Access like a normal dict
print(tree["a"]["x"]) # "1"
print(tree["b"]) # "3"
print("a" in tree) # True
# Serialize to file
tree.serialize("tree.ctree")
# Load from file
loaded_tree = CompactTree("tree.ctree")
# Convert back to plain dict
plain_dict = loaded_tree.to_dict()
How It Works
LOUDS (Level-Order Unary Degree Sequence)
LOUDS is a succinct tree representation that encodes an ordered tree into a single bit string using roughly 2n bits for n nodes (close to the information-theoretic minimum).
Encoding rule: Traverse the tree in breadth-first (level) order. For each node, write d 1-bits (where d is the node's number of children) followed by a single 0-bit.
Example for a root with children A (2 kids) and B (0 kids):
root -> 1 1 0 (2 children, then 0-terminator)
A -> 1 1 0 (2 children)
B -> 0 (leaf)
Navigation relies on rank and select queries:
| Operation | Description |
|---|---|
first_child(v) |
Find the (v-1)-th 0, check next position |
next_sibling(v) |
Find the 1-bit for node v, check next position |
DAWG-Style Deduplication
- Keys are collected, sorted, and deduplicated into a global vocabulary
- Values (leaves) are similarly deduplicated into a value table
- Edge labels store integer IDs rather than raw strings
- Same key/value appearing multiple times is stored only once
Architecture
CompactTree
|
+-- louds : _LOUDS bit-vector tree topology (Poppy rank/select)
+-- elbl : memoryview edge labels (uint32 key ids, 4 bytes per node)
+-- vcol : memoryview value column (uint32: value id or 0xFFFFFFFF for internal nodes)
+-- _keys_buf: memoryview length-prefixed UTF-8 key strings
+-- val : memoryview length-prefixed UTF-8 value strings
Binary Format (v2)
Magic : 5 bytes "CTree"
Version : 8 bytes uint64 LE (always 2)
Header : 5 x 8 bytes lengths of: keys, values, louds, vcol, elbl
Payload : keys_bytes | val_bytes | louds_bytes | vcol_bytes | elbl_bytes
Dependencies
bitarray— Mutable bit arrayssuccinct(Poppy) — Rank/select in O(1)fsspec— Filesystem abstraction for local and remote storage
Testing
pytest test_compact_tree.py
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Based on LOUDS (Level-Order Unary Degree Sequence) tree representation
- Uses Poppy rank/select implementation from the
succinctlibrary - Inspired by DAWG (Directed Acyclic Word Graph) compression techniques
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file savov_compact_tree-0.2.0.tar.gz.
File metadata
- Download URL: savov_compact_tree-0.2.0.tar.gz
- Upload date:
- Size: 13.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e5afca50842f7061b62de437c1f74f0a78f0d000e02dcce160c0502f0cd48a0
|
|
| MD5 |
7e521d89f8a77ba7c4a16a0eb3d77294
|
|
| BLAKE2b-256 |
ec42b2eaf487e532a0ae139d08dcf0071ee25afc688f37491d74f0cccdeeef17
|
Provenance
The following attestation bundles were made for savov_compact_tree-0.2.0.tar.gz:
Publisher:
publish.yml on andrey-savov/compact-tree
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
savov_compact_tree-0.2.0.tar.gz -
Subject digest:
3e5afca50842f7061b62de437c1f74f0a78f0d000e02dcce160c0502f0cd48a0 - Sigstore transparency entry: 929106298
- Sigstore integration time:
-
Permalink:
andrey-savov/compact-tree@da19b621f2f6fad33982e622bb4f72281a15db73 -
Branch / Tag:
refs/tags/0.2.0 - Owner: https://github.com/andrey-savov
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@da19b621f2f6fad33982e622bb4f72281a15db73 -
Trigger Event:
release
-
Statement type:
File details
Details for the file savov_compact_tree-0.2.0-py3-none-any.whl.
File metadata
- Download URL: savov_compact_tree-0.2.0-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
dd2ca0547085845208ebc88ee617e2755bc6541c23cc9686e350c9a26794cc7c
|
|
| MD5 |
926f5c08ea1259a7d4894f0fa9938de8
|
|
| BLAKE2b-256 |
301bae7eb47eef7233a0b6e73ee52cec18debc11c1cd3114db74acd6e841e82f
|
Provenance
The following attestation bundles were made for savov_compact_tree-0.2.0-py3-none-any.whl:
Publisher:
publish.yml on andrey-savov/compact-tree
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
savov_compact_tree-0.2.0-py3-none-any.whl -
Subject digest:
dd2ca0547085845208ebc88ee617e2755bc6541c23cc9686e350c9a26794cc7c - Sigstore transparency entry: 929106301
- Sigstore integration time:
-
Permalink:
andrey-savov/compact-tree@da19b621f2f6fad33982e622bb4f72281a15db73 -
Branch / Tag:
refs/tags/0.2.0 - Owner: https://github.com/andrey-savov
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@da19b621f2f6fad33982e622bb4f72281a15db73 -
Trigger Event:
release
-
Statement type: