A Python library for reading and writing compressed, sorted key-value stores with efficient lookup using a B-tree-like structure.
Project description
Map With Tree
A Python library for reading and writing compressed, sorted key-value stores with efficient lookup using a B-tree-like structure.
Features
- Efficient Storage: Data is compressed using zstandard (zstd) with configurable compression levels
- Fast Lookups: Built-in B-tree index for O(log n) key lookups
- Fast Iteration: Sequential block iteration without tree traversal for O(n) scanning
- Sorted Keys: Keys are automatically sorted during finalization for efficient range queries
- Data Integrity: MD5 hash of all entries is computed and stored for verification
- Flexible Types: Support for multiple key and value types including bytes, strings, integers, floats, and JSON
- Memory Efficient: Block-based compression and caching minimize memory usage
- Simple API: Pythonic interface with context managers and dict-like operations
Installation
pip install map_with_tree
Requires Python >= 3.8 and zstd >= 1.5.
Quick Start
Writing Data
import map_with_tree
# Create a new map file with string values
with map_with_tree.open("data.mwt", "w", types=("bytes", "string")) as writer:
writer.set(b"key_1", "value_1")
writer.set(b"key_2", "value_2")
writer.set(b"key_3", "value_3")
Reading Data
import map_with_tree
# Open and read from a map file
with map_with_tree.open("data.mwt") as reader:
# Get a value by key
value = reader[b"key_1"]
# Check if key exists
if b"key_2" in reader:
print("Key exists!")
# Get with default value
value = reader.get(b"key_99", default="not found")
# Iterate over all entries
for key, value in reader:
print(f"{key}: {value}")
# Get file metadata
print(f"Total entries: {len(reader)}")
print(f"Header: {reader.header}")
API Reference
Opening Files
map_with_tree.open(path, mode="r", **kwargs)
path: File path for the map filemode:"r"for reading,"w"for writing**kwargs: Additional options for writing (see Writer Options)
Writer Options
MapWithTreeWriter(
path,
header=None, # Custom header metadata (dict)
types=("bytes", "bytes"), # Tuple of (key_type, value_type)
keys_per_node=128, # Number of keys per B-tree node
block_size=64*1024, # Block size for compression (64KB default)
compression_level=3 # zstd compression level (0-22)
)
Supported Types
- bytes: Raw bytes (default)
- string or str: UTF-8 encoded strings
- int or i64: 64-bit signed integer
- uint or u64: 64-bit unsigned integer
- i8, i16, i32: Signed integers (8, 16, 32 bit)
- u8, u16, u32: Unsigned integers (8, 16, 32 bit)
- float or f64: 64-bit float
- f32: 32-bit float
- json: JSON-serializable objects
- struct:format: Custom struct format (e.g.,
"struct:<IIf"for two unsigned ints and a float)
Writer Methods
writer.set(key, value) # Add a key-value pair
writer.finalize() # Finalize the file (called automatically on context exit)
writer.close() # Close file handles
Reader Methods
reader[key] # Get value by key (raises KeyError if not found)
reader.get(key, default=None) # Get value with default
key in reader # Check if key exists
len(reader) # Get number of entries
iter(reader) # Iterate over (key, value) pairs in insertion order
reader.sorted_keys() # Iterate over keys in sorted order (tree traversal)
reader.header # Access header metadata
reader.close() # Close file handle
File Format
Map With Tree (.mwt) files consist of:
- Magic Header: 8-byte signature (
mwt\0\0\0\0\1) - Header Offset: 8-byte pointer to compressed header
- Data Blocks: Compressed blocks containing key-value pairs with zstd
- B-tree Index: Tree structure pointing into blocks for efficient key lookups
- Compressed Header: JSON metadata with file statistics
Dual Access Modes
The format supports two efficient access patterns:
- Random Access: Use the B-tree index for O(log n) lookup of specific keys
- Sequential Iteration: Read blocks sequentially for O(n) full scans without tree traversal
This design enables:
- Sequential writes for optimal I/O performance during creation
- Minimal memory usage during both reading and writing
- Fast random access through the B-tree index
- Fast sequential iteration by reading blocks in order
- Efficient compression with block-level granularity
Examples
Large Dataset
import map_with_tree
import uuid
# Write 100,000 entries
with map_with_tree.open("large.mwt", "w", types=("bytes", "string")) as writer:
for i in range(100000):
key = f"key_{i:06d}".encode()
value = uuid.uuid4().hex
writer.set(key, value)
# Read and check compression
with map_with_tree.open("large.mwt") as reader:
print(f"Entries: {len(reader)}")
uncompressed = reader.header["uncompressed_size"]
compressed = reader.header["compressed_size"]
print(f"Compression ratio: {compressed / uncompressed:.2%}")
Structured Data with JSON
import map_with_tree
with map_with_tree.open("users.mwt", "w", types=("string", "json")) as writer:
writer.set("user_1", {"name": "Alice", "age": 30, "city": "NYC"})
writer.set("user_2", {"name": "Bob", "age": 25, "city": "SF"})
with map_with_tree.open("users.mwt") as reader:
user = reader[b"user_1"]
print(f"{user['name']} is {user['age']} years old")
Custom Struct Types
import map_with_tree
# Values are tuples of (unsigned int, unsigned int, float)
with map_with_tree.open("metrics.mwt", "w", types=("bytes", "struct:<IIf")) as writer:
writer.set(b"metric_1", (100, 200, 3.14))
writer.set(b"metric_2", (150, 250, 2.71))
with map_with_tree.open("metrics.mwt") as reader:
count1, count2, ratio = reader[b"metric_1"]
print(f"Counts: {count1}, {count2}, Ratio: {ratio}")
Efficient Iteration
The format stores both keys and values in the data blocks, enabling fast sequential iteration without traversing the B-tree:
import map_with_tree
# Iterate through all entries efficiently
with map_with_tree.open("data.mwt") as reader:
# Sequential iteration reads blocks in order
for key, value in reader:
process(key, value)
# This is faster than tree traversal for full scans
# because it only decompresses blocks without tree lookups
The iteration order is the insertion order (before sorting), which allows you to:
- Process all entries efficiently in O(n) time
- Scan through data without random access overhead
- Stream process large datasets with minimal memory
Sorted Key Iteration
If you need to iterate over keys in sorted order, use the sorted_keys() method which traverses the B-tree:
import map_with_tree
with map_with_tree.open("data.mwt") as reader:
# Get keys in sorted order via tree traversal
for key in reader.sorted_keys():
value = reader[key]
process(key, value)
# Or just get the sorted keys
all_sorted_keys = list(reader.sorted_keys())
Key differences between iteration modes:
iter(reader): Returns (key, value) pairs in insertion order, reads blocks sequentiallysorted_keys(): Returns keys only in sorted order, traverses the B-tree structure- Use regular iteration for full scans, use
sorted_keys()for sorted access or range queries
Performance Tips
- Adjust block size: Larger blocks (e.g., 256KB) improve compression but use more memory
- Tune compression level: Lower levels (1-3) for speed, higher (10-22) for size
- Choose appropriate types: Use native types (int, float) instead of strings when possible
- Batch writes: Add all entries before finalizing to ensure optimal tree structure
- Keys per node: Increase for larger datasets to reduce tree height
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file map_with_tree-0.0.15.tar.gz.
File metadata
- Download URL: map_with_tree-0.0.15.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3144933dffd862c520ce79787fbdffedbf4abc24dbb5ff908620b21daa47696e
|
|
| MD5 |
784ec09d17126a6702f3f3f5c52b183b
|
|
| BLAKE2b-256 |
fe099c07c74dd1d550488acaed8b504e63d61d5189cbbeca2486f16d30b91ac6
|
File details
Details for the file map_with_tree-0.0.15-py3-none-any.whl.
File metadata
- Download URL: map_with_tree-0.0.15-py3-none-any.whl
- Upload date:
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
91c0da42f2a516f26ae1c78770a4fbec1f23e1de1e146cb5d4bbf079ef4e46f5
|
|
| MD5 |
c1c78ec07701008a5d30d3d5fd92637c
|
|
| BLAKE2b-256 |
eb179e170f13f4a28ea6b063ccb313067a427f33d67f9b0379f07e73faf35d78
|