A Python library for reading and writing compressed, sorted key-value stores with efficient lookup using a B-tree-like structure.

Project description

Map With Tree

A Python library for reading and writing compressed, sorted key-value stores with efficient lookup using a B-tree-like structure.

Features

Efficient Storage: Data is compressed using zstandard (zstd) with configurable compression levels
Fast Lookups: Built-in B-tree index for O(log n) key lookups
Fast Iteration: Sequential block iteration without tree traversal for O(n) scanning
Sorted Keys: Keys are automatically sorted during finalization for efficient range queries
Data Integrity: MD5 hash of all entries is computed and stored for verification
Flexible Types: Support for multiple key and value types including bytes, strings, integers, floats, and JSON
Memory Efficient: Block-based compression and caching minimize memory usage
Simple API: Pythonic interface with context managers and dict-like operations

Installation

pip install map_with_tree

Requires Python >= 3.8 and zstd >= 1.5.

Quick Start

Writing Data

import map_with_tree

# Create a new map file with string values
with map_with_tree.open("data.mwt", "w", types=("bytes", "string")) as writer:
    writer.set(b"key_1", "value_1")
    writer.set(b"key_2", "value_2")
    writer.set(b"key_3", "value_3")

Reading Data

import map_with_tree

# Open and read from a map file
with map_with_tree.open("data.mwt") as reader:
    # Get a value by key
    value = reader[b"key_1"]
    
    # Check if key exists
    if b"key_2" in reader:
        print("Key exists!")
    
    # Get with default value
    value = reader.get(b"key_99", default="not found")
    
    # Iterate over all entries
    for key, value in reader:
        print(f"{key}: {value}")
    
    # Get file metadata
    print(f"Total entries: {len(reader)}")
    print(f"Header: {reader.header}")

API Reference

Opening Files

map_with_tree.open(path, mode="r", **kwargs)

path: File path for the map file
mode: "r" for reading, "w" for writing
**kwargs: Additional options for writing (see Writer Options)

Writer Options

MapWithTreeWriter(
    path,
    header=None,              # Custom header metadata (dict)
    types=("bytes", "bytes"), # Tuple of (key_type, value_type)
    keys_per_node=128,        # Number of keys per B-tree node
    block_size=64*1024,       # Block size for compression (64KB default)
    compression_level=3,      # zstd compression level (0-22)
    duplicates="warn,remove"  # Duplicate key handling (see below)
)

Supported Types

bytes: Raw bytes (default)
string or str: UTF-8 encoded strings
int or i64: 64-bit signed integer
uint or u64: 64-bit unsigned integer
i8, i16, i32: Signed integers (8, 16, 32 bit)
u8, u16, u32: Unsigned integers (8, 16, 32 bit)
float or f64: 64-bit float
f32: 32-bit float
json: JSON-serializable objects
struct:format: Custom struct format (e.g., "struct:<IIf" for two unsigned ints and a float)

Duplicate Key Handling

The duplicates parameter controls how duplicate keys are handled during file creation. It accepts a comma-separated string with the following options:

warn: Log the total count of duplicate keys to stderr after sorting
list: Log each duplicate key to stderr as it's encountered
remove: Remove duplicate keys, keeping only the last occurrence

Default is "warn,remove", which removes duplicates and reports the count.

Examples:

# Remove duplicates silently
with map_with_tree.open("data.mwt", "w", duplicates="remove") as writer:
    writer.set(b"key", "value1")
    writer.set(b"key", "value2")  # Will override value1

# Keep duplicates and don't warn (not recommended - may cause lookup issues)
with map_with_tree.open("data.mwt", "w", duplicates="") as writer:
    writer.set(b"key", "value1")
    writer.set(b"key", "value2")  # Both will be kept

# Verbose duplicate tracking
with map_with_tree.open("data.mwt", "w", duplicates="warn,list,remove") as writer:
    writer.set(b"key", "value1")
    writer.set(b"key", "value2")
    # stderr: key b'key' duplicated
    # stderr: warning: 1 duplicated keys

Note: If you don't remove duplicates, the B-tree index will point to the first occurrence, but iteration may encounter all duplicates.

Writer Methods

writer.set(key, value)        # Add a key-value pair
writer.finalize()             # Finalize the file (called automatically on context exit)
writer.close()                # Close file handles

Reader Methods

reader[key]                   # Get value by key (raises KeyError if not found)
reader.get(key, default=None) # Get value with default
key in reader                 # Check if key exists
len(reader)                   # Get number of entries
iter(reader)                  # Iterate over (key, value) pairs in insertion order
reader.sorted_keys()          # Iterate over keys in sorted order (tree traversal)
reader.header                 # Access header metadata
reader.close()                # Close file handle

File Format

Map With Tree (.mwt) files consist of:

Magic Header: 8-byte signature (mwt\0\0\0\0\1)
Header Offset: 8-byte pointer to compressed header
Data Blocks: Compressed blocks containing key-value pairs with zstd
B-tree Index: Tree structure pointing into blocks for efficient key lookups
Compressed Header: JSON metadata with file statistics

Dual Access Modes

The format supports two efficient access patterns:

Random Access: Use the B-tree index for O(log n) lookup of specific keys
Sequential Iteration: Read blocks sequentially for O(n) full scans without tree traversal

This design enables:

Sequential writes for optimal I/O performance during creation
Minimal memory usage during both reading and writing
Fast random access through the B-tree index
Fast sequential iteration by reading blocks in order
Efficient compression with block-level granularity

Examples

Large Dataset

import map_with_tree
import uuid

# Write 100,000 entries
with map_with_tree.open("large.mwt", "w", types=("bytes", "string")) as writer:
    for i in range(100000):
        key = f"key_{i:06d}".encode()
        value = uuid.uuid4().hex
        writer.set(key, value)

# Read and check compression
with map_with_tree.open("large.mwt") as reader:
    print(f"Entries: {len(reader)}")
    uncompressed = reader.header["uncompressed_size"]
    compressed = reader.header["compressed_size"]
    print(f"Compression ratio: {compressed / uncompressed:.2%}")

Structured Data with JSON

import map_with_tree

with map_with_tree.open("users.mwt", "w", types=("string", "json")) as writer:
    writer.set("user_1", {"name": "Alice", "age": 30, "city": "NYC"})
    writer.set("user_2", {"name": "Bob", "age": 25, "city": "SF"})

with map_with_tree.open("users.mwt") as reader:
    user = reader[b"user_1"]
    print(f"{user['name']} is {user['age']} years old")

Custom Struct Types

import map_with_tree

# Values are tuples of (unsigned int, unsigned int, float)
with map_with_tree.open("metrics.mwt", "w", types=("bytes", "struct:<IIf")) as writer:
    writer.set(b"metric_1", (100, 200, 3.14))
    writer.set(b"metric_2", (150, 250, 2.71))

with map_with_tree.open("metrics.mwt") as reader:
    count1, count2, ratio = reader[b"metric_1"]
    print(f"Counts: {count1}, {count2}, Ratio: {ratio}")

Efficient Iteration

The format stores both keys and values in the data blocks, enabling fast sequential iteration without traversing the B-tree:

import map_with_tree

# Iterate through all entries efficiently
with map_with_tree.open("data.mwt") as reader:
    # Sequential iteration reads blocks in order
    for key, value in reader:
        process(key, value)
    
    # This is faster than tree traversal for full scans
    # because it only decompresses blocks without tree lookups

The iteration order is the insertion order (before sorting), which allows you to:

Process all entries efficiently in O(n) time
Scan through data without random access overhead
Stream process large datasets with minimal memory

Sorted Key Iteration

If you need to iterate over keys in sorted order, use the sorted_keys() method which traverses the B-tree:

import map_with_tree

with map_with_tree.open("data.mwt") as reader:
    # Get keys in sorted order via tree traversal
    for key in reader.sorted_keys():
        value = reader[key]
        process(key, value)
    
    # Or just get the sorted keys
    all_sorted_keys = list(reader.sorted_keys())

Key differences between iteration modes:

iter(reader): Returns (key, value) pairs in insertion order, reads blocks sequentially
sorted_keys(): Returns keys only in sorted order, traverses the B-tree structure
Use regular iteration for full scans, use sorted_keys() for sorted access or range queries

Performance Tips

Adjust block size: Larger blocks (e.g., 256KB) improve compression but use more memory
Tune compression level: Lower levels (1-3) for speed, higher (10-22) for size
Choose appropriate types: Use native types (int, float) instead of strings when possible
Batch writes: Add all entries before finalizing to ensure optimal tree structure
Keys per node: Increase for larger datasets to reduce tree height

Project details

Release history Release notifications | RSS feed

0.0.19

Jan 28, 2026

0.0.18

Jan 25, 2026

This version

0.0.17

Jan 25, 2026

0.0.16

Jan 25, 2026

0.0.15

Jan 24, 2026

0.0.14

Jan 24, 2026

0.0.13

Jan 24, 2026

0.0.12

Jan 22, 2026

0.0.11

Jan 22, 2026

0.0.10

Jan 21, 2026

0.0.9

Jan 21, 2026

0.0.8

Jan 21, 2026

0.0.7

Jan 21, 2026

0.0.6

Jan 21, 2026

0.0.5

Jan 21, 2026

0.0.4

Jan 21, 2026

0.0.3

Jan 21, 2026

0.0.2

Jan 21, 2026

0.0.1

Jan 21, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

map_with_tree-0.0.17.tar.gz (16.1 kB view details)

Uploaded Jan 25, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

map_with_tree-0.0.17-py3-none-any.whl (14.1 kB view details)

Uploaded Jan 25, 2026 Python 3

File details

Details for the file map_with_tree-0.0.17.tar.gz.

File metadata

Download URL: map_with_tree-0.0.17.tar.gz
Upload date: Jan 25, 2026
Size: 16.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for map_with_tree-0.0.17.tar.gz
Algorithm	Hash digest
SHA256	`3d90dbe9a4a6cf72c7f9dba363164170aeddafb4949ca72fddf3c5568fdaa33e`
MD5	`fa0060a58f57644536647320ec933779`
BLAKE2b-256	`376ae138bc045ddd20df8890b483271b0f9b9c343e985bc9d890b2aead0abe2f`

See more details on using hashes here.

File details

Details for the file map_with_tree-0.0.17-py3-none-any.whl.

File metadata

Download URL: map_with_tree-0.0.17-py3-none-any.whl
Upload date: Jan 25, 2026
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for map_with_tree-0.0.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`20f16ab5354b321d814f6c302ceaee55ec82337c2d461dcc36bcbe791e20ec40`
MD5	`5c0a8ff83860b0ae5a004f4d1d60adde`
BLAKE2b-256	`198426dfaf4189cef2746ebd55cfb5be0333a6829094685c78e39c4f19678cf1`

See more details on using hashes here.

map-with-tree 0.0.17

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Project description

Map With Tree

Features

Installation

Quick Start

Writing Data

Reading Data

API Reference

Opening Files

Writer Options

Supported Types

Duplicate Key Handling

Writer Methods

Reader Methods

File Format

Dual Access Modes

Examples

Large Dataset

Structured Data with JSON

Custom Struct Types

Efficient Iteration

Sorted Key Iteration

Performance Tips

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes