Library for reading and writing Map With Tree files

Map With Tree

A Python library for reading and writing compressed, sorted key-value stores with efficient lookup using a B-tree-like structure.

Features

  • Efficient Storage: Data is compressed using zstandard (zstd) with configurable compression levels
  • Fast Lookups: Built-in B-tree index for O(log n) key lookups
  • Sorted Keys: Keys are automatically sorted during finalization for efficient traversal
  • Data Integrity: MD5 hash of all entries is computed and stored for verification
  • Flexible Types: Support for multiple key and value types including bytes, strings, integers, floats, and JSON
  • Memory Efficient: Block-based compression and caching minimize memory usage
  • Simple API: Pythonic interface with context managers and dict-like operations
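Sorted keys are what make logarithmic lookups possible: a search can discard half of the remaining keys at each comparison instead of scanning. The sketch below shows the idea on an in-memory list using only the standard library. It is not the library's on-disk index, and it assumes bytes keys are ordered lexicographically, which is why unpadded numeric suffixes sort in a surprising order. The helper name `find` is hypothetical.

```python
import bisect

# Bytes compare lexicographically, so b"key_10" sorts before b"key_2".
unpadded = sorted([b"key_2", b"key_10", b"key_1"])
print(unpadded)  # [b'key_1', b'key_10', b'key_2']

# Zero-padding keeps numeric order and lexicographic order aligned.
padded = sorted(f"key_{i:03d}".encode() for i in (2, 10, 1))
print(padded)  # [b'key_001', b'key_002', b'key_010']


def find(sorted_keys, key):
    """Binary-search a sorted key list; return the index or None."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return None


print(find(padded, b"key_010"))  # 2
```

If iteration order matters for numeric-style keys, zero-padding them (as the Large Dataset example does) or using an integer key type avoids the surprise.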

Installation

pip install map_with_tree

Requires Python >= 3.8 and zstd >= 1.5.

Quick Start

Writing Data

import map_with_tree

# Create a new map file with string values
with map_with_tree.open("data.mwt", "w", values_type="string") as writer:
    writer.add_entry(b"key_1", "value_1")
    writer.add_entry(b"key_2", "value_2")
    writer.add_entry(b"key_3", "value_3")

Reading Data

import map_with_tree

# Open and read from a map file
with map_with_tree.open("data.mwt") as reader:
    # Get a value by key
    value = reader[b"key_1"]
    
    # Check if key exists
    if b"key_2" in reader:
        print("Key exists!")
    
    # Get with default value
    value = reader.get(b"key_99", default="not found")
    
    # Iterate over all entries
    for key, value in reader:
        print(f"{key}: {value}")
    
    # Get file metadata
    print(f"Total entries: {len(reader)}")
    print(f"Header: {reader.header}")

API Reference

Opening Files

map_with_tree.open(path, mode="r", **kwargs)
  • path: File path for the map file
  • mode: "r" for reading, "w" for writing
  • **kwargs: Additional options for writing (see Writer Options)

Writer Options

MapWithTreeWriter(
    path,
    header=None,              # Custom header metadata (dict)
    keys_type="bytes",        # Type for keys
    values_type="bytes",      # Type for values
    keys_per_node=128,        # Number of keys per B-tree node
    block_size=64*1024,       # Block size for compression (64KB default)
    compression_level=3       # zstd compression level (0-22)
)
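To get a feel for what block_size trades off, the sketch below compresses the same payload in independent blocks of two different sizes. It uses zlib from the standard library as a stand-in for zstd so that it runs without extra dependencies; the absolute numbers say nothing about this library's output, only the general trend that larger independent blocks tend to compress better. The function name `compressed_size` and the synthetic payload are illustrative assumptions.

```python
import zlib


def compressed_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Total size when each block_size chunk is compressed independently."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(zlib.compress(data[start:start + block_size], level))
    return total


# Synthetic, repetitive payload (illustrative only).
payload = b"".join(
    f"key_{i:06d}=value_{i % 97}\n".encode() for i in range(20000)
)

small = compressed_size(payload, 1024)        # 1 KB blocks
large = compressed_size(payload, 64 * 1024)   # 64 KB blocks
print(f"raw={len(payload)} 1KB-blocks={small} 64KB-blocks={large}")
```

Each block starts with no knowledge of earlier data, so small blocks repeat that warm-up cost many times. The flip side is that a reader has to decompress a whole block to reach one value, so larger blocks cost more memory and work per lookup.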

Supported Types

  • bytes: Raw bytes (default)
  • string or str: UTF-8 encoded strings
  • int or i64: 64-bit signed integer
  • uint or u64: 64-bit unsigned integer
  • i8, i16, i32: Signed integers (8, 16, 32 bit)
  • u8, u16, u32: Unsigned integers (8, 16, 32 bit)
  • float or f64: 64-bit float
  • f32: 32-bit float
  • json: JSON-serializable objects
  • struct:format: Custom struct format (e.g., "struct:<IIf" for two unsigned ints and a float)
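The format string in "struct:<IIf" reads like a format for Python's standard struct module. Assuming that is how the library interprets it (the text above does not say so explicitly), the sketch below shows what such a format means in terms of byte layout and precision.

```python
import struct

fmt = "<IIf"  # little-endian: uint32, uint32, float32
print(struct.calcsize(fmt))  # 12

packed = struct.pack(fmt, 100, 200, 3.14)
count1, count2, ratio = struct.unpack(fmt, packed)

# The float is stored in 32 bits, so it comes back as the nearest
# representable value rather than exactly 3.14.
print(count1, count2, round(ratio, 2))  # 100 200 3.14
```

Under this assumption, every value of such a type occupies a fixed 12 bytes, and values that need full double precision would call for a `d` field instead of `f`.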

Writer Methods

writer.add_entry(key, value)  # Add a key-value pair
writer.finalize()             # Finalize the file (called automatically on context exit)
writer.close()                # Close file handles

Reader Methods

reader[key]                   # Get value by key (raises KeyError if not found)
reader.get(key, default=None) # Get value with default
key in reader                 # Check if key exists
len(reader)                   # Get number of entries
reader.header                 # Access header metadata
reader.close()                # Close file handle

File Format

Map With Tree (.mwt) files consist of:

  1. Magic Header: 8-byte signature (mwt\0\0\0\0\1)
  2. Header Offset: 8-byte pointer to compressed header
  3. Data Blocks: Blocks of values, each compressed with zstd
  4. B-tree Index: Tree structure for efficient key lookup, plus the MD5 hash of the entries
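As a rough illustration of the first two fields, the sketch below validates the magic signature and extracts the header offset from the first 16 bytes of a buffer. It assumes the offset immediately follows the signature and is an unsigned little-endian 64-bit integer; the byte order is a guess, not something the list above states. The name `parse_preamble` is hypothetical and not part of the library.

```python
import struct

MAGIC = b"mwt\x00\x00\x00\x00\x01"  # the 8-byte signature listed above


def parse_preamble(buf: bytes) -> int:
    """Return the header offset stored after the magic signature.

    Assumes an unsigned little-endian 64-bit offset (unverified).
    """
    if len(buf) < 16:
        raise ValueError("need at least 16 bytes")
    if buf[:8] != MAGIC:
        raise ValueError("not a Map With Tree file")
    (offset,) = struct.unpack("<Q", buf[8:16])
    return offset


# Synthetic bytes for demonstration, not a real .mwt file.
sample = MAGIC + struct.pack("<Q", 4096)
print(parse_preamble(sample))  # 4096
```

Storing a pointer to the header near the start of the file is consistent with sequential writing: the header can be written last, once sizes and the index are known, and only the fixed-size pointer needs to be filled in afterwards.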

The format ensures:

  • Sequential writes for optimal I/O performance
  • Minimal memory usage during both reading and writing
  • Fast random access through the B-tree index
  • Efficient compression with block-level granularity
  • Data integrity verification through hashing

Examples

Large Dataset

import map_with_tree
import uuid

# Write 100,000 entries
with map_with_tree.open("large.mwt", "w", values_type="string") as writer:
    for i in range(100000):
        key = f"key_{i:06d}".encode()
        value = uuid.uuid4().hex
        writer.add_entry(key, value)

# Read and check compression
with map_with_tree.open("large.mwt") as reader:
    print(f"Entries: {len(reader)}")
    uncompressed = reader.header["uncompressed_size"]
    compressed = reader.header["compressed_size"]
    print(f"Compression ratio: {compressed / uncompressed:.2%}")

Structured Data with JSON

import map_with_tree

with map_with_tree.open("users.mwt", "w", keys_type="string", values_type="json") as writer:
    writer.add_entry("user_1", {"name": "Alice", "age": 30, "city": "NYC"})
    writer.add_entry("user_2", {"name": "Bob", "age": 25, "city": "SF"})

with map_with_tree.open("users.mwt") as reader:
    user = reader["user_1"]  # keys were written with keys_type="string"
    print(f"{user['name']} is {user['age']} years old")

Custom Struct Types

import map_with_tree

# Values are tuples of (unsigned int, unsigned int, float)
with map_with_tree.open("metrics.mwt", "w", values_type="struct:<IIf") as writer:
    writer.add_entry(b"metric_1", (100, 200, 3.14))
    writer.add_entry(b"metric_2", (150, 250, 2.71))

with map_with_tree.open("metrics.mwt") as reader:
    count1, count2, ratio = reader[b"metric_1"]
    print(f"Counts: {count1}, {count2}, Ratio: {ratio}")

Performance Tips

  1. Adjust block size: Larger blocks (e.g., 256KB) improve compression but use more memory
  2. Tune compression level: Lower levels (1-3) for speed, higher (10-22) for size
  3. Choose appropriate types: Use native types (int, float) instead of strings when possible
  4. Batch writes: Add all entries before finalizing to ensure optimal tree structure
  5. Keys per node: Increase for larger datasets to reduce tree height
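To see how keys per node affects tree height, the sketch below estimates the number of levels for a given entry count. It assumes a simplified model in which every node holds up to keys_per_node keys and each parent points to up to keys_per_node children; the library's actual node layout may differ. The function name `tree_levels` is hypothetical.

```python
def tree_levels(num_keys: int, keys_per_node: int) -> int:
    """Estimate tree levels under a simplified full-node model."""
    if num_keys < 1 or keys_per_node < 2:
        raise ValueError("need num_keys >= 1 and keys_per_node >= 2")
    levels = 1
    nodes = -(-num_keys // keys_per_node)  # ceiling division: leaf nodes
    while nodes > 1:
        nodes = -(-nodes // keys_per_node)  # parents needed for this level
        levels += 1
    return levels


for k in (128, 512):
    print(k, tree_levels(100_000, k))
# 128 3
# 512 2
```

Under this model, going from 128 to 512 keys per node removes one level for 100,000 entries, at the cost of larger nodes to read and search per step.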
