
RockStore


A lightweight Python wrapper for RocksDB using CFFI.

Overview

RockStore provides a simple, Pythonic interface to RocksDB, Facebook's persistent key-value store. It uses CFFI for efficient native library bindings and focuses on clean binary data operations.

Features

  • Simple API: Easy-to-use Python interface for RocksDB operations
  • Binary Operations: Works directly with bytes for maximum performance
  • Context Manager: Automatic resource management with with statements
  • Configurable Options: Customize compression, buffer sizes, and more
  • Read-Only Mode: Open databases in read-only mode for safe concurrent access
  • Cross-Platform: Works on macOS, Linux, and Windows

Installation

Prerequisites

First, install RocksDB on your system:

macOS (using Homebrew):

brew install rocksdb

Ubuntu/Debian:

sudo apt-get install librocksdb-dev

CentOS/RHEL/Fedora:

sudo yum install rocksdb-devel
# or for newer versions:
sudo dnf install rocksdb-devel

Windows:

  • Download pre-built RocksDB binaries or build from source
  • Ensure rocksdb.dll is in your PATH

Install RockStore

pip install rockstore

Quick Start

Basic Usage

from rockstore import RockStore

# Open a database
db = RockStore('/path/to/database')

# Store and retrieve binary data
db.put(b'key1', b'value1')
value = db.get(b'key1')
print(value)  # b'value1'

# Store and retrieve string data (encode/decode manually)
db.put('name'.encode(), 'Alice'.encode())
name = db.get('name'.encode()).decode()
print(name)  # 'Alice'

# Delete data
db.delete(b'key1')

# Clean up
db.close()

Using Context Manager (Recommended)

from rockstore import open_database

with open_database('/path/to/database') as db:
    db.put(b'hello', b'world')
    value = db.get(b'hello')
    print(value)  # b'world'
# Database is automatically closed

Getting All Data

with open_database('/path/to/database') as db:
    db.put(b'key1', b'value1')
    db.put(b'key2', b'value2')
    
    # Get all key-value pairs (warning: loads everything into memory)
    all_data = db.get_all()
    for key, value in all_data.items():
        print(f"{key} -> {value}")

Batch Operations

For maximum performance when writing multiple records, use write_batch. It is significantly faster (often 4x or more) than individual put operations and is atomic: either all operations are applied or none are.

with open_database('/path/to/database') as db:
    # Prepare batch data (list of tuples)
    batch_data = [
        (b'key1', b'value1'),
        (b'key2', b'value2'),
        (b'key3', b'value3')
    ]
    
    # Atomic write
    db.write_batch(batch_data)
    
    # Atomic delete
    keys_to_delete = [b'key1', b'key2']
    db.delete_batch(keys_to_delete)
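For very large imports, write_batch calls are often split into fixed-size chunks so that no single batch holds the whole dataset in memory. A hypothetical helper (pure Python; `chunked` is not part of the RockStore API):

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

records = [(f"k{i}".encode(), f"v{i}".encode()) for i in range(10)]

# Each chunk would be passed to db.write_batch(chunk);
# here we only demonstrate the grouping itself.
chunks = list(chunked(records, 4))
print(len(chunks))  # 3 chunks: sizes 4, 4, 2
```

Note that each chunk is atomic on its own, but the import as a whole is not.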

Range Queries and Pagination

For large databases, use range queries with pagination instead of get_all():

with open_database('/path/to/database') as db:
    # Add sample data
    for i in range(10000):
        key = f"user:{i:06d}".encode()
        value = f"User {i}".encode()
        db.put(key, value)
    
    # Paginated access - get 1000 records at a time
    batch_size = 1000
    start_key = None
    
    while True:
        # Get next batch
        batch = db.get_range(start_key=start_key, limit=batch_size)
        if not batch:
            break
            
        print(f"Processing {len(batch)} records...")
        
        # Process the batch
        for key, value in batch.items():
            process_record(key, value)
        
        # Setup for next batch
        last_key = max(batch.keys())
        start_key = last_key + b'\x00'  # Next key after last_key
    
    # Query specific ranges
    user_data = db.get_range(
        start_key=b'user:', 
        end_key=b'user:\xFF', 
        limit=500
    )
    
    # Memory-efficient iteration (one record at a time)
    for key, value in db.iterate_range(start_key=b'user:', end_key=b'user:\xFF'):
        process_user(key, value)
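The sample keys above zero-pad the numeric ID (`user:{i:06d}`) for a reason: RocksDB orders keys lexicographically as raw bytes, so unpadded numbers sort out of numeric order. A standalone illustration:

```python
# Without padding, byte-wise order disagrees with numeric order:
unpadded = sorted(f"user:{i}".encode() for i in (2, 10))
print(unpadded)  # [b'user:10', b'user:2'] -- 10 sorts before 2

# Zero-padding to a fixed width restores numeric order:
padded = sorted(f"user:{i:06d}".encode() for i in (2, 10))
print(padded)    # [b'user:000002', b'user:000010']
```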

Handling 10M+ Record Databases

For very large databases (10M+ records), the pattern below paginates efficiently in batches of 100,000 records:

def process_large_database_in_batches(db_path, batch_size=100_000):
    """
    Process a large database (10M+ records) in manageable batches.
    This approach uses constant memory regardless of database size.
    """
    with open_database(db_path) as db:
        start_key = None
        total_processed = 0
        batch_count = 0
        
        while True:
            # Get next batch
            batch = db.get_range(start_key=start_key, limit=batch_size)
            if not batch:
                break
            
            batch_count += 1
            total_processed += len(batch)
            
            print(f"Processing batch {batch_count}: {len(batch)} records")
            print(f"Total processed so far: {total_processed}")
            
            # Process each record in the batch
            for key, value in batch.items():
                # Your processing logic here
                process_record(key, value)
            
            # Prepare for next batch
            last_key = max(batch.keys())
            start_key = last_key + b'\x00'
            
            # Optional: Add progress tracking or break conditions
            if total_processed >= 10_000_000:  # Safety limit
                break
        
        print(f"Completed! Processed {total_processed} records in {batch_count} batches")

# Even more memory-efficient approach using iterator
def stream_process_large_database(db_path):
    """
    Stream-process records one at a time for minimal memory use.
    """
    with open_database(db_path) as db:
        processed = 0
        for key, value in db.iterate_range():
            process_record(key, value)
            processed += 1
            
            if processed % 100_000 == 0:
                print(f"Processed {processed} records...")
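Both pagination loops advance the cursor with last_key + b'\x00'. Appending a zero byte produces the smallest byte string that sorts strictly after last_key, so no key between pages is skipped or read twice. A quick check of that property:

```python
last_key = b'user:000999'
next_start = last_key + b'\x00'

# next_start sorts immediately after last_key ...
print(last_key < next_start)  # True

# ... and every key greater than last_key is >= next_start,
# so the next page starts exactly where this one ended.
later_keys = [b'user:000999\x00a', b'user:001000', b'zzz']
print(all(k >= next_start for k in later_keys))  # True
```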

Working with Strings

# Helper functions for string encoding/decoding
def encode_string(s):
    return s.encode('utf-8')

def decode_bytes(b):
    return b.decode('utf-8')

with open_database('/path/to/database') as db:
    # Store string data
    db.put(encode_string('user:123'), encode_string('John Doe'))
    
    # Retrieve and decode
    user_data = db.get(encode_string('user:123'))
    if user_data:
        print(decode_bytes(user_data))  # 'John Doe'
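Since values are raw bytes, structured data can be stored by serializing it first, for example with the standard json module (a sketch; RockStore itself performs no serialization):

```python
import json

def encode_value(obj):
    """Serialize a JSON-compatible object to UTF-8 bytes."""
    return json.dumps(obj).encode('utf-8')

def decode_value(data):
    """Restore an object from UTF-8 JSON bytes."""
    return json.loads(data.decode('utf-8'))

profile = {'name': 'John Doe', 'age': 30}
raw = encode_value(profile)            # bytes, ready for db.put(key, raw)
print(decode_value(raw) == profile)    # True
```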

Configuration Options

from rockstore import RockStore

# Create database with custom options
options = {
    'create_if_missing': True,
    'compression_type': 'lz4_compression',
    'write_buffer_size': 64 * 1024 * 1024,  # 64MB
    'max_open_files': 1000
}

db = RockStore('/path/to/database', options=options)

Available Options

  • create_if_missing (bool): Create database if it doesn't exist (default: True)
  • read_only (bool): Open database in read-only mode (default: False)
  • compression_type (str): Compression algorithm - 'no_compression', 'snappy_compression', 'zlib_compression', 'bz2_compression', 'lz4_compression', 'lz4hc_compression', 'xpress_compression', 'zstd_compression' (default: 'snappy_compression')
  • write_buffer_size (int): Write buffer size in bytes (default: 64MB)
  • max_open_files (int): Maximum number of open files (default: 1000)

Per-Operation Options

# Synchronous write (forces immediate disk write)
db.put(b'key', b'value', sync=True)

# Read without caching
value = db.get(b'key', fill_cache=False)

# Synchronous delete
db.delete(b'key', sync=True)

API Reference

RockStore Class

Constructor

RockStore(path, options=None)

Methods

Binary Operations:

  • put(key: bytes, value: bytes, sync: bool = False) - Store binary data
  • get(key: bytes, fill_cache: bool = True) -> bytes | None - Retrieve binary data
  • delete(key: bytes, sync: bool = False) - Delete binary data

Batch Operations:

  • write_batch(operations: list[tuple[bytes, bytes]], sync: bool = False) - Atomically write multiple key-value pairs
  • delete_batch(keys: list[bytes], sync: bool = False) - Atomically delete multiple keys

Bulk Read Operations:

  • get_all(fill_cache: bool = True) -> dict[bytes, bytes] - Get all key-value pairs (loads into memory)
  • get_range(start_key: bytes = None, end_key: bytes = None, limit: int = None, fill_cache: bool = True) -> dict[bytes, bytes] - Get range of key-value pairs with pagination support
  • iterate_range(start_key: bytes = None, end_key: bytes = None, fill_cache: bool = True) -> Iterator[tuple[bytes, bytes]] - Memory-efficient iterator over key-value pairs

Resource Management:

  • close() - Close the database
  • Context manager support (with statement)

Context Manager

open_database(path, options=None) -> RockStore
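open_database behaves like any Python context manager: it yields an open store and guarantees close() runs on exit, even if the body raises. The pattern can be sketched with contextlib (a hypothetical stand-in, not the library's actual implementation):

```python
from contextlib import contextmanager

class _FakeStore:
    """Stand-in for RockStore, used only to illustrate the pattern."""
    def __init__(self, path):
        self.path = path
        self.closed = False
    def close(self):
        self.closed = True

@contextmanager
def open_database_sketch(path, options=None):
    db = _FakeStore(path)   # real code would construct RockStore(path, options)
    try:
        yield db
    finally:
        db.close()          # runs even if the with-body raises

with open_database_sketch('/tmp/demo') as db:
    store = db
print(store.closed)  # True
```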

Requirements

  • Python 3.8+
  • CFFI >= 1.15.0
  • RocksDB library installed on system

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=rockstore

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Developed by Chainscore Labs
