A fast RocksDB wrapper for Python using pybind11.

pyrex-rocksdb

A Python wrapper for the original (C++) version of RocksDB.

Installation

For Linux systems, wheels are provided and can be installed from PyPI using:

pip install pyrex-rocksdb

For Windows and macOS, I have built an earlier version of the library. I will rebuild once I add certain other important API features that are not yet implemented.

Motivation

This library aims to provide a fast, write-optimized, in-process key-value (KV) store in Python. Its "big brothers" are therefore databases like MongoDB and Cassandra. The difference is that you don't need to run a separate server (hence "in-process"), and it is designed to be fairly portable.

RocksDB, the underlying storage engine of this database, is an LSM-tree engine. An LSM-tree differs from balanced-tree index databases (e.g., B-tree and B+tree databases). LSM-tree databases offer very high write throughput and better space efficiency. See more about the motivation for LSM-tree databases (and RocksDB in particular) in this talk.

LSM-tree + SSTable engine basics

To understand where pyrex provides efficiency gains, it is important to understand some basics about the underlying RocksDB engine.

RocksDB and LevelDB are key-value stores with a Log-Structured Merge-tree (LSM-tree) architecture.

The key components of LSM-tree architectures are:

  • A MemTable, which stores sorted data in memory
  • A set of Sorted String Tables (SSTables), which are immutable sorted files on disk to which data from the MemTable is flushed
  • Compaction, a background process that merges the SSTables to remove redundant data and keep read performance high

In such databases, fast writes create many small, sorted data files called SSTables. To prevent reads from slowing down by checking too many files, a background process called compaction merges these SSTables together. This process organizes the data into levels, where newer, overlapping files sit in Level 0 and are progressively merged into higher levels (Level 1, Level 2, etc.). Each higher level contains larger, non-overlapping files, which ensures that finding a key remains efficient and old data is purged to save space. There are several optimizations and configurations possible for these processes (configurability and "pluggability" are commonly cited RocksDB advantages).
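The write, flush, read, and compaction steps described above can be sketched in a few lines of plain Python. This is only a toy illustration: the `ToyLSM` class and its `memtable_limit` parameter are hypothetical names invented for this sketch, it keeps everything in memory, and it uses a single list of sorted runs instead of real levels. It is not how RocksDB or pyrex is implemented.

```python
import bisect


class ToyLSM:
    """Toy LSM-tree (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # newest writes, held in memory
        self.sstables = []   # immutable sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the MemTable, which is why they are fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the MemTable into a sorted, immutable "SSTable".
        if self.memtable:
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the MemTable first, then each SSTable from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:
            i = bisect.bisect_left(table, (key,))  # binary search in a sorted run
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for table in reversed(self.sstables):  # oldest first, newer overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = ToyLSM(memtable_limit=2)
db.put(b"a", b"1")
db.put(b"b", b"2")   # MemTable is full: flushed to a first SSTable
db.put(b"a", b"3")   # newer value for the same key
db.put(b"c", b"4")   # flushed to a second SSTable
tables_before = len(db.sstables)
db.compact()         # the stale value for b"a" is dropped here
tables_after = len(db.sstables)
```

Before compaction, a read may have to look at every run. After compaction, there is one sorted run and the outdated value for `b"a"` no longer takes up space, which is the trade-off the paragraph above describes.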

However, the main advantage of RocksDB over LevelDB is its multi-threaded compaction support (LevelDB supports only single-threaded compaction, which comes with significant performance limitations). RocksDB also offers several other configurability advantages over LevelDB. For a more elaborate enumeration of RocksDB advantages, please refer to the RocksDB wiki.

Not all of these features are currently supported by the pyrex API, but I'm working on supporting more of them. Feel free to open an issue if there is a feature you want to see (or open a pull request).

Example usage:

Here is a simple example showing the usage of put/get in the DB:

import pyrex

# Directory where the RocksDB files are stored
DB_PATH = "./test_rocksdb_minimal"

# Open the database; the with-block releases it on exit
with pyrex.PyRocksDB(DB_PATH) as db:
    db.put(b"my_key", b"my_value")        # keys and values are bytes
    retrieved_value = db.get(b"my_key")

print(f"Retrieved: {retrieved_value.decode()}") # Output: Retrieved: my_value

For more examples, check the relevant folder and the documentation.

Installation

Note on CI/CD: The wheels provided are not completely platform-independent at the moment. I rely heavily on GitHub Actions for development since I don't own Mac or Windows machines. The CI/CD workflow for package builds is still under development. A Windows/macOS/Linux build was successful, but further development is needed.

Benchmarks

Pyrex was benchmarked against plyvel and lmdb (which uses a B+tree-based architecture and relies on the OS's page cache).

Initial benchmarks are promising and will be reported soon.
