SIMD-optimized append-only schema-less storage engine. Key-based binary storage in a single-file storage container.

SIMD R Drive (Python Bindings)

Experimental Python bindings for SIMD R Drive, a high-performance, schema-less storage engine written in Rust that stores data in a single-file container optimized for zero-copy binary access.

This library provides access to core functionality of simd-r-drive from Python, including high-performance key/value storage, zero-copy reads via memoryview, and support for streaming writes and reads.

Threaded streaming writes from Python are not supported. See Thread Safety for important limitations.

Features

  • 🔑 Append-only key/value storage
  • ⚡ Zero-copy reads via memoryview and mmap
  • 📆 Single-file binary container (no schema or serialization required)
  • ↺ Streaming interface for writing and reading large entries
  • 🐍 Native Rust extension module for Python (via PyO3)

Supported Environments

The simd_r_drive_py Python bindings are built as native extension modules and require environments that support both Python and Rust toolchains.

✅ Platforms

  • Linux (x86_64, aarch64)
  • macOS (x86_64, arm64/M1/M2)

Wheels are built using cibuildwheel and tested on GitHub Actions.

✅ Supported Python Versions

  • Python 3.10 – 3.13 (CPython only)

Older versions (≤3.9) are explicitly skipped during wheel builds.

❌ Not Supported

  • Windows (x86_64/AMD64, ARM64): the Python bindings are not officially supported on Windows because of platform-specific filesystem and memory-mapping inconsistencies in the Python runtime.

    The underlying Rust library works on Windows and is tested continuously, but the Python bindings fail some unit tests in CI. Manual local builds (including AMD64 and ARM64) have succeeded but are not considered production-stable.

  • Python < 3.10
  • 32-bit Python
  • musl-based Linux environments (e.g., Alpine Linux)
  • PyPy or other alternative Python interpreters

If you need support for other environments or interpreters, consider compiling from source with maturin develop inside a compatible environment.
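The support matrix above can also be checked at runtime before importing the bindings. The hypothetical `environment_problems` helper below is a convenience sketch, not something the package ships; it only tests what the standard library can see (interpreter version, implementation, pointer width, OS, and a libc heuristic for musl).

```python
from __future__ import annotations  # keeps annotations lazy so old interpreters can run the check

import platform
import sys


def environment_problems() -> list[str]:
    """List reasons this interpreter falls outside the supported environments.

    Hypothetical helper: an empty list means the interpreter looks like
    64-bit CPython 3.10+ on Linux (glibc) or macOS.
    """
    problems = []
    impl = platform.python_implementation()
    if sys.version_info < (3, 10):
        problems.append("Python < 3.10")
    if impl != "CPython":
        problems.append(f"{impl} is not CPython")
    if sys.maxsize <= 2**32:
        problems.append("32-bit Python")
    if sys.platform not in ("linux", "darwin"):
        problems.append(f"unsupported platform: {sys.platform}")
    elif sys.platform == "linux" and platform.libc_ver()[0] != "glibc":
        # Heuristic: libc_ver() does not report "glibc" on musl (e.g. Alpine).
        problems.append("non-glibc Linux, e.g. musl/Alpine")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("unsupported:", problem)
```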

Storage Layout

Each entry is the payload followed by a fixed 20-byte metadata trailer:

Offset Range       Field         Size (Bytes)   Description
0 → N              Payload       N              Variable-length data
N → N + 8          Key Hash      8              64-bit XXH3 hash of the key (fast lookups)
N + 8 → N + 16     Prev Offset   8              Absolute offset pointing to the previous version
N + 16 → N + 20    Checksum      4              32-bit CRC32C checksum for integrity verification
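To make the trailer concrete, here is a minimal pure-Python sketch that packs and parses one entry following the layout table. It assumes little-endian fields with no padding and a checksum computed over the payload only; `build_entry` and `parse_entry` are hypothetical helpers, not part of the bindings, and the key hash is passed in rather than computed with XXH3 (which is not in the standard library).

```python
import struct

# Assumption: little-endian, no padding between fields.
# Field order follows the layout table: key hash, prev offset, checksum.
METADATA_FORMAT = "<QQI"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)  # 8 + 8 + 4 = 20 bytes


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def build_entry(payload: bytes, key_hash: int, prev_offset: int) -> bytes:
    """Return the payload followed by its metadata trailer (hypothetical helper)."""
    checksum = crc32c(payload)  # assumption: checksum covers the payload only
    return payload + struct.pack(METADATA_FORMAT, key_hash, prev_offset, checksum)


def parse_entry(entry: bytes) -> tuple[bytes, int, int]:
    """Split an entry into (payload, key_hash, prev_offset), verifying the CRC."""
    if len(entry) < METADATA_SIZE:
        raise ValueError("entry shorter than metadata trailer")
    payload, trailer = entry[:-METADATA_SIZE], entry[-METADATA_SIZE:]
    key_hash, prev_offset, checksum = struct.unpack(METADATA_FORMAT, trailer)
    if crc32c(payload) != checksum:
        raise ValueError("checksum mismatch")
    return payload, key_hash, prev_offset
```

Because the trailer has a fixed size, a reader can always locate the metadata of an entry from its end offset, then follow the previous-version offset backwards.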

Installation

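pip install simd-r-drive-py

Published releases are currently pre-releases (for example 0.4.1a15); if pip does not select one, add the --pre flag.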

Usage

Regular Writes and Reads

from simd_r_drive import DataStore

# Create or open a datastore
store = DataStore("mydata.bin")

# Write a key/value pair
store.write(b"username", b"jdoe")

# Read the value
value = store.read(b"username")
print(value)  # b'jdoe'

# Check existence
assert store.exists(b"username")

# Delete the key
store.delete(b"username")
assert store.read(b"username") is None

Batch Writes

from simd_r_drive import DataStore

store = DataStore("batch.bin")

# Prepare entries as a list of (key, value) byte tuples
entries = [
    (b"user:1", b"alice"),
    (b"user:2", b"bob"),
    (b"user:3", b"charlie"),
]

# Write all entries in a single batch
store.batch_write(entries)

# Verify that all entries were written correctly
for key, value in entries:
    assert store.read(key) == value
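batch_write expects a list of (bytes, bytes) tuples. If your data starts out as a dict of strings, a small conversion step like the one below builds that list. `to_batch` is a hypothetical helper, and UTF-8 is an assumption about how you want text stored.

```python
def to_batch(mapping: dict[str, str], encoding: str = "utf-8") -> list[tuple[bytes, bytes]]:
    """Encode str keys/values into the (key, value) byte tuples batch_write takes."""
    return [(k.encode(encoding), v.encode(encoding)) for k, v in mapping.items()]


entries = to_batch({"user:1": "alice", "user:2": "bob", "user:3": "charlie"})
# entries == [(b"user:1", b"alice"), (b"user:2", b"bob"), (b"user:3", b"charlie")]
# store.batch_write(entries)  # with a DataStore opened as in the example above
```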

Streamed Writes and Reads (Large Payloads)

from simd_r_drive import DataStore
import io

store = DataStore("streamed.bin")

# Simulated payload — in practice, this could be any file-like stream,
# including one that does not fit entirely into memory.
payload = b"x" * (10 * 1024 * 1024)  # Example: 10 MB of dummy data
stream = io.BytesIO(payload)

store.write_stream(b"large-file", stream)

# Read the payload back in chunks
read_stream = store.read_stream(b"large-file")
result = bytearray()

for chunk in read_stream:
    result.extend(chunk)

assert result == payload
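Since the value comes back in chunks, it does not have to be accumulated in memory as the bytearray above does. The sketch below drains any object exposing .read(n), the interface documented for EntryStream, into a writable file-like object. `copy_stream`, the 64 KiB chunk size, and the assumption that .read(n) returns an empty result at end of data are illustrative choices, not the library's.

```python
import io
from typing import BinaryIO, Protocol


class Readable(Protocol):
    def read(self, n: int) -> bytes: ...


def copy_stream(src: Readable, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks of at most chunk_size bytes; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # assumption: an empty read signals end of data
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins; with the bindings, src would be
# store.read_stream(b"large-file") and dst a file opened with open("out.bin", "wb").
src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copied = copy_stream(src, dst)  # copied == 200_000
```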

API

DataStore(path: str)

Opens (or creates) a file-backed storage container at the given path.

.write(key: bytes, value: bytes) -> None

Atomically appends a new key/value entry. The new entry supersedes any previous version of the key; older versions remain in the file because storage is append-only.

.batch_write(items: List[Tuple[bytes, bytes]]) -> None

Writes multiple key-value pairs in a single operation. Each item must be a tuple of (key, value) where both are bytes.

.write_stream(key: bytes, reader: IO[bytes]) -> None

Streams the value for key from a Python file-like object that implements .read(n), so the full payload never has to be held in memory. Not thread-safe.
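Because write_stream only needs an object with a .read(n) method, the source does not have to be a real file. As a hedged sketch, the hypothetical `RepeatingReader` below produces `total` bytes of a repeated pattern on demand without materializing them; it assumes write_stream keeps calling .read(n) until it receives an empty result, the usual file-object convention.

```python
class RepeatingReader:
    """File-like object that yields `total` bytes of `pattern` lazily via read(n)."""

    def __init__(self, pattern: bytes, total: int) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self._pattern = pattern
        self._remaining = total
        self._pos = 0  # current position within the pattern

    def read(self, n: int = -1) -> bytes:
        # Negative or None means "read everything that is left".
        if n is None or n < 0:
            n = self._remaining
        n = min(n, self._remaining)
        out = bytearray()
        while len(out) < n:
            take = min(n - len(out), len(self._pattern) - self._pos)
            out += self._pattern[self._pos:self._pos + take]
            self._pos = (self._pos + take) % len(self._pattern)
        self._remaining -= n
        return bytes(out)


# store.write_stream(b"generated", RepeatingReader(b"abc", 10 * 1024 * 1024))
```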

.read(key: bytes) -> Optional[bytes]

Returns the full value for a key, or None if the key does not exist.

.read_entry(key: bytes) -> Optional[EntryHandle]

Returns a memory-mapped handle exposing .as_memoryview() for zero-copy access, or None if the key does not exist.
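Zero-copy access pays off when you parse a value in place instead of first copying it into a new bytes object. The sketch assumes a hypothetical record format of your own (a little-endian u32 length prefix followed by UTF-8 text) and reads it straight from a memoryview with struct.unpack_from; `read_name` is illustrative, not part of the bindings.

```python
import struct


def read_name(view: memoryview) -> str:
    """Parse a hypothetical record: <u32 little-endian length><UTF-8 bytes>."""
    (length,) = struct.unpack_from("<I", view, 0)  # reads directly from the buffer
    name_view = view[4:4 + length]  # slicing a memoryview does not copy
    return str(name_view, "utf-8")  # decoding into str is the only copy made
```

With the bindings, this would be called as read_name(store.read_entry(key).as_memoryview()) after checking that read_entry did not return None.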

.read_stream(key: bytes) -> Optional[EntryStream]

Returns a streaming reader exposing .read(n), or None if the key does not exist.

.delete(key: bytes) -> None

Marks the key as deleted so it can no longer be read. The file remains append-only, so the deleted data still occupies space until it is removed by Rust-side compaction.

.exists(key: bytes) -> bool

Returns True if the key is present in the index and has not been deleted.
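The write, delete, and exists semantics follow from the append-only design: every write appends, the index points each key at its newest entry, and a delete appends a marker instead of erasing data. The toy in-memory model below illustrates those semantics only. `ToyAppendLog`, the tombstone representation, and indexing by the raw key instead of an XXH3 hash are assumptions, not how the Rust engine is implemented.

```python
from typing import Optional

TOMBSTONE = None  # hypothetical marker for a deleted key


class ToyAppendLog:
    """In-memory model of append-only key/value semantics (illustration only)."""

    def __init__(self) -> None:
        # Each record: (key, value or TOMBSTONE, offset of previous version or -1).
        self.log: list[tuple[bytes, Optional[bytes], int]] = []
        self.index: dict[bytes, int] = {}  # key -> offset of newest record

    def _append(self, key: bytes, value: Optional[bytes]) -> None:
        prev = self.index.get(key, -1)
        self.log.append((key, value, prev))
        self.index[key] = len(self.log) - 1

    def write(self, key: bytes, value: bytes) -> None:
        self._append(key, value)  # older versions stay in the log

    def delete(self, key: bytes) -> None:
        self._append(key, TOMBSTONE)  # nothing is erased

    def read(self, key: bytes) -> Optional[bytes]:
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def exists(self, key: bytes) -> bool:
        return self.read(key) is not None

    def history(self, key: bytes) -> list[Optional[bytes]]:
        """Walk prev offsets from the newest version to the oldest."""
        out, offset = [], self.index.get(key, -1)
        while offset != -1:
            _, value, offset = self.log[offset]
            out.append(value)
        return out
```

After writing b"jdoe" then b"jane" and deleting the key, read returns None, exists returns False, and the log still holds three records. That growth is why space is only reclaimed by compaction.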

Thread Safety

This Python binding is not thread-safe.

Due to Python’s Global Interpreter Lock (GIL) and the limitations of PyO3, concurrent streaming writes or reads from multiple threads are not supported, and doing so may cause hangs or inconsistent behavior.

  • Use only from a single thread.
  • ❌ Do not call methods like write_stream or read_stream from multiple threads.
  • ❌ Do not share a DataStore instance across threads.
  • ✅ For concurrent, high-performance use — especially with streaming — use the native Rust version directly.

This design avoids working around the GIL or spawning internal locks for artificial concurrency. If you need reliable multithreading, call into the Rust API instead.
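If the rest of an application is multithreaded, one way to honor the single-thread rule is to confine the store to one dedicated worker thread and have other threads submit calls to it. The `StoreOwner` class below is a hypothetical helper, not part of the bindings. It serializes calls rather than adding concurrency, and it assumes the store tolerates being created, used, and released on a thread other than the main one. Any file-like object passed to write_stream should likewise be used only through the worker.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class StoreOwner:
    """Runs every call on one store object from a single worker thread."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        # With max_workers=1 the executor never starts more than one thread.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._store: Any = None
        # Create the store on the worker thread as well.
        self._executor.submit(self._set_store, factory).result()

    def _set_store(self, factory: Callable[[], Any]) -> None:
        self._store = factory()

    def submit(self, method: str, *args: Any) -> Future:
        """Queue store.<method>(*args); callers wait on the returned Future."""
        return self._executor.submit(lambda: getattr(self._store, method)(*args))

    def close(self) -> None:
        # Release the store on the worker thread, then stop the worker.
        self._executor.submit(self._release).result()
        self._executor.shutdown(wait=True)

    def _release(self) -> None:
        self._store = None
```

For example, owner = StoreOwner(lambda: DataStore("mydata.bin")) followed by owner.submit("write", b"k", b"v").result() from any thread. For genuine parallel throughput, the Rust API remains the recommended route.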

Limitations

  • Python bindings currently lack async support.
  • write_stream is blocking and not safe for concurrent use.
  • Compaction is not yet exposed via Python.
  • This is not a drop-in database — you're expected to manage your own data formats.

Development

To develop and test the Python bindings:

Requirements

  • Python 3.10 or above
  • Rust toolchain (with cargo)

pip install -r requirements.txt -r requirements-dev.txt

Test Changes

maturin develop  # Builds the Rust extension and installs it into the active virtual environment
pytest           # Runs the Python integration tests

Build a Release Wheel

maturin build --release --out dist
pip install dist/simd_r_drive_py-*.whl

License

Licensed under the Apache-2.0 License.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

simd_r_drive_py-0.4.1a15-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (340.8 kB)

CPython 3.13, manylinux (glibc 2.17+), x86-64

simd_r_drive_py-0.4.1a15-cp313-cp313-macosx_11_0_arm64.whl (305.4 kB)

CPython 3.13, macOS 11.0+, ARM64

simd_r_drive_py-0.4.1a15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (340.9 kB)

CPython 3.12, manylinux (glibc 2.17+), x86-64

simd_r_drive_py-0.4.1a15-cp312-cp312-macosx_11_0_arm64.whl (305.4 kB)

CPython 3.12, macOS 11.0+, ARM64

simd_r_drive_py-0.4.1a15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341.4 kB)

CPython 3.11, manylinux (glibc 2.17+), x86-64

simd_r_drive_py-0.4.1a15-cp311-cp311-macosx_11_0_arm64.whl (308.7 kB)

CPython 3.11, macOS 11.0+, ARM64

simd_r_drive_py-0.4.1a15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341.7 kB)

CPython 3.10, manylinux (glibc 2.17+), x86-64

simd_r_drive_py-0.4.1a15-cp310-cp310-macosx_11_0_arm64.whl (308.8 kB)

CPython 3.10, macOS 11.0+, ARM64

File hashes

Hashes for simd_r_drive_py-0.4.1a15-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 485fb76bf8e4a2bc12ead66314d3a8b5c0f09aeca441c508091dbbcbccc3f789
MD5 94baa80bcd6301e53da4b5c9858ff1d6
BLAKE2b-256 b86c12b1b1aa81c3992a440797e3e21e2ab54a2f0d987877957dcacae8afb463

Hashes for simd_r_drive_py-0.4.1a15-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1fd22f45a4336256a7a0c10fe804a2695bfffa42eaaecfe97ed09ec8b01f541e
MD5 51b7657c770d80f97c07866d2c3a646e
BLAKE2b-256 b64b775ee00869b86d6ee210579a0bb7d8db44f59188f71b655a898adcde743e

Hashes for simd_r_drive_py-0.4.1a15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 46c51f074a8226c714ca1fffa3d8ad281325573b4f2f3dab828bcf999a0cf30f
MD5 aedcb737333422be56025ce521d933be
BLAKE2b-256 ec7a25d4ebb827bf0ffc374109206c5762a2e8a7015f29a43fafbd08e408bde1

Hashes for simd_r_drive_py-0.4.1a15-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6bb77a2c838516e69c29f93f36a7b2b6f5a0c8e38cb7a3f22333726099e48073
MD5 d1410343b0f115966d0d23c4be17180a
BLAKE2b-256 c9fea6a8f4625b741345d4cc954d7a2510a54d09b6a0e4be6ae330706e4106d8

Hashes for simd_r_drive_py-0.4.1a15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 886e30e63f0ce5d07529c1e99fb5042ff9c9965914a5b52ad0587c9914388afb
MD5 e94a2ecbb259d70e2711ee2e3d9ef8c8
BLAKE2b-256 b233a95b4c7bc15601ab64b3726ea0c42070c8aa7f32435e0cfe7809ab354582

Hashes for simd_r_drive_py-0.4.1a15-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 21f4d9b90dfb2a2b0c3dd93f62d7369b5a432af85b33cceb07011f7a08b81e51
MD5 01772e4253f36b08c95ee0d3610dc722
BLAKE2b-256 db1451b5681a025839a27da2ae4262036e62e2d24d54f21b09d3f77741515a63

Hashes for simd_r_drive_py-0.4.1a15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 07d10a8ecbae90ae91a559297bfa019eb2a45ecf59c36049b4477ca2d9c3d343
MD5 fddc22292f11eca623e3279ad445d613
BLAKE2b-256 83dcfb774b7409fc46d8df4f6cdd1fe68ff808aeaf89e24ecb3ac553dc8e0f38

Hashes for simd_r_drive_py-0.4.1a15-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 69ea46762bc8fef1679f86e296a9a2c0a220d2e639328263134253fdf44404c1
MD5 56341d28e0424968e671ee6e760ad545
BLAKE2b-256 560990e1387d4ad57f9b938e14407202420756847be0107f6229aaa706e7a86e
