SIMD-optimized append-only schema-less storage engine. Key-based binary storage in a single-file storage container.

SIMD R Drive (Python Bindings)

Experimental Python bindings for SIMD R Drive — a high-performance, schema-less storage engine using a single-file storage container optimized for zero-copy binary access, written in Rust.

This library provides access to core functionality of simd-r-drive from Python, including high-performance key/value storage, zero-copy reads via memoryview, and support for streaming writes and reads.

Threaded streaming writes from Python are not supported. See Thread Safety for important limitations.

Features

  • 🔑 Append-only key/value storage
  • ⚡ Zero-copy reads via memoryview and mmap
  • 📆 Single-file binary container (no schema or serialization required)
  • ↺ Streaming interface for writing and reading large entries
  • 🐍 Native Rust extension module for Python (via PyO3)

Supported Environments

The simd_r_drive_py Python bindings are built as native extension modules and require environments that support both Python and Rust toolchains.

✅ Platforms

  • Linux (x86_64, aarch64)
  • macOS (x86_64, arm64/M1/M2)

Wheels are built using cibuildwheel and tested on GitHub Actions.

✅ Supported Python Versions

  • Python 3.10 – 3.13 (CPython only)

Older versions (≤3.9) are explicitly skipped during wheel builds.

❌ Not Supported

  • Windows (x86_64, AMD64, ARM64): Python bindings are not officially supported on Windows due to platform-specific filesystem and memory-mapping inconsistencies in the Python runtime.

    The underlying Rust library works on Windows and is tested continuously, but the Python bindings fail some unit tests in CI. Manual builds (including AMD64 and ARM64) have succeeded locally but are not considered production-stable.

  • Python < 3.10
  • 32-bit Python
  • musl-based Linux environments (e.g., Alpine Linux)
  • PyPy or other alternative Python interpreters

If you need support for other environments or interpreters, consider compiling from source with maturin develop inside a compatible environment.
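As a rough pre-install check, the support matrix above can be expressed as a small function. This is only a sketch using the standard library: the function name `unsupported_reasons` is hypothetical, and it does not detect musl-based Linux, so a clean result is not a guarantee that a wheel exists for your platform.

```python
import platform
import struct
import sys


def unsupported_reasons(
    implementation=None, version=None, system=None, pointer_bits=None
):
    """Return reasons the interpreter falls outside the documented matrix."""
    # Fall back to the running interpreter when no value is passed in.
    implementation = implementation or platform.python_implementation()
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    pointer_bits = pointer_bits or struct.calcsize("P") * 8

    reasons = []
    if implementation != "CPython":
        reasons.append(f"{implementation} is not CPython")
    if not ((3, 10) <= version <= (3, 13)):
        reasons.append(f"Python {version[0]}.{version[1]} is outside 3.10 - 3.13")
    if system not in ("Linux", "Darwin"):
        reasons.append(f"{system} is not Linux or macOS")
    if pointer_bits != 64:
        reasons.append("32-bit Python is not supported")
    return reasons


if __name__ == "__main__":
    problems = unsupported_reasons()
    print("supported" if not problems else "; ".join(problems))
```

An empty list means none of the checked conditions rule the environment out.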

Storage Layout

Offset Range    | Field       | Size (Bytes) | Description
0 → N           | Payload     | N            | Variable-length data
N → N + 8       | Key Hash    | 8            | 64-bit XXH3 hash of the key (fast lookups)
N + 8 → N + 16  | Prev Offset | 8            | Absolute offset pointing to the previous version
N + 16 → N + 20 | Checksum    | 4            | 32-bit CRC32C checksum for integrity verification
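To make the layout concrete, here is a minimal sketch that splits one raw entry into its payload and the 20-byte metadata block that follows it. It is not the library's parser. The byte order (little-endian) and the reading of the checksum as an unsigned 32-bit integer are assumptions, since the table gives only field sizes; `split_entry` and the sample values are made up for illustration.

```python
import struct

# Assumption: little-endian fields; the layout table gives sizes, not byte order.
METADATA = struct.Struct("<QQI")  # key hash (8), prev offset (8), checksum (4)
assert METADATA.size == 20


def split_entry(entry: bytes):
    """Split one raw entry into (payload, key_hash, prev_offset, checksum)."""
    if len(entry) < METADATA.size:
        raise ValueError("entry is shorter than the 20-byte metadata block")
    payload_len = len(entry) - METADATA.size
    key_hash, prev_offset, checksum = METADATA.unpack_from(entry, payload_len)
    return entry[:payload_len], key_hash, prev_offset, checksum


# Build a fake entry with made-up field values to exercise the parser.
fake = b"hello" + METADATA.pack(0x1122334455667788, 0, 0xDEADBEEF)
payload, key_hash, prev_offset, checksum = split_entry(fake)
print(payload, hex(key_hash), prev_offset, hex(checksum))
```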

Installation

pip install simd-r-drive-py

Usage

Regular Writes and Reads

from simd_r_drive import DataStore

# Create or open a datastore
store = DataStore("mydata.bin")

# Write a key/value pair
store.write(b"username", b"jdoe")

# Read the value
value = store.read(b"username")
print(value)  # b'jdoe'

# Check existence
assert store.exists(b"username")

# Delete the key
store.delete(b"username")
assert store.read(b"username") is None

Batch Writes

from simd_r_drive import DataStore

store = DataStore("batch.bin")

# Prepare entries as a list of (key, value) byte tuples
entries = [
    (b"user:1", b"alice"),
    (b"user:2", b"bob"),
    (b"user:3", b"charlie"),
]

# Write all entries in a single batch
store.batch_write(entries)

# Verify that all entries were written correctly
for key, value in entries:
    assert store.read(key) == value

Streamed Writes and Reads (Large Payloads)

from simd_r_drive import DataStore
import io

store = DataStore("streamed.bin")

# Simulated payload — in practice, this could be any file-like stream,
# including one that does not fit entirely into memory.
payload = b"x" * (10 * 1024 * 1024)  # Example: 10 MB of dummy data
stream = io.BytesIO(payload)

store.write_stream(b"large-file", stream)

# Read the payload back in chunks
read_stream = store.read_stream(b"large-file")
result = bytearray()

for chunk in read_stream:
    result.extend(chunk)

assert result == payload

API

DataStore(path: str)

Opens (or creates) a file-backed storage container at the given path.

.write(key: bytes, value: bytes) -> None

Atomically appends a new key-value entry. The new entry supersedes any previous version of the key; because the file is append-only, the older entry is not rewritten in place.

.batch_write(items: List[Tuple[bytes, bytes]]) -> None

Writes multiple key-value pairs in a single operation. Each item must be a tuple of (key, value) where both are bytes.

.write_stream(key: bytes, reader: IO[bytes]) -> None

Streams the value for the key from a Python file-like object (anything exposing .read(n)). Not thread-safe.

.read(key: bytes) -> Optional[bytes]

Returns the full value for a key, or None if the key does not exist.

.read_entry(key: bytes) -> Optional[EntryHandle]

Returns a memory-mapped handle exposing .as_memoryview() for zero-copy access, or None if the key does not exist.
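To show what zero-copy access through a memoryview means in practice, the following sketch uses only the standard library's mmap module as an analogy. It does not call simd_r_drive, and the temporary file and slice offsets are made up; how the bindings construct their view internally is not shown here.

```python
import mmap
import os
import tempfile

# Write a small file to map; the contents are illustrative only.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header--payload-bytes--trailer")

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapped)
    payload_view = view[8:21]           # slicing a view yields another view, no copy
    payload_copy = bytes(payload_view)  # bytes() is where a copy finally happens
    # Release views before closing the map, or mmap.close() raises BufferError.
    payload_view.release()
    view.release()
    mapped.close()

os.remove(path)
print(payload_copy)
```

The practical consequence is that a view is only valid while its backing map is alive, so copy out anything that must outlive it.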

.read_stream(key: bytes) -> Optional[EntryStream]

Returns a streaming reader exposing .read(n), or None if the key does not exist.
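A reader with a .read(n) method can be drained in fixed-size chunks with a small generator. This is a sketch, not part of the bindings: `drain` is a hypothetical helper, io.BytesIO stands in for the object returned by read_stream, and it assumes the reader returns an empty result at end of data, as standard io streams do.

```python
import io


def drain(reader, chunk_size=64 * 1024):
    """Yield chunks from any object exposing .read(n) until it returns no data."""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # assumption: an empty result signals end of entry
            break
        yield chunk


# io.BytesIO stands in for the stream returned by read_stream().
stand_in = io.BytesIO(b"a" * 150_000)
sizes = [len(chunk) for chunk in drain(stand_in)]
print(sizes)
```

Processing each chunk as it arrives keeps memory use bounded by the chunk size rather than the entry size.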

.delete(key: bytes) -> None

Marks an entry as deleted and no longer available to be read. The file remains append-only; use Rust-side compaction if needed.

.exists(key: bytes) -> bool

Returns whether a key is currently valid in the index.
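The write, delete, and exists semantics described above can be pictured with a toy in-memory model. This is an illustration of append-only behavior in general, not the library's implementation: the class `ToyAppendOnlyLog` is hypothetical, and representing a delete as an appended tombstone record is an assumption.

```python
class ToyAppendOnlyLog:
    """In-memory toy model of append-only semantics; not the library's code."""

    def __init__(self):
        self.log = []    # every record ever appended, in order
        self.index = {}  # key -> position of the newest live record

    def write(self, key: bytes, value: bytes) -> None:
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1  # newer version shadows the old one

    def delete(self, key: bytes) -> None:
        self.log.append((key, None))  # tombstone: the log only ever grows
        self.index.pop(key, None)

    def read(self, key: bytes):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def exists(self, key: bytes) -> bool:
        return key in self.index


toy = ToyAppendOnlyLog()
toy.write(b"username", b"jdoe")
toy.write(b"username", b"jsmith")  # supersedes, but the first record stays
toy.delete(b"username")
print(len(toy.log), toy.read(b"username"), toy.exists(b"username"))
```

In this model the log holds three records after two writes and a delete, which is why space is only reclaimed by compaction.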

Thread Safety

This Python binding is not thread-safe.

Due to Python’s Global Interpreter Lock (GIL) and the limitations of PyO3, concurrent streaming writes or reads from multiple threads are not supported; attempting them may cause hangs or inconsistent behavior.

  • Use only from a single thread.
  • ❌ Do not call methods like write_stream or read_stream from multiple threads.
  • ❌ Do not share a DataStore instance across threads.
  • ✅ For concurrent, high-performance use — especially with streaming — use the native Rust version directly.

This design avoids working around the GIL or spawning internal locks for artificial concurrency. If you need reliable multithreading, call into the Rust API instead.
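If a threaded Python program still needs to reach the store, one common way to honor the single-thread rule is to give one dedicated thread sole ownership of the instance and have other threads send it requests over a queue. The sketch below shows that pattern with a dict-backed stand-in; `StoreWorker` and `FakeStore` are hypothetical names, and the pattern has not been verified against the real bindings.

```python
import queue
import threading


class StoreWorker:
    """Runs every store call on one dedicated thread (hypothetical helper)."""

    def __init__(self, open_store):
        self._requests = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, args=(open_store,), daemon=True
        )
        self._thread.start()

    def _run(self, open_store):
        store = open_store()  # created and used only on this thread
        while True:
            item = self._requests.get()
            if item is None:  # sentinel: shut down
                break
            method, args, reply = item
            try:
                reply.put((True, getattr(store, method)(*args)))
            except Exception as exc:  # hand errors back to the caller
                reply.put((False, exc))

    def call(self, method, *args):
        reply = queue.Queue(maxsize=1)
        self._requests.put((method, args, reply))
        ok, result = reply.get()
        if not ok:
            raise result
        return result

    def close(self):
        self._requests.put(None)
        self._thread.join()


class FakeStore:
    """Dict-backed stand-in so the sketch runs without the extension module."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


worker = StoreWorker(FakeStore)
worker.call("write", b"username", b"jdoe")
value = worker.call("read", b"username")
worker.close()
print(value)
```

Calls are serialized, so this gives safety rather than parallelism; for real concurrency the Rust API remains the recommended route.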

Limitations

  • Python bindings currently lack async support.
  • write_stream is blocking and not safe for concurrent use.
  • Compaction is not yet exposed via Python.
  • This is not a drop-in database — you're expected to manage your own data formats.
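Because values are plain bytes, structured records need an encoding chosen by the caller. As one possible approach, the sketch below round-trips a dict through UTF-8 JSON; the helper names `encode_record` and `decode_record` are hypothetical, and JSON is only one of many workable formats.

```python
import json


def encode_record(record: dict) -> bytes:
    """Serialize a dict to compact UTF-8 JSON bytes for a bytes-only store."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_record(raw: bytes) -> dict:
    """Inverse of encode_record."""
    return json.loads(raw.decode("utf-8"))


raw = encode_record({"name": "alice", "id": 1})
print(raw)
print(decode_record(raw))
```

The encoded bytes can then be passed as the value to store.write, and the result of store.read passed to decode_record.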

Development

To develop and test the Python bindings:

Requirements

  • Python 3.10 or above
  • Rust toolchain (with cargo)

pip install -r requirements.txt -r requirements-dev.txt

Test Changes

maturin develop # Builds the Rust library
pytest # Tests the Python integration

Build a Release Wheel

maturin build --release
pip install dist/simd_r_drive_py-*.whl

License

Licensed under the Apache-2.0 License.

