SIMD-optimized append-only schema-less storage engine. Key-based binary storage in a single-file storage container.

SIMD R Drive (Python Bindings)

Experimental Python bindings for SIMD R Drive — a high-performance, schema-less storage engine using a single-file storage container optimized for zero-copy binary access, written in Rust.

This library provides access to core functionality of simd-r-drive from Python, including high-performance key/value storage, zero-copy reads via memoryview, and support for streaming writes and reads.

Threaded streaming writes from Python are not supported. See Thread Safety for important limitations.

Features

  • 🔑 Append-only key/value storage
  • ⚡ Zero-copy reads via memoryview and mmap
  • 📆 Single-file binary container (no schema or serialization required)
  • ↺ Streaming interface for writing and reading large entries
  • 🐍 Native Rust extension module for Python (via PyO3)

Supported Environments

The simd_r_drive_py Python bindings are built as native extension modules and require environments that support both Python and Rust toolchains.

✅ Platforms

  • Linux (x86_64, aarch64)
  • macOS (x86_64, arm64/M1/M2)

Wheels are built using cibuildwheel and tested on GitHub Actions.

✅ Supported Python Versions

  • Python 3.10 to 3.13 (CPython only)

Older versions (≤3.9) are explicitly skipped during wheel builds.

❌ Not Supported

  • Windows (x86_64/AMD64, ARM64): the Python bindings are not officially supported on Windows due to platform-specific filesystem and memory-mapping inconsistencies in the Python runtime.

    The underlying Rust library works on Windows and is tested continuously, but the Python bindings fail some unit tests in CI. Manual builds (including AMD64 and ARM64) have succeeded locally but are not considered production-stable.

  • Python < 3.10
  • 32-bit Python
  • musl-based Linux environments (e.g., Alpine Linux)
  • PyPy or other alternative Python interpreters

If you need support for other environments or interpreters, consider compiling from source with maturin develop inside a compatible environment.
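The support matrix above can be checked programmatically before attempting an install. The following is a minimal sketch, not part of the bindings: the names is_supported and current_environment_supported are hypothetical, and the check does not detect musl-based systems such as Alpine.

```python
import platform
import sys

# Support matrix transcribed from the lists above.
SUPPORTED_MACHINES = {
    "Linux": {"x86_64", "aarch64"},
    "Darwin": {"x86_64", "arm64"},  # platform.system() reports macOS as "Darwin"
}
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 13)


def is_supported(system, machine, implementation, version, is_64bit):
    """Return True if the described environment matches the documented matrix."""
    if implementation != "CPython" or not is_64bit:
        return False
    if not (MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON):
        return False
    return machine in SUPPORTED_MACHINES.get(system, set())


def current_environment_supported():
    """Apply is_supported() to the running interpreter (no musl detection)."""
    return is_supported(
        platform.system(),
        platform.machine(),
        platform.python_implementation(),
        sys.version_info,
        sys.maxsize > 2**32,  # True on 64-bit builds
    )
```

A False result only means the environment is outside the documented matrix; building from source may still work.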

Storage Layout

Offset Range    | Field       | Size (Bytes) | Description
0 → N           | Payload     | N            | Variable-length data
N → N + 8       | Key Hash    | 8            | 64-bit XXH3 hash of the key (fast lookups)
N + 8 → N + 16  | Prev Offset | 8            | Absolute offset pointing to the previous version
N + 16 → N + 20 | Checksum    | 4            | 32-bit CRC32C checksum for integrity verification
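To make the layout concrete, here is a small sketch that splits one raw entry into its payload and the 20-byte metadata trailer. It assumes little-endian, tightly packed fields; the table gives only sizes and order, so the byte order is an assumption. The names TRAILER and split_entry are hypothetical and not part of the bindings.

```python
import struct

# ASSUMPTION: little-endian, tightly packed fields. The layout table gives
# sizes and order only; byte order is not stated there.
TRAILER = struct.Struct("<QQI")  # key hash (8), prev offset (8), checksum (4)


def split_entry(entry: bytes):
    """Split one raw entry into (payload, key_hash, prev_offset, checksum)."""
    if len(entry) < TRAILER.size:
        raise ValueError("entry is shorter than the 20-byte metadata trailer")
    payload_len = len(entry) - TRAILER.size
    key_hash, prev_offset, checksum = TRAILER.unpack_from(entry, payload_len)
    return entry[:payload_len], key_hash, prev_offset, checksum
```

The sketch only separates the fields; it does not verify the checksum or recompute the key hash, since neither CRC32C nor XXH3 is in the Python standard library.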

Installation

pip install simd-r-drive-py

Usage

Regular Writes and Reads

from simd_r_drive import DataStore

# Create or open a datastore
store = DataStore("mydata.bin")

# Write a key/value pair
store.write(b"username", b"jdoe")

# Read the value
value = store.read(b"username")
print(value)  # b'jdoe'

# Check existence
assert store.exists(b"username")

# Delete the key
store.delete(b"username")
assert store.read(b"username") is None

Batch Writes

from simd_r_drive import DataStore

store = DataStore("batch.bin")

# Prepare entries as a list of (key, value) byte tuples
entries = [
    (b"user:1", b"alice"),
    (b"user:2", b"bob"),
    (b"user:3", b"charlie"),
]

# Write all entries in a single batch
store.batch_write(entries)

# Verify that all entries were written correctly
for key, value in entries:
    assert store.read(key) == value

Streamed Writes and Reads (Large Payloads)

from simd_r_drive import DataStore
import io

store = DataStore("streamed.bin")

# Simulated payload — in practice, this could be any file-like stream,
# including one that does not fit entirely into memory.
payload = b"x" * (10 * 1024 * 1024)  # Example: 10 MB of dummy data
stream = io.BytesIO(payload)

store.write_stream(b"large-file", stream)

# Read the payload back in chunks
read_stream = store.read_stream(b"large-file")
result = bytearray()

for chunk in read_stream:
    result.extend(chunk)

assert result == payload

API

DataStore(path: str)

Opens (or creates) a file-backed storage container at the given path.

.write(key: bytes, value: bytes) -> None

Atomically appends a new key-value entry. Because the file is append-only, any previous version of the key is superseded rather than rewritten in place.

.batch_write(items: List[Tuple[bytes, bytes]]) -> None

Writes multiple key-value pairs in a single operation. Each item must be a tuple of (key, value) where both are bytes.
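Since each item must be a (key, value) tuple of bytes, it can help to validate a batch before handing it to the store. The helper below is a hypothetical sketch (validate_batch is not part of the bindings) and makes no claim about how the bindings themselves report invalid input.

```python
def validate_batch(items):
    """Raise TypeError unless every item is a (bytes, bytes) tuple."""
    for position, item in enumerate(items):
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError(f"item {position} is not a (key, value) tuple")
        key, value = item
        if not isinstance(key, bytes) or not isinstance(value, bytes):
            raise TypeError(f"item {position} must contain bytes for key and value")
    return items  # returned unchanged so the call can be inlined
```

A call such as store.batch_write(validate_batch(entries)) would then fail early, with the position of the offending item in the message.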

.write_stream(key: bytes, reader: IO[bytes]) -> None

Streams the value for the key from a Python file-like object (any object with a .read(n) method). Not thread-safe.

.read(key: bytes) -> Optional[bytes]

Returns the full value for a key, or None if the key does not exist.

.read_entry(key: bytes) -> Optional[EntryHandle]

Returns a memory-mapped handle, exposing .as_memoryview() for zero-copy access.
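To illustrate what zero-copy access over a memory-mapped file means, the sketch below uses only the standard library's mmap and memoryview. It is an analogy and not the bindings' implementation; demo_zero_copy, the temporary file, and its contents are made up for the example.

```python
import mmap
import os
import tempfile


def demo_zero_copy():
    """Map a file, slice it through a memoryview, and copy only on request."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"header--payload-bytes--trailer")
        with open(path, "rb") as f:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            view = memoryview(mapped)
            window = view[8:21]     # slicing a memoryview does not copy data
            copied = bytes(window)  # an explicit copy, made only here
            # Views must be released before the mapping can be closed.
            window.release()
            view.release()
            mapped.close()
        return copied
    finally:
        os.remove(path)
```

The same discipline likely applies to any memoryview over mapped storage: keep the view only as long as it is needed, and copy with bytes(...) when the data must outlive it.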

.read_stream(key: bytes) -> Optional[EntryStream]

Returns a streaming reader exposing .read(n).
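A reader that exposes .read(n) can be consumed in fixed-size chunks with a small generator. This sketch assumes that an empty result marks the end of the stream, as with standard Python file objects; iter_chunks is a hypothetical helper, shown here with io.BytesIO standing in for the returned stream.

```python
import io


def iter_chunks(reader, chunk_size=64 * 1024):
    """Yield successive chunks from any object exposing .read(n)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # ASSUMPTION: an empty result signals end of stream
            break
        yield chunk


# Stand-in for a stream returned by the store.
total = sum(len(chunk) for chunk in iter_chunks(io.BytesIO(b"x" * 5000), 1024))
```

Processing each chunk as it arrives keeps memory use bounded by chunk_size rather than by the size of the entry.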

.delete(key: bytes) -> None

Marks an entry as deleted and no longer available to be read. The file remains append-only; use Rust-side compaction if needed.
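The interplay between append-only writes, superseded versions, and deletes can be pictured with a toy in-memory model. This is purely illustrative: ToyAppendOnlyLog is hypothetical, the tombstone marker is an assumption, and none of it reflects the engine's actual on-disk format or algorithms.

```python
class ToyAppendOnlyLog:
    """In-memory toy model; NOT the engine's on-disk format or algorithm."""

    _TOMBSTONE = object()  # ASSUMPTION: deletes are modeled as a marker record

    def __init__(self):
        self.records = []  # append-only list of (key, value, prev_position)
        self.index = {}    # key -> position of the latest record for that key

    def write(self, key, value):
        prev = self.index.get(key)  # link back to the previous version, if any
        self.records.append((key, value, prev))
        self.index[key] = len(self.records) - 1

    def delete(self, key):
        # Nothing is removed; a marker record supersedes the old value.
        self.write(key, self._TOMBSTONE)

    def read(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        value = self.records[pos][1]
        return None if value is self._TOMBSTONE else value

    def exists(self, key):
        return self.read(key) is not None
```

In this model the record list only grows, which is why reclaiming space requires a separate compaction step.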

.exists(key: bytes) -> bool

Returns whether a key is currently valid in the index.

Thread Safety

This Python binding is not thread-safe.

Due to Python’s Global Interpreter Lock (GIL) and the limitations of PyO3, concurrent streaming writes or reads from multiple threads are not supported. Attempting them may cause hangs or inconsistent behavior.

  • Use only from a single thread.
  • ❌ Do not call methods like write_stream or read_stream from multiple threads.
  • ❌ Do not share a DataStore instance across threads.
  • ✅ For concurrent, high-performance use — especially with streaming — use the native Rust version directly.

This design avoids working around the GIL or spawning internal locks for artificial concurrency. If you need reliable multithreading, call into the Rust API instead.
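If a multithreaded Python program still needs to reach a single store, one common pattern is to confine the object to a dedicated worker thread and funnel every call through it. The sketch below is a generic illustration under that assumption: SingleThreadOwner is hypothetical, has not been tested against these bindings, and serializes calls rather than adding any parallelism.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadOwner:
    """Create a resource on one worker thread and run every call on it."""

    def __init__(self, factory):
        # A pool with exactly one worker: all submitted work runs on it.
        self._executor = ThreadPoolExecutor(max_workers=1)
        # The resource is constructed on the worker thread as well.
        self._resource = self._executor.submit(factory).result()

    def call(self, method_name, *args, **kwargs):
        """Invoke a method on the worker thread and wait for its result."""
        def invoke():
            return getattr(self._resource, method_name)(*args, **kwargs)
        return self._executor.submit(invoke).result()

    def close(self):
        self._executor.shutdown(wait=True)


class FakeStore:
    """Stand-in object that records which threads touch it."""

    def __init__(self):
        self.data = {}
        self.threads = {threading.get_ident()}

    def write(self, key, value):
        self.threads.add(threading.get_ident())
        self.data[key] = value

    def read(self, key):
        self.threads.add(threading.get_ident())
        return self.data.get(key)

    def seen_threads(self):
        return set(self.threads)
```

With the real bindings, the factory would be something like lambda: DataStore("mydata.bin"). Callers block until the worker finishes each call, so throughput is that of a single thread.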

Limitations

  • Python bindings currently lack async support.
  • write_stream is blocking and not safe for concurrent use.
  • Compaction is not yet exposed via Python.
  • This is not a drop-in database — you're expected to manage your own data formats.

Development

To develop and test the Python bindings:

Requirements

  • Python 3.10 or above
  • Rust toolchain (with cargo)

pip install -r requirements.txt -r requirements-dev.txt

Test Changes

maturin develop # Builds the Rust library
pytest # Tests the Python integration

Build a Release Wheel

maturin build --release --out dist
pip install dist/simd_r_drive_py-*.whl

License

Licensed under the Apache-2.0 License.
