SIMD-optimized append-only schema-less storage engine. Key-based binary storage in a single-file storage container.

SIMD R Drive (Python Bindings)

Experimental Python bindings for SIMD R Drive — a high-performance, schema-less storage engine using a single-file storage container optimized for zero-copy binary access, written in Rust.

This library provides access to core functionality of simd-r-drive from Python, including high-performance key/value storage, zero-copy reads via memoryview, and support for streaming writes and reads.

Threaded streaming writes from Python are not supported. See Thread Safety for important limitations.

Features

  • 🔑 Append-only key/value storage
  • ⚡ Zero-copy reads via memoryview and mmap
  • 📆 Single-file binary container (no schema or serialization required)
  • ↺ Streaming interface for writing and reading large entries
  • 🐍 Native Rust extension module for Python (via PyO3)

Supported Environments

The simd_r_drive_py Python bindings are built as native extension modules and require environments that support both Python and Rust toolchains.

✅ Platforms

  • Linux (x86_64, aarch64)
  • macOS (x86_64, arm64/M1/M2)

Wheels are built using cibuildwheel and tested on GitHub Actions.

✅ Supported Python Versions

  • Python 3.10 – 3.13

Older versions (≤3.9) are explicitly skipped during wheel builds.

❌ Not Supported

  • Windows (x86_64, AMD64, ARM64): the Python bindings are not officially supported on Windows due to platform-specific filesystem and memory-mapping inconsistencies in the Python runtime.

    The underlying Rust library works on Windows and is tested continuously, but the Python bindings fail some unit tests in CI. Manual builds (including AMD64 and ARM64) have succeeded locally but are not considered production-stable.

  • Python < 3.10
  • 32-bit Python
  • musl-based Linux environments (e.g., Alpine Linux)
  • PyPy or other alternative Python interpreters

If you need support for other environments or interpreters, consider compiling from source with maturin develop inside a compatible environment.
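The constraints listed above can be checked at runtime before attempting an install or import. The following is a minimal sketch, not part of the bindings: the function name `unsupported_reasons` is hypothetical, it only covers the checks that the standard library can answer directly, and it does not detect musl-based Linux.

```python
import platform
import struct
import sys


def unsupported_reasons(
    version: tuple = sys.version_info[:2],
    implementation: str = platform.python_implementation(),
    system: str = platform.system(),
    pointer_bits: int = struct.calcsize("P") * 8,
) -> list:
    """Return human-readable reasons why this interpreter is outside the
    supported list above. An empty list means no problem was detected.

    Hypothetical helper: musl detection is intentionally left out.
    """
    reasons = []
    if version < (3, 10):
        reasons.append("Python older than 3.10")
    if implementation != "CPython":
        reasons.append(f"alternative interpreter: {implementation}")
    if system == "Windows":
        reasons.append("Windows is not officially supported")
    if pointer_bits != 64:
        reasons.append("32-bit Python")
    return reasons


# Evaluate the current interpreter using the default arguments.
current = unsupported_reasons()
```

The parameters default to the running interpreter, so passing explicit values is only needed for testing the helper itself.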

Storage Layout

Offset Range      Field         Size (Bytes)   Description
0 → N             Payload       N              Variable-length data
N → N + 8         Key Hash      8              64-bit XXH3 hash of the key (fast lookups)
N + 8 → N + 16    Prev Offset   8              Absolute offset pointing to the previous version
N + 16 → N + 20   Checksum      4              32-bit CRC32C checksum for integrity verification
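To make the layout concrete, here is a small sketch that splits one raw entry into its payload and the 20-byte metadata block described in the table. It is illustrative only: the little-endian byte order, the name `split_entry`, and the placeholder metadata values are assumptions, and the sketch does not compute a real XXH3 hash or CRC32C checksum.

```python
import struct

# Field order and sizes follow the table: 8-byte key hash, 8-byte previous
# offset, 4-byte checksum.
# ASSUMPTION: little-endian, unsigned fields. The byte order is not stated above.
METADATA = struct.Struct("<QQI")  # 20 bytes, no padding


def split_entry(entry: bytes):
    """Split one raw entry into (payload, key_hash, prev_offset, checksum)."""
    if len(entry) < METADATA.size:
        raise ValueError("entry is shorter than the 20-byte metadata block")
    payload = entry[: -METADATA.size]
    key_hash, prev_offset, checksum = METADATA.unpack(entry[-METADATA.size :])
    return payload, key_hash, prev_offset, checksum


# Build a fake entry with placeholder metadata values and parse it back.
fake_entry = b"jdoe" + METADATA.pack(0x1122334455667788, 0, 0xDEADBEEF)
payload, key_hash, prev_offset, checksum = split_entry(fake_entry)
```

Because the metadata trails the payload, a reader that knows where an entry ends can locate the metadata without knowing the payload length in advance.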

Installation

pip install simd-r-drive-py

Usage

Regular Writes and Reads

from simd_r_drive import DataStore

# Create or open a datastore
store = DataStore("mydata.bin")

# Write a key/value pair
store.write(b"username", b"jdoe")

# Read the value
value = store.read(b"username")
print(value)  # b'jdoe'

# Check existence
assert store.exists(b"username")

# Delete the key
store.delete(b"username")
assert store.read(b"username") is None

Batch Writes

from simd_r_drive import DataStore

store = DataStore("batch.bin")

# Prepare entries as a list of (key, value) byte tuples
entries = [
    (b"user:1", b"alice"),
    (b"user:2", b"bob"),
    (b"user:3", b"charlie"),
]

# Write all entries in a single batch
store.batch_write(entries)

# Verify that all entries were written correctly
for key, value in entries:
    assert store.read(key) == value

Streamed Writes and Reads (Large Payloads)

from simd_r_drive import DataStore
import io

store = DataStore("streamed.bin")

# Simulated payload — in practice, this could be any file-like stream,
# including one that does not fit entirely into memory.
payload = b"x" * (10 * 1024 * 1024)  # Example: 10 MB of dummy data
stream = io.BytesIO(payload)

store.write_stream(b"large-file", stream)

# Read the payload back in chunks
read_stream = store.read_stream(b"large-file")
result = bytearray()

for chunk in read_stream:
    result.extend(chunk)

assert result == payload

API

DataStore(path: str)

Opens (or creates) a file-backed storage container at the given path.

.write(key: bytes, value: bytes) -> None

Atomically appends a new key-value entry. The new entry supersedes any previous version of the key; because the file is append-only, older versions are not rewritten in place.

.batch_write(items: List[Tuple[bytes, bytes]]) -> None

Writes multiple key-value pairs in a single operation. Each item must be a tuple of (key, value) where both are bytes.
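Since every item must be a (key, value) tuple of bytes, it can help to normalize and check the input before handing it to batch_write. The helpers below are a hypothetical sketch (`to_entries` and `check_entries` are not part of the bindings) and only prepare the list; they do not call the store.

```python
def to_entries(records: dict, encoding: str = "utf-8") -> list:
    """Convert a {str: str} mapping into a list of (bytes, bytes) tuples."""
    return [(k.encode(encoding), v.encode(encoding)) for k, v in records.items()]


def check_entries(entries) -> None:
    """Raise TypeError if any item is not a (bytes, bytes) tuple."""
    for position, item in enumerate(entries):
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError(f"item {position} is not a (key, value) tuple")
        key, value = item
        if not isinstance(key, bytes) or not isinstance(value, bytes):
            raise TypeError(f"item {position} must contain bytes for key and value")


entries = to_entries({"user:1": "alice", "user:2": "bob"})
check_entries(entries)  # passes silently; the list is ready for batch_write
```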

.write_stream(key: bytes, reader: IO[bytes]) -> None

Streams the value from a Python file-like object (.read(n) interface). Blocking and not thread-safe.
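As an illustration of the .read(n) interface, the sketch below wraps an iterable of byte chunks in an object that produces data lazily, so the full payload never has to exist in memory at once. `ChunkReader` is a hypothetical class, and whether write_stream accepts an arbitrary object like this rather than a real file or io.BytesIO has not been verified here.

```python
from typing import Iterable


class ChunkReader:
    """Hypothetical file-like wrapper exposing read(n) over byte chunks."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = iter(chunks)
        self._buffer = bytearray()

    def read(self, n: int = -1) -> bytes:
        if n is None or n < 0:
            # Read everything that is left.
            for chunk in self._chunks:
                self._buffer.extend(chunk)
            n = len(self._buffer)
        else:
            # Pull chunks only until n bytes are buffered or the source ends.
            while len(self._buffer) < n:
                chunk = next(self._chunks, None)
                if chunk is None:
                    break
                self._buffer.extend(chunk)
        data = bytes(self._buffer[:n])
        del self._buffer[:n]
        return data


reader = ChunkReader([b"ab", b"cde", b"f"])
first = reader.read(4)
second = reader.read(4)
third = reader.read(4)  # empty bytes signal end of stream
```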

.read(key: bytes) -> Optional[bytes]

Returns the full value for a key, or None if the key does not exist.

.read_entry(key: bytes) -> Optional[EntryHandle]

Returns a memory-mapped handle, exposing .as_memoryview() for zero-copy access.
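For readers unfamiliar with the term, the following standard-library sketch shows what zero-copy access over a memory-mapped file looks like in Python. It does not use the bindings at all: `read_window` is a hypothetical helper built on mmap and memoryview, shown only to illustrate the idea behind .as_memoryview().

```python
import mmap
import os
import tempfile


def read_window(path: str, start: int, end: int) -> bytes:
    """Return bytes [start, end) of a file via a memory map.

    Slicing the memoryview does not copy; only the final bytes() call
    copies, and it copies just the requested window.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Views must be released before the map is closed, hence the
            # nested context managers.
            with memoryview(mapped) as view:
                with view[start:end] as window:
                    return bytes(window)


# Demonstrate on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    tmp_path = tmp.name
try:
    window_bytes = read_window(tmp_path, 6, 11)
finally:
    os.remove(tmp_path)
```

The same caution applies to any memoryview over mapped memory: keep the view only as long as the underlying mapping is alive.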

.read_stream(key: bytes) -> Optional[EntryStream]

Returns a streaming reader exposing .read(n).
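A reader that exposes .read(n) is typically drained with a loop like the one sketched below. The helper `iter_chunks` is hypothetical, and it assumes the reader returns empty bytes at end of stream, as standard Python binary streams do; that behavior has not been confirmed for EntryStream. The demonstration uses io.BytesIO as a stand-in.

```python
import io
from typing import Iterator


def iter_chunks(reader, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from any object with a read(n) method.

    ASSUMPTION: read(n) returns empty bytes once the stream is exhausted.
    """
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            return
        yield chunk


# Stand-in stream: 10 bytes read 4 at a time.
chunks = list(iter_chunks(io.BytesIO(b"x" * 10), chunk_size=4))
```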

.delete(key: bytes) -> None

Marks an entry as deleted and no longer available to be read. The file remains append-only; use Rust-side compaction if needed.
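The append-only behavior of write and delete can be pictured with a toy in-memory model. This is a conceptual sketch only, not the engine's implementation: `ToyAppendLog` is hypothetical, and representing a deletion as an appended record with a None value is an assumption made for illustration.

```python
from typing import Optional


class ToyAppendLog:
    """Conceptual model: records are appended, never modified in place."""

    def __init__(self) -> None:
        # Each record is (key, value, index of the previous version or None).
        self.records: list = []
        # Maps each key to the position of its latest record.
        self.index: dict = {}

    def _append(self, key: bytes, value: Optional[bytes]) -> None:
        previous = self.index.get(key)
        self.records.append((key, value, previous))
        self.index[key] = len(self.records) - 1

    def write(self, key: bytes, value: bytes) -> None:
        self._append(key, value)

    def delete(self, key: bytes) -> None:
        # ASSUMPTION: a deletion is modeled as an appended tombstone record.
        self._append(key, None)

    def read(self, key: bytes) -> Optional[bytes]:
        position = self.index.get(key)
        return None if position is None else self.records[position][1]

    def history(self, key: bytes) -> list:
        """Walk the previous-version links from newest to oldest."""
        values, position = [], self.index.get(key)
        while position is not None:
            _, value, position = self.records[position]
            values.append(value)
        return values


log = ToyAppendLog()
log.write(b"username", b"v1")
log.write(b"username", b"v2")
latest = log.read(b"username")
log.delete(b"username")
```

In this model the log keeps growing after a delete, which mirrors why compaction is a separate step.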

.exists(key: bytes) -> bool

Returns whether a key is currently valid in the index.

Thread Safety

This Python binding is not thread-safe.

Due to Python’s Global Interpreter Lock (GIL) and the limitations of PyO3, concurrent streaming writes or reads from multiple threads are not supported, and doing so may cause hangs or inconsistent behavior.

  • Use only from a single thread.
  • ❌ Do not call methods like write_stream or read_stream from multiple threads.
  • ❌ Do not share a DataStore instance across threads.
  • ✅ For concurrent, high-performance use — especially with streaming — use the native Rust version directly.

This design avoids working around the GIL or spawning internal locks for artificial concurrency. If you need reliable multithreading, call into the Rust API instead.
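If a multithreaded Python program still needs to use the store, one common pattern consistent with the rules above is to confine the object to a single dedicated thread and send it work through a queue. The sketch below is a generic illustration using a plain dict as a stand-in resource; `SingleThreadOwner` is hypothetical, and using it with a real DataStore (for example via a factory that opens the file) is untested here.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SingleThreadOwner:
    """Runs every call against a resource on one dedicated thread."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._tasks: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The resource is created on the worker thread and never leaves it.
        resource = self._factory()
        while True:
            task = self._tasks.get()
            if task is None:  # shutdown signal
                return
            func, future = task
            try:
                future.set_result(func(resource))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, func: Callable[[Any], Any]) -> Future:
        """Schedule func(resource) on the owner thread."""
        future: Future = Future()
        self._tasks.put((func, future))
        return future

    def close(self) -> None:
        self._tasks.put(None)
        self._thread.join()


# Stand-in resource: a dict instead of a real store.
owner = SingleThreadOwner(dict)
owner.submit(lambda d: d.__setitem__(b"k", b"v")).result()
value = owner.submit(lambda d: d.get(b"k")).result()
worker_id = owner.submit(lambda _d: threading.get_ident()).result()
owner.close()
```

Callers on any thread receive a Future, while the resource itself is only ever touched by the worker thread.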

Limitations

  • Python bindings currently lack async support.
  • write_stream is blocking and not safe for concurrent use.
  • Compaction is not yet exposed via Python.
  • This is not a drop-in database — you're expected to manage your own data formats.
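Because values are opaque bytes, structured records need an encoding chosen by the application. The sketch below uses JSON from the standard library as one possible choice; `encode_record` and `decode_record` are hypothetical helpers, and any other serialization format would work the same way.

```python
import json


def encode_record(record: dict) -> bytes:
    """Serialize a dict to compact, deterministic UTF-8 JSON bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_record(raw) -> dict:
    """Parse bytes (or a bytes-like object such as a memoryview) back to a dict."""
    return json.loads(bytes(raw).decode("utf-8"))


encoded = encode_record({"name": "jdoe", "age": 30})
decoded = decode_record(encoded)
```

The encoded bytes can be passed as the value to write, and the bytes returned by read can be passed to the decoder.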

Development

To develop and test the Python bindings:

Requirements

  • Python 3.10 or above
  • Rust toolchain (with cargo)

pip install -r requirements.txt -r requirements-dev.txt

Test Changes

maturin develop # Builds the Rust extension and installs it into the current environment
pytest # Tests the Python integration

Build a Release Wheel

maturin build --release --out dist
pip install dist/simd_r_drive_py-*.whl

License

Licensed under the Apache-2.0 License.
