Stable, high-performance KVCache for LLM inference, with decentralized coordination, memory-pool storage, and multi-tier caching across hosts.

Project description

redhare (Python connector)

In-process Python binding around RedhareClient. Ships two API layers that share the same embedded Rust client:

KvStore — low-level SDK (this README).
redhare.vllm_connector.RedhareConnector — vLLM v1 native KV connector. Full schema and mode-picking guide: docs/08-vllm-connector.md.

The low-level SDK below exposes batch gather/scatter methods for custom connector implementations:

from redhare import KvStore

store = KvStore(
    redis_url="redis://127.0.0.1:6379",
    client_id="rank0",
    data_plane_addr="127.0.0.1:7000",
    capacity="64MB",              # int bytes also accepted
    transport="nixl",             # "nixl" (default) | "tcp" | "inmemory"
    shared_fs_root=None,          # optional cold-tier path
    shared_fs_backend="posix",    # "posix" (default) | "gds"
    block_size_bytes=None,        # optional fixed per-key block size
    hot_cache_fraction=0.20,
    hot_cache_min_shared_fs_reads=2,
    cache_node_rpc_addr=None,     # optional: enable cache-node mode
    enable_remote_dram=False,     # originator: place into discovered cache-nodes
)

# Pre-register the KV cache region once; later reads DMA straight into it
# without re-registering per call.
store.register_buffer(kv_cache.data_ptr(), kv_cache.numel() * kv_cache.element_size())

# Save: each key's payload is gathered from N (addr, size) pairs (one memcpy).
rc = store.batch_put_from_multi_buffers(keys, addrs_per_key, sizes_per_key)
# Load: NIXL scatters the payload across the same N (addr, size) pairs.
# True zero-copy when destinations live inside a register_buffer region.
rc = store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)

# Experimental local-only load: enqueue the scatter copy on Redhare's CUDA
# copy stream and poll or wait for completion. This raises if any key is not
# local, so callers should fall back to batch_get_into_multi_buffers.
handle = store.batch_get_into_multi_buffers_submit_local(
    keys, addrs_per_key, sizes_per_key
)
# Either poll handle.is_done() from your scheduler loop, or block explicitly.
handle.wait()

# Existence check (1 = exists, 0 = missing, -1 = error per key)
rc = store.batch_is_exist(keys)

store.close()
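
The local-only fast path and its fallback can be combined in one helper. The following is a minimal sketch, not part of redhare: load_kv is a hypothetical name, and because the exception type raised for non-local keys is not stated here, the sketch catches Exception broadly. The stub classes only stand in for KvStore so the example runs without a cluster.

```python
def load_kv(store, keys, addrs_per_key, sizes_per_key):
    """Try the local-only fast path first, then the general load.

    Returns None if the local path completed, otherwise the per-key
    return codes from batch_get_into_multi_buffers().
    """
    try:
        handle = store.batch_get_into_multi_buffers_submit_local(
            keys, addrs_per_key, sizes_per_key
        )
    except Exception:
        # Assumption: the exception type for non-local keys is not
        # specified, so this sketch catches Exception broadly.
        return store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)
    handle.wait()  # or poll handle.is_done() from a scheduler loop
    return None


class _StubHandle:
    """Stand-in for the handle returned by the submit call."""

    def wait(self):
        pass


class _StubStore:
    """Stand-in for KvStore so the sketch runs without a cluster."""

    def __init__(self, local_keys):
        self.local_keys = set(local_keys)

    def batch_get_into_multi_buffers_submit_local(self, keys, addrs, sizes):
        if not set(keys) <= self.local_keys:
            raise RuntimeError("some keys are not local")
        return _StubHandle()

    def batch_get_into_multi_buffers(self, keys, addrs, sizes):
        return [0] * len(keys)


stub = _StubStore(local_keys=["k0"])
local_result = load_kv(stub, ["k0"], [[0]], [[4]])
fallback_result = load_kv(stub, ["k0", "k1"], [[0], [8]], [[4], [4]])
```

In a real connector, the caller would still inspect the return codes from the fallback path, since individual keys can fail there.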

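Return convention: the batch put and get methods return a per-key list of ints, where 0 is success and a negative value indicates failure. batch_is_exist is the exception and uses the 1 / 0 / -1 convention shown above.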
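
A connector usually needs to know which keys failed so it can treat them as cache misses. The helper below is a minimal sketch of that step; split_by_rc is a hypothetical name, not part of redhare, and it assumes the returned list has one entry per key, in key order.

```python
from typing import List, Sequence, Tuple


def split_by_rc(keys: Sequence[str], rcs: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Split keys into (succeeded, failed) using per-key return codes.

    Hypothetical helper, not part of redhare. 0 counts as success;
    any other value is treated as a failure in this sketch.
    """
    if len(keys) != len(rcs):
        raise ValueError("expected one return code per key")
    ok = [k for k, rc in zip(keys, rcs) if rc == 0]
    failed = [k for k, rc in zip(keys, rcs) if rc != 0]
    return ok, failed


ok, failed = split_by_rc(["a", "b", "c"], [0, -1, 0])
```

This applies to the put and get results only; batch_is_exist results need a separate check because 1 means the key exists.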

Build

pip install maturin                          # once
cd crates/redhare-py
export CARGO_TARGET_DIR=/tmp/redhare-target   # NFS-safe: keep build artifacts off NFS
maturin develop --release                    # installs into the active env

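To build a wheel, run maturin build --release, then install it with pip install target/wheels/redhare-*.whl. With CARGO_TARGET_DIR set as above, the wheel is written under $CARGO_TARGET_DIR/wheels instead.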

Notes

transport="nixl" is the only transport with true zero-copy load. "tcp" and "inmemory" work but fall back to read-then-memcpy for scatters.
batch_get_into_multi_buffers_submit_local() is a local-only optimization for custom connectors. It does not read from remote peers or cold storage; use batch_get_into_multi_buffers() as the fallback path.
data_plane_addr is the address peers connect to (NIXL control channel or TCP). It must be reachable from other clients in the cluster.
Remote DRAM uses data_plane_addr for payload movement and cache_node_rpc_addr only for control. With transport="nixl", remote writes use NIXL Write into the cache-node arena after an RPC reservation.
One connector key maps to one Redhare object: the per-key (addrs, sizes) lists are concatenated on save (one memcpy) and scattered on load (zero-copy with transport="nixl" when the destinations lie inside a registered buffer).
With block_size_bytes=..., each key is forced to one fixed-size block: shorter saves are zero-padded, larger saves fail.
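
The last two notes can be modelled in pure Python. This is a sketch of the semantics only, using bytes objects in place of device memory: gather and scatter are hypothetical names, and raising ValueError for an oversized save is an assumption, since the notes say only that such saves fail.

```python
from typing import List, Optional


def gather(chunks: List[bytes], block_size_bytes: Optional[int] = None) -> bytes:
    """Concatenate one key's chunks into a single payload, as on save."""
    payload = b"".join(chunks)
    if block_size_bytes is not None:
        if len(payload) > block_size_bytes:
            # Assumption: the real failure mode is not specified here.
            raise ValueError("payload larger than block_size_bytes")
        # Shorter saves are zero-padded up to the fixed block size.
        payload = payload.ljust(block_size_bytes, b"\x00")
    return payload


def scatter(payload: bytes, sizes: List[int]) -> List[bytes]:
    """Split a payload back into chunks of the given sizes, as on load."""
    out, offset = [], 0
    for size in sizes:
        out.append(payload[offset:offset + size])
        offset += size
    return out


chunks = [b"ab", b"cde"]
payload = gather(chunks, block_size_bytes=8)
restored = scatter(payload, [len(c) for c in chunks])
```

Scattering with the same sizes used on save recovers the original chunks; the padding bytes at the end of the block are simply not read.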

Project details

Release history

0.4.1rc0 (pre-release), Jun 26, 2026

0.3.1 (this version), Jun 23, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

redhare-0.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB view details)

Uploaded Jun 23, 2026. CPython 3.12, manylinux: glibc 2.17+ x86-64

File details

Details for the file redhare-0.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: redhare-0.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Jun 23, 2026
Size: 3.8 MB
Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for redhare-0.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`294159ff6fb3097d3dbe7039f4945eec51efa9ae8f9bd30a99872e8c59840145`
MD5	`ed895cb0270ae105c84b18a7e4facf76`
BLAKE2b-256	`a1ba28868f634cd5cb50cacf1b7fd8459fda5224e5a1be9c874160a1fe44e681`
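
A downloaded wheel can be checked against the digests above with the standard library. This is a minimal sketch: file_digests is a hypothetical helper name, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: str) -> dict:
    """Return hex digests for a file, to compare against published hashes."""
    data = Path(path).read_bytes()
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "blake2b_256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }
```

Compare the "sha256" entry for the downloaded wheel with the SHA256 row above before installing.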
