Stable, high-performance KVCache for LLM inference, with decentralized coordination, memory-pool storage, and multi-tier caching across hosts.
redhare (Python connector)
In-process Python binding around RedhareClient. Ships two API layers
that share the same embedded Rust client:
- KvStore: low-level SDK (this README).
- redhare.vllm_connector.RedhareConnector: vLLM v1 native KV connector. Full schema and mode-picking guide: docs/08-vllm-connector.md.
The low-level SDK below exposes batch gather/scatter methods for custom connector implementations:
from redhare import KvStore
store = KvStore(
    redis_url="redis://127.0.0.1:6379",
    client_id="rank0",
    data_plane_addr="127.0.0.1:7000",
    capacity="64MB",                    # int bytes also accepted
    transport="nixl",                   # "nixl" (default) | "tcp" | "inmemory"
    shared_fs_root=None,                # optional cold-tier path
    shared_fs_backend="posix",          # "posix" (default) | "gds"
    block_size_bytes=None,              # optional fixed per-key block size
    hot_cache_fraction=0.20,
    hot_cache_min_shared_fs_reads=2,
    cache_node_rpc_addr=None,           # optional: enable cache-node mode
    enable_remote_dram=False,           # originator: place into discovered cache-nodes
)
# Pre-register the KV cache region once; later reads DMA straight into it
# without re-registering per call.
store.register_buffer(kv_cache.data_ptr(), kv_cache.numel() * kv_cache.element_size())
# Save: each key's payload is gathered from N (addr, size) pairs (one memcpy).
rc = store.batch_put_from_multi_buffers(keys, addrs_per_key, sizes_per_key)
# Load: NIXL scatters the payload across the same N (addr, size) pairs.
# True zero-copy when destinations live inside a register_buffer region.
rc = store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)
# Experimental local-only load: enqueue the scatter copy on Redhare's CUDA
# copy stream and poll or wait for completion. This raises if any key is not
# local, so callers should fall back to batch_get_into_multi_buffers.
handle = store.batch_get_into_multi_buffers_submit_local(
    keys, addrs_per_key, sizes_per_key
)
# Either poll handle.is_done() from your scheduler loop, or block explicitly.
handle.wait()
# Existence check (1 = exists, 0 = missing, -1 = error per key)
rc = store.batch_is_exist(keys)
store.close()
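The local-only load and its fallback can be wrapped in one helper. The following is a minimal sketch, not the connector's actual logic: load_local_then_fallback, StubStore, and _StubHandle are hypothetical names introduced here so the example runs without redhare installed, and the broad except clause is an assumption because this README does not name the exception raised for a non-local key.

```python
from typing import Any, List, Optional, Sequence


def load_local_then_fallback(
    store: Any,
    keys: Sequence[str],
    addrs_per_key: Sequence[Sequence[int]],
    sizes_per_key: Sequence[Sequence[int]],
) -> Optional[List[int]]:
    """Try the local-only submit path first, then the general load.

    Returns None when the local path completed, otherwise the per-key
    status list produced by batch_get_into_multi_buffers.
    """
    try:
        handle = store.batch_get_into_multi_buffers_submit_local(
            keys, addrs_per_key, sizes_per_key
        )
    except Exception:
        # Assumption: the exception type is not documented here, so this
        # sketch catches broadly. Narrow it once the real type is known.
        return list(
            store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)
        )
    handle.wait()  # or poll handle.is_done() from a scheduler loop
    return None


class _StubHandle:
    """Hypothetical completion handle that is already done."""

    def is_done(self) -> bool:
        return True

    def wait(self) -> None:
        pass


class StubStore:
    """Hypothetical stand-in for KvStore, used only to exercise the helper."""

    def __init__(self, local_keys):
        self.local_keys = set(local_keys)

    def batch_get_into_multi_buffers_submit_local(self, keys, addrs, sizes):
        if any(k not in self.local_keys for k in keys):
            raise RuntimeError("at least one key is not local")
        return _StubHandle()

    def batch_get_into_multi_buffers(self, keys, addrs, sizes):
        # The stub reports non-local keys as failures; a real store may
        # instead fetch them from a peer or the cold tier.
        return [0 if k in self.local_keys else -1 for k in keys]


stub = StubStore({"a"})
local_result = load_local_then_fallback(stub, ["a"], [[0]], [[16]])
fallback_result = load_local_then_fallback(
    stub, ["a", "b"], [[0], [16]], [[16], [16]]
)
```

With a real KvStore, the same helper would be called with the store object in place of the stub.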
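The batch put and get methods return a per-key list of ints in which 0 means
success and a negative value means failure. batch_is_exist is the exception:
it returns 1 (exists), 0 (missing), or -1 (error) per key.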
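Because status is reported per key, a caller typically needs to split a batch into keys that succeeded and keys that must be retried or recomputed. A minimal sketch, assuming only the 0 / negative convention for the put and get methods; split_by_status is a hypothetical helper, not part of the redhare API, and the status codes in the example are made up:

```python
from typing import List, Sequence, Tuple


def split_by_status(
    keys: Sequence[str], rc: Sequence[int]
) -> Tuple[List[str], List[str]]:
    """Partition keys by per-key status code.

    0 counts as success; any other value is treated as a failure, which
    covers the negative codes used by the batch put and get methods.
    """
    if len(keys) != len(rc):
        raise ValueError("expected exactly one status code per key")
    ok = [k for k, code in zip(keys, rc) if code == 0]
    failed = [k for k, code in zip(keys, rc) if code != 0]
    return ok, failed


# Made-up status codes standing in for the list a batch get would return.
loaded, missing = split_by_status(["blk0", "blk1", "blk2"], [0, -1, 0])
```

Do not pass the result of batch_is_exist to this helper, since that method uses 1 for "exists" rather than 0 for success.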
Build
pip install maturin # once
cd crates/redhare-py
export CARGO_TARGET_DIR=/tmp/redhare-target # NFS-safe (see project memory)
maturin develop --release # installs into the active env
A wheel is built with maturin build --release; install via pip install target/wheels/redhare-*.whl.
Notes
- transport="nixl" is the only transport with true zero-copy load. tcp and inmemory work but fall back to read-then-memcpy for scatters.
- batch_get_into_multi_buffers_submit_local() is a local-only optimization for custom connectors. It does not read from remote peers or cold storage; use batch_get_into_multi_buffers() as the fallback path.
- data_plane_addr is what peers connect to (NIXL control channel or TCP). It must be reachable from other clients in the cluster.
- Remote DRAM uses data_plane_addr for payload movement and cache_node_rpc_addr only for control. With transport="nixl", remote writes use NIXL Write into the cache-node arena after an RPC reservation.
- One connector key maps to one Redhare object: the per-key (addrs, sizes) lists are concatenated on save (one memcpy) and scattered on load (zero copy).
- With block_size_bytes=..., each key is forced to one fixed-size block: shorter saves are zero-padded, larger saves fail.
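The last two notes can be modelled in plain Python. This is an illustrative sketch of the semantics only, not how Redhare moves data: addresses are offsets into a bytearray rather than raw pointers, gather, scatter, and pad_to_block are hypothetical names, and raising ValueError for an oversized save is a choice made for this model (the README only states that such saves fail).

```python
from typing import Sequence


def gather(memory: bytearray, addrs: Sequence[int], sizes: Sequence[int]) -> bytes:
    """Save side: concatenate the (addr, size) slices into one payload."""
    return b"".join(bytes(memory[a:a + s]) for a, s in zip(addrs, sizes))


def scatter(
    memory: bytearray, payload: bytes, addrs: Sequence[int], sizes: Sequence[int]
) -> None:
    """Load side: write consecutive payload chunks back to each (addr, size)."""
    if len(payload) < sum(sizes):
        raise ValueError("payload is shorter than the destination regions")
    offset = 0
    for a, s in zip(addrs, sizes):
        memory[a:a + s] = payload[offset:offset + s]
        offset += s


def pad_to_block(payload: bytes, block_size: int) -> bytes:
    """Model of a fixed block size: zero-pad short payloads, reject long ones."""
    if len(payload) > block_size:
        raise ValueError("payload exceeds the fixed block size")
    return payload + b"\x00" * (block_size - len(payload))


# Two 4-byte regions separated by 4 bytes that do not belong to the key.
src = bytearray(b"AAAAxxxxBBBB")
payload = gather(src, [0, 8], [4, 4])

# Loading scatters the payload across the same (addr, size) pairs.
dst = bytearray(12)
scatter(dst, payload, [0, 8], [4, 4])

# With a 16-byte block, the 8-byte payload gains 8 trailing zero bytes.
block = pad_to_block(payload, 16)
```

In this model a padded block can still be scattered into the original regions, because scatter only consumes the first sum(sizes) bytes of the payload.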
File details
Details for the file redhare-0.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: redhare-0.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 3.8 MB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 294159ff6fb3097d3dbe7039f4945eec51efa9ae8f9bd30a99872e8c59840145 |
| MD5 | ed895cb0270ae105c84b18a7e4facf76 |
| BLAKE2b-256 | a1ba28868f634cd5cb50cacf1b7fd8459fda5224e5a1be9c874160a1fe44e681 |