
Stable, high-performance KVCache for LLM inference, with decentralized coordination, memory-pool storage, and multi-tier caching across hosts.

redhare (Python connector)

An in-process Python binding around RedhareClient. It ships two API layers that share the same embedded Rust client:

  • KvStore — low-level SDK (this README).
  • redhare.vllm_connector.RedhareConnector — vLLM v1 native KV connector. Full schema and mode-picking guide: docs/08-vllm-connector.md.

The low-level SDK below exposes batch gather/scatter methods for custom connector implementations:

from redhare import KvStore

store = KvStore(
    redis_url="redis://127.0.0.1:6379",
    client_id="rank0",
    data_plane_addr="127.0.0.1:7000",
    capacity="64MB",              # int bytes also accepted
    transport="nixl",             # "nixl" (default) | "tcp" | "inmemory"
    shared_fs_root=None,          # optional cold-tier path
    shared_fs_backend="posix",    # "posix" (default) | "gds"
    block_size_bytes=None,        # optional fixed per-key block size
    hot_cache_fraction=0.20,
    hot_cache_min_shared_fs_reads=2,
    cache_node_rpc_addr=None,     # optional: enable cache-node mode
    enable_remote_dram=False,     # originator: place into discovered cache-nodes
)

# Pre-register the KV cache region once; later reads DMA straight into it
# without re-registering per call.
store.register_buffer(kv_cache.data_ptr(), kv_cache.numel() * kv_cache.element_size())

# Save: each key's payload is gathered from N (addr, size) pairs (one memcpy).
rc = store.batch_put_from_multi_buffers(keys, addrs_per_key, sizes_per_key)
# Load: NIXL scatters the payload across the same N (addr, size) pairs.
# True zero-copy when destinations live inside a register_buffer region.
rc = store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)

# Experimental local-only load: enqueue the scatter copy on Redhare's CUDA
# copy stream and poll or wait for completion. This raises if any key is not
# local, so callers should fall back to batch_get_into_multi_buffers.
handle = store.batch_get_into_multi_buffers_submit_local(
    keys, addrs_per_key, sizes_per_key
)
# Either poll handle.is_done() from your scheduler loop, or block explicitly.
handle.wait()

# Existence check: one int per key (1 = exists, 0 = missing, -1 = error)
rc = store.batch_is_exist(keys)

store.close()
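
The local-first pattern described in the comments above can be wrapped in a small helper. This is a minimal sketch, not part of the redhare API: `load_local_first` is a hypothetical name, the exception type raised for non-local keys is not documented here (so the sketch catches `Exception`), and treating a completed `handle.wait()` as success for every key is an assumption.

```python
def load_local_first(store, keys, addrs_per_key, sizes_per_key):
    """Try the local-only submit path, else use the general blocking load.

    Hypothetical helper: `store` is expected to expose the two KvStore
    methods shown in the example above.
    """
    try:
        handle = store.batch_get_into_multi_buffers_submit_local(
            keys, addrs_per_key, sizes_per_key
        )
    except Exception:
        # Assumption: the concrete exception type is not documented, so any
        # failure to submit is treated as "at least one key is not local".
        return store.batch_get_into_multi_buffers(
            keys, addrs_per_key, sizes_per_key
        )
    # Block until the scatter copy has finished. A scheduler loop could poll
    # handle.is_done() instead of blocking here.
    handle.wait()
    # Assumption: a completed local load is reported as success for each key.
    return [0] * len(keys)
```

Catching a narrower exception type would be preferable once the type raised by the local-only path is known.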

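Return convention: the put and get methods return one int per key, where 0 is success and a negative value indicates failure. batch_is_exist instead returns 1, 0, or -1 per key, as noted in the example above.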
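
Because results come back as one int per key, callers usually need to pair them with the keys again. A minimal sketch of that bookkeeping follows; `split_by_status` is a hypothetical helper, not part of redhare, and it applies only to put/get results (0 = success, negative = failure).

```python
def split_by_status(keys, rcs):
    """Pair each key with its return code and separate successes from failures.

    Hypothetical helper for put/get results.
    """
    if len(keys) != len(rcs):
        raise ValueError("expected exactly one return code per key")
    ok = [key for key, rc in zip(keys, rcs) if rc == 0]
    failed = {key: rc for key, rc in zip(keys, rcs) if rc < 0}
    return ok, failed
```

For example, a connector could recompute or retry only the keys that appear in `failed` after a batch load.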

Build

pip install maturin                          # once
cd crates/redhare-py
export CARGO_TARGET_DIR=/tmp/redhare-target   # keeps build artifacts off NFS
maturin develop --release                    # installs into the active env

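To build a wheel, run maturin build --release, then install it with pip install target/wheels/redhare-*.whl. If CARGO_TARGET_DIR is set as above, look for the wheel under $CARGO_TARGET_DIR/wheels/ instead.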

Notes

  • transport="nixl" is the only transport with true zero-copy loads. "tcp" and "inmemory" work but fall back to read-then-memcpy for scatters.
  • batch_get_into_multi_buffers_submit_local() is a local-only optimization for custom connectors. It does not read from remote peers or cold storage; use batch_get_into_multi_buffers() as the fallback path.
  • data_plane_addr is the address peers connect to (NIXL control channel or TCP). It must be reachable from the other clients in the cluster.
  • Remote DRAM uses data_plane_addr for payload movement and cache_node_rpc_addr only for control. With transport="nixl", remote writes use NIXL Write into the cache-node arena after an RPC reservation.
  • One connector key maps to one Redhare object: the per-key (addrs, sizes) lists are concatenated on save (one memcpy) and scattered on load (zero-copy with transport="nixl" when the destinations lie inside a registered buffer).
  • With block_size_bytes=..., each key is forced to one fixed-size block: shorter saves are zero-padded, larger saves fail.
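
The last two notes can be illustrated with a pure-Python model of the data layout. This is only a sketch of the described semantics using ctypes buffers, not redhare's implementation: `gather` and `scatter` are hypothetical names, and ignoring the trailing zero padding on load is an assumption.

```python
import ctypes


def gather(addrs, sizes, block_size_bytes=None):
    """Concatenate the (addr, size) regions into one payload, as a save does."""
    payload = b"".join(ctypes.string_at(a, s) for a, s in zip(addrs, sizes))
    if block_size_bytes is not None:
        if len(payload) > block_size_bytes:
            # Mirrors "larger saves fail" for fixed-size blocks.
            raise ValueError("payload larger than block_size_bytes")
        # Mirrors "shorter saves are zero-padded".
        payload = payload.ljust(block_size_bytes, b"\x00")
    return payload


def scatter(payload, addrs, sizes):
    """Copy consecutive slices of the payload into the (addr, size) regions."""
    offset = 0
    for addr, size in zip(addrs, sizes):
        ctypes.memmove(addr, payload[offset:offset + size], size)
        offset += size
    # Assumption: bytes past sum(sizes), such as zero padding, are ignored.


# Two 4-byte source regions for one key, and two destination regions.
src = [ctypes.create_string_buffer(b"abcd", 4), ctypes.create_string_buffer(b"efgh", 4)]
dst = [ctypes.create_string_buffer(4), ctypes.create_string_buffer(4)]
sizes = [4, 4]

payload = gather([ctypes.addressof(b) for b in src], sizes, block_size_bytes=16)
scatter(payload, [ctypes.addressof(b) for b in dst], sizes)
```

Here the 8 bytes of source data are stored as one 16-byte object, and the load writes the first 8 bytes back across the two destination regions.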
