Stable, high-performance KVCache for LLM inference, with decentralized coordination, memory-pool storage, and multi-tier caching across hosts.

redhare (Python connector)

An in-process Python binding around RedhareClient. It ships two API layers that share the same embedded Rust client:

  • KvStore — low-level SDK (this README).
  • redhare.vllm_connector.RedhareConnector — vLLM v1 native KV connector. Full schema and mode-picking guide: docs/08-vllm-connector.md.

The low-level SDK below exposes batch gather/scatter methods for custom connector implementations:

from redhare import KvStore

store = KvStore(
    redis_url="redis://127.0.0.1:6379",
    client_id="rank0",
    data_plane_addr="127.0.0.1:7000",
    capacity="64MB",              # int bytes also accepted
    transport="nixl",            # "nixl" (default) | "tcp" | "inmemory"
    shared_fs_root=None,          # optional cold-tier path
    shared_fs_backend="posix",    # "posix" (default) | "gds"
    block_size_bytes=None,        # optional fixed per-key block size
    hot_cache_fraction=0.20,
    hot_cache_min_shared_fs_reads=2,
    cache_node_rpc_addr=None,     # optional: enable cache-node mode
    enable_remote_dram=False,     # originator: place into discovered cache-nodes
)

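# Pre-register the KV cache region once; later reads DMA straight into it
# without re-registering per call. kv_cache is assumed here to be a
# contiguous torch.Tensor (data_ptr/numel/element_size are Tensor methods).
store.register_buffer(kv_cache.data_ptr(), kv_cache.numel() * kv_cache.element_size())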

# Save: each key's payload is gathered from N (addr, size) pairs (one memcpy).
# addrs_per_key[i] and sizes_per_key[i] hold the N addresses and sizes for keys[i].
rc = store.batch_put_from_multi_buffers(keys, addrs_per_key, sizes_per_key)
# Load: NIXL scatters the payload across the same N (addr, size) pairs.
# True zero-copy when destinations live inside a register_buffer region.
rc = store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)

# Experimental local-only load: enqueue the scatter copy on Redhare's CUDA
# copy stream and poll or wait for completion. This raises if any key is not
# local, so callers should fall back to batch_get_into_multi_buffers.
handle = store.batch_get_into_multi_buffers_submit_local(
    keys, addrs_per_key, sizes_per_key
)
# Either poll handle.is_done() from your scheduler loop, or block explicitly.
handle.wait()

# Existence check (1 = exists, 0 = missing, -1 = error per key)
rc = store.batch_is_exist(keys)

store.close()

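Return convention: the put and get methods return a per-key list of ints, where 0 is success and a negative value indicates failure. batch_is_exist is the exception: it returns 1 for present, 0 for missing, and -1 for an error.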
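
Per-key return codes are easier to act on once they are paired back with their keys. The helpers below are a small sketch under two assumptions: the put/get methods return one int per key (0 on success, negative on failure), and batch_is_exist returns 1, 0, or a negative value per key, as in the sample above. The function names are hypothetical and not part of redhare.

```python
def split_by_status(keys, rcs):
    """Pair put/get return codes with keys: (succeeded, {failed_key: rc})."""
    ok = [k for k, rc in zip(keys, rcs) if rc == 0]
    failed = {k: rc for k, rc in zip(keys, rcs) if rc < 0}
    return ok, failed


def split_exists(keys, rcs):
    """Pair batch_is_exist codes with keys: (present, missing, errored)."""
    present = [k for k, rc in zip(keys, rcs) if rc == 1]
    missing = [k for k, rc in zip(keys, rcs) if rc == 0]
    errored = [k for k, rc in zip(keys, rcs) if rc < 0]
    return present, missing, errored
```

A connector could, for example, call split_exists first and only request the keys reported as present.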

Build

pip install maturin                          # once
cd crates/redhare-py
export CARGO_TARGET_DIR=/tmp/redhare-target   # local build dir, safe when the checkout is on NFS
maturin develop --release                    # installs into the active env

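Build a wheel with maturin build --release, then install it with pip install target/wheels/redhare-*.whl. With CARGO_TARGET_DIR set as above, the wheel should instead land under $CARGO_TARGET_DIR/wheels/.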

Notes

  • transport="nixl" is the only transport with true zero-copy load. "tcp" and "inmemory" work but fall back to read-then-memcpy for scatters.
  • batch_get_into_multi_buffers_submit_local() is a local-only optimization for custom connectors. It does not read from remote peers or cold storage; use batch_get_into_multi_buffers() as the fallback path.
  • data_plane_addr is what peers connect to (NIXL control channel or TCP). It must be reachable from other clients in the cluster.
  • Remote DRAM uses data_plane_addr for payload movement and cache_node_rpc_addr only for control. With transport="nixl", remote writes use NIXL Write into the cache-node arena after an RPC reservation.
  • One connector key maps to one Redhare object: the per-key (addrs, sizes) lists are concatenated on save (one memcpy) and scattered on load (zero-copy with transport="nixl" into registered buffers, read-then-memcpy otherwise).
  • With block_size_bytes=..., each key is forced to one fixed-size block: shorter saves are zero-padded, larger saves fail.
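
The gather, scatter, and fixed-block rules in the notes above can be modeled on the CPU with ctypes. This is an illustrative sketch of the described semantics only, not Redhare's implementation: gather and scatter are hypothetical helpers, the 3-layer buffer layout is made up for the demo, and sizes are assumed to be in bytes.

```python
import ctypes


def gather(addrs, sizes, block_size_bytes=None):
    """Save side: concatenate the (addr, size) regions into one payload."""
    payload = b"".join(ctypes.string_at(a, s) for a, s in zip(addrs, sizes))
    if block_size_bytes is not None:
        if len(payload) > block_size_bytes:
            raise ValueError("payload exceeds block_size_bytes")
        # Shorter saves are zero-padded up to the fixed block size.
        payload = payload.ljust(block_size_bytes, b"\x00")
    return payload


def scatter(payload, addrs, sizes):
    """Load side: copy consecutive payload slices into the (addr, size) regions."""
    offset = 0
    for a, s in zip(addrs, sizes):
        ctypes.memmove(a, payload[offset:offset + s], s)
        offset += s


# Demo layout (made up): 3 layers of 8 bytes each; one key takes the first
# 4 bytes of every layer.
layer_bytes, num_layers, chunk = 8, 3, 4
src = ctypes.create_string_buffer(bytes(range(24)), 24)
dst = ctypes.create_string_buffer(24)  # zero-initialized destination

sizes = [chunk] * num_layers
src_addrs = [ctypes.addressof(src) + i * layer_bytes for i in range(num_layers)]
dst_addrs = [ctypes.addressof(dst) + i * layer_bytes for i in range(num_layers)]

payload = gather(src_addrs, sizes, block_size_bytes=16)  # 12 data bytes + 4 padding
scatter(payload, dst_addrs, sizes)  # padding is ignored on load
```

The same (addr, size) lists are used on both sides, which mirrors how the save and load calls take matching addrs_per_key and sizes_per_key arguments.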

Built Distribution

redhare-0.4.1rc0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB)

Uploaded for CPython 3.12, manylinux (glibc 2.17+), x86-64

Hashes for redhare-0.4.1rc0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm    Hash digest
SHA256       3097787c387f188d07966731fb02953064afb3e9d40550f8d057d32ea7281585
MD5          313d413f98ca854634e07b624332883f
BLAKE2b-256  4afe7024132691c75b2fecdd4f77ea1b9c5baa15c017c1ba696977af5020d7e8
