KV Cache Management Module for sageLLM
sagellm-kv-cache
KV Cache Management + KV Transfer for sageLLM inference engine.
Overview
This package provides efficient KV cache management and transfer for LLM inference.
Key Features:
- KV Pool: Block-based memory management with budget control.
- KV Transfer: Primitives for cross-node KV block migration.
- Observability: Metrics and hooks for cache monitoring.
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                        sagellm-control-plane                        │
│             (Scheduling: alloc/free/migrate decisions)              │
└────────────────────────────┬────────────────────────────────────────┘
                             │ KVCacheInterface
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                   sagellm-kv-cache (This Package)                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │ PrefixCache │  │   KV Pool   │  │  Eviction   │  │ KV Transfer │ │
│  │  (Task2.1)  │  │  (Task2.2)  │  │  (Task2.3)  │  │  (Task1.3)  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └──────┬──────┘ │
└────────────────────────────────────────────────────────────┼────────┘
                             ┌───────────────────────────────┘
                             │ Use CommBackend for transport
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            sagellm-comm                             │
│               (Network Layer: Topology, Collectives)                │
└─────────────────────────────────────────────────────────────────────┘
Installation
pip install isagellm-kv-cache
Quick Start (CPU-first)
KV Pool
from sagellm_kv_cache.pool import KVPool
# Create a KV pool with budget control
pool = KVPool(max_tokens=1024)
# Allocate KV cache block
handle = pool.alloc(num_tokens=128, device="cpu")
print(f"Allocated handle: {handle.handle_id}, Tokens: {handle.num_tokens}")
# Free the handle
pool.free(handle)
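To make "budget control" concrete, here is a minimal, self-contained sketch of a token-budget allocator. It is not the package's KVPool implementation: ToyKVPool, ToyHandle, and the choice of MemoryError for an exceeded budget are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ToyHandle:
    """Hypothetical stand-in for a KV handle; not the package's KVHandle."""
    handle_id: int
    num_tokens: int
    device: str


class ToyKVPool:
    """Illustrative token-budget pool; not the real KVPool."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used_tokens = 0
        self._live = {}          # handle_id -> ToyHandle
        self._ids = count(1)     # monotonically increasing handle ids

    def alloc(self, num_tokens: int, device: str = "cpu") -> ToyHandle:
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        # Reject any request that would push usage past the budget.
        if self.used_tokens + num_tokens > self.max_tokens:
            raise MemoryError("token budget exceeded")
        handle = ToyHandle(next(self._ids), num_tokens, device)
        self._live[handle.handle_id] = handle
        self.used_tokens += num_tokens
        return handle

    def free(self, handle: ToyHandle) -> None:
        # Freeing returns the tokens to the budget; a double free raises.
        if self._live.pop(handle.handle_id, None) is None:
            raise KeyError(f"unknown or already freed handle {handle.handle_id}")
        self.used_tokens -= handle.num_tokens


pool = ToyKVPool(max_tokens=1024)
first = pool.alloc(num_tokens=128)
second = pool.alloc(num_tokens=896)   # the budget is now fully used

budget_enforced = False
try:
    pool.alloc(num_tokens=1)          # one token over budget
except MemoryError:
    budget_enforced = True

pool.free(first)
print(pool.used_tokens)  # 896
```

The sketch tracks only token counts. A real pool would also own the backing memory for each block and might signal an exhausted budget differently.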
Prefix Cache (Task 2.1)
from sagellm_kv_cache import PrefixCache
# Create cache with block-based hashing
cache = PrefixCache(block_size=16, max_cached_blocks=100, enable_lru=True)
# Insert prefix blocks
tokens = list(range(48)) # 3 blocks
hashes = cache.compute_block_hashes(tokens)
blocks = [{"block_id": i} for i in range(len(hashes))]
cache.insert(hashes, blocks)
# Lookup with prefix overlap
hit_blocks, num_tokens = cache.lookup(hashes)
print(f"Reused {num_tokens} tokens from cache!")
# Check hit rate
stats = cache.get_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
See examples/prefix_cache_example.py for comprehensive usage examples.
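One common way to implement block-hash prefix caching is to chain the hashes, so that each block's hash covers every token before it. The sketch below shows that idea with an LRU-bounded store. It is an assumption about the general technique, not a description of how PrefixCache or compute_block_hashes work internally; chained_block_hashes and ToyPrefixCache are hypothetical names.

```python
import hashlib
from collections import OrderedDict


def chained_block_hashes(tokens, block_size):
    """Hash each full block together with the previous block's digest."""
    hashes = []
    prev = b""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = prev + b"|" + ",".join(map(str, block)).encode()
        digest = hashlib.sha256(payload).digest()
        hashes.append(digest.hex())
        prev = digest
    return hashes


class ToyPrefixCache:
    """Illustrative LRU-bounded block store keyed by chained hashes."""

    def __init__(self, block_size, max_cached_blocks):
        self.block_size = block_size
        self.max_cached_blocks = max_cached_blocks
        self._blocks = OrderedDict()  # hash -> block, oldest first

    def insert(self, hashes, blocks):
        for h, block in zip(hashes, blocks):
            self._blocks[h] = block
            self._blocks.move_to_end(h)
            while len(self._blocks) > self.max_cached_blocks:
                self._blocks.popitem(last=False)  # evict least recently used

    def lookup(self, hashes):
        """Return the longest cached prefix and the number of tokens it covers."""
        hit = []
        for h in hashes:
            if h not in self._blocks:
                break  # a miss ends the reusable prefix
            self._blocks.move_to_end(h)
            hit.append(self._blocks[h])
        return hit, len(hit) * self.block_size


cache = ToyPrefixCache(block_size=16, max_cached_blocks=100)

# Request A: 48 tokens (3 blocks). Request B shares only the first 32 tokens.
hashes_a = chained_block_hashes(list(range(48)), 16)
hashes_b = chained_block_hashes(list(range(32)) + [999] * 16, 16)

cache.insert(hashes_a, [{"block_id": i} for i in range(len(hashes_a))])
hit_blocks, reused_tokens = cache.lookup(hashes_b)
print(reused_tokens)  # 32
```

Because the hashes are chained, two requests share a block hash only when all preceding tokens match, so a lookup can stop at the first miss.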
KV Cache Access Pattern Profiling
from sagellm_kv_cache.profiling import AccessStatsCollector
# Create statistics collector
collector = AccessStatsCollector()
# Record accesses during inference
collector.record_access("block_001", is_hit=True)
collector.record_access("block_002", is_hit=False)
# Export statistics to JSON
collector.export_stats("stats.json")
# Get summary
summary = collector.get_stats_summary()
print(f"Hit rate: {summary['hit_rate']:.2%}")
print(f"Total accesses: {summary['total_accesses']}")
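For readers who want to see how a hit rate can be derived from recorded accesses, the following is a small stand-alone sketch. ToyAccessStats is a hypothetical class. It does not reproduce AccessStatsCollector or the JSON schema written by export_stats, both of which may differ.

```python
import json
from collections import Counter


class ToyAccessStats:
    """Illustrative per-block hit/miss counter; not the package's collector."""

    def __init__(self):
        self.hits = Counter()
        self.misses = Counter()

    def record_access(self, block_id, is_hit):
        # Count the access under the block's id in the matching counter.
        (self.hits if is_hit else self.misses)[block_id] += 1

    def summary(self):
        total_hits = sum(self.hits.values())
        total = total_hits + sum(self.misses.values())
        return {
            "total_accesses": total,
            # Guard against division by zero when nothing was recorded.
            "hit_rate": total_hits / total if total else 0.0,
        }

    def to_json(self):
        # Assumed layout for illustration only.
        return json.dumps(
            {"hits": dict(self.hits), "misses": dict(self.misses), **self.summary()},
            sort_keys=True,
        )


stats = ToyAccessStats()
stats.record_access("block_001", is_hit=True)
stats.record_access("block_001", is_hit=True)
stats.record_access("block_002", is_hit=False)

toy_summary = stats.summary()
print(f"Hit rate: {toy_summary['hit_rate']:.2%}")
print(stats.to_json())
```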
CLI Tool - Generate Demo Data:
# Generate demo statistics
sage-kv-stats demo --num-accesses 1000 --output demo_stats.json
# Or use the Python script
python examples/kv_profiling_demo.py --num-accesses 500 --output stats.json
CLI Tool - Visualize Results:
# Generate heatmap
sage-kv-stats visualize --input stats.json --output heatmap.png
# Generate all visualizations with summary
sage-kv-stats visualize --input stats.json --type all --summary
# Or use the Python script
python scripts/visualize_access_pattern.py --input stats.json --type all --summary
Install visualization dependencies (matplotlib is optional):
pip install "isagellm-kv-cache[visualization]"
API Reference
Core Components
- PrefixCache (sagellm_kv_cache): Block-hash-based prefix caching for cross-request KV reuse. Supports LRU eviction, hit rate tracking, and handle invalidation. See Task 2.1.
- KVPool (sagellm_kv_cache.pool): Main entry point for memory management. Handles allocation, freeing, and budget enforcement.
- KVHandle (sagellm_kv_cache): Represents a reference to allocated KV cache. Contains metadata such as handle_id, dtype, and layout.
- KVTransferEngine (sagellm_kv_cache): Moves KV blocks between nodes using sagellm-comm.
- EvictionManager (sagellm_kv_cache): Eviction policy management with LRU/FIFO strategies.
- SchedulerBridge (sagellm_kv_cache): Bridge between scheduler IR and KV pool operations.
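The LRU and FIFO strategies named for EvictionManager differ only in whether an access refreshes a block's position in the eviction order. The sketch below shows that difference with a hypothetical ToyEvictionQueue. It makes no claim about EvictionManager's actual interface.

```python
from collections import OrderedDict


class ToyEvictionQueue:
    """Illustrative eviction order; not the package's EvictionManager."""

    def __init__(self, policy="lru"):
        if policy not in {"lru", "fifo"}:
            raise ValueError(f"unsupported policy: {policy}")
        self.policy = policy
        self._order = OrderedDict()  # block_id -> None, next victim first

    def add(self, block_id):
        # New blocks join the back of the queue.
        self._order[block_id] = None

    def touch(self, block_id):
        # Only LRU reorders on access; FIFO keeps insertion order.
        if self.policy == "lru" and block_id in self._order:
            self._order.move_to_end(block_id)

    def evict(self):
        # Pop the block at the front of the queue.
        block_id, _ = self._order.popitem(last=False)
        return block_id


victims = {}
for policy in ("lru", "fifo"):
    queue = ToyEvictionQueue(policy)
    for block_id in ("a", "b", "c"):
        queue.add(block_id)
    queue.touch("a")  # "a" was just used
    victims[policy] = queue.evict()

print(victims)  # {'lru': 'b', 'fifo': 'a'}
```

Under LRU the recently used block "a" survives and "b" is evicted. Under FIFO "a" is evicted because it was inserted first, regardless of the access.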
Dependencies
- isagellm-protocol: Common data structures and protocol definitions.
- isagellm-backend: Backend abstraction.
- isagellm-comm: Communication layer for transfer.
Development
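The recommended way to set up a development environment is the repository script (it does not create a venv and reuses the existing Python environment):
# Dev mode (default): standard install as the base, plus local editable installs that override dependencies (--no-deps)
./quickstart.sh --dev
# Standard mode: dependencies are installed from PyPI first; this repository is installed in editable mode
./quickstart.sh --standard
# Show help
./quickstart.sh --help
Before installing, the script removes any already-installed isagellm-* packages to avoid environment drift caused by earlier installs.
If installation fails, it prints a detailed log to help with diagnosis.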
- Install dev dependencies:
pip install -e ".[dev]"
- Run tests:
pytest
- Linting:
ruff check .
Version
Current version: 0.4.0.11. See CHANGELOG.md for history.
File details
Details for the file isagellm_kv_cache-0.5.4.15.tar.gz.
File metadata
- Download URL: isagellm_kv_cache-0.5.4.15.tar.gz
- Upload date:
- Size: 204.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7737ed44b78ebc55bc945638955fb5ab5cf1833592d35ba2cfb5754864891ea |
| MD5 | c3a72afcf811a7327c4a2cc079b2b346 |
| BLAKE2b-256 | 6f3f1ab2138e395e1f60be9f1fd0405b296d1857176b00b6f9819aec9649e29e |
File details
Details for the file isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl.
File metadata
- Download URL: isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl
- Upload date:
- Size: 254.6 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5146a9c28d5146c1f61c63bfbacec446bf00fb1a26df7b9b8f3a91003c14ac4b |
| MD5 | 7bf3d087abfb3e1801abd48c16326636 |
| BLAKE2b-256 | a0ae0fccdc829e0a201268c3776f350527ab589518584f3b3f50d8347e9d2519 |