KV Cache Management Module for sageLLM

sagellm-kv-cache

KV cache management and KV transfer for the sageLLM inference engine.

Requires Python 3.10+.

Overview

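This package manages the KV cache for LLM inference: block-based allocation under a memory budget, prefix reuse across requests, eviction policies, and transfer of KV blocks between nodes.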

Key Features:

  • KV Pool: Block-based memory management with budget control.
  • KV Transfer: Primitives for cross-node KV block migration.
  • Observability: Metrics and hooks for cache monitoring.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    sagellm-control-plane                            │
│              (Scheduling: alloc/free/migrate decisions)             │
└────────────────────────────┬────────────────────────────────────────┘
                             │ KVCacheInterface
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      sagellm-kv-cache (This Package)                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │ PrefixCache │  │  KV Pool    │  │  Eviction   │  │ KV Transfer │ │
│  │  (Task2.1)  │  │  (Task2.2)  │  │  (Task2.3)  │  │  (Task1.3)  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └──────┬──────┘ │
└────────────────────────────────────────────────────────────┼────────┘
                             ┌───────────────────────────────┘
                             │ Use CommBackend for transport
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         sagellm-comm                                │
│              (Network Layer: Topology, Collectives)                 │
└─────────────────────────────────────────────────────────────────────┘
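
The diagram shows KV Transfer reaching the network only through a CommBackend abstraction. The following is a minimal sketch of that layering, assuming a backend that exposes tagged send/recv calls. CommBackend's method signatures, InMemoryBackend, migrate_blocks and receive_blocks are all hypothetical names chosen for illustration; they are not the sagellm-comm or sagellm-kv-cache API.

```python
from typing import Dict, List, Protocol, Tuple

# Hypothetical message store: (src, dst, tag) -> payload bytes.
Mailbox = Dict[Tuple[str, str, str], bytes]


class CommBackend(Protocol):
    """Hypothetical transport interface; not the sagellm-comm API."""

    def send(self, dst: str, tag: str, payload: bytes) -> None: ...

    def recv(self, src: str, tag: str) -> bytes: ...


class InMemoryBackend:
    """Toy backend: every 'node' shares one mailbox dict instead of a network."""

    def __init__(self, node: str, mailbox: Mailbox) -> None:
        self.node = node
        self.mailbox = mailbox

    def send(self, dst: str, tag: str, payload: bytes) -> None:
        self.mailbox[(self.node, dst, tag)] = payload

    def recv(self, src: str, tag: str) -> bytes:
        # Raises KeyError if nothing was sent; a networked backend would
        # typically block or time out instead.
        return self.mailbox.pop((src, self.node, tag))


def migrate_blocks(backend: CommBackend, dst: str, blocks: Dict[int, bytes]) -> List[int]:
    """Send each KV block under its own tag and return the ids that were sent."""
    for block_id, data in blocks.items():
        backend.send(dst, f"kv-block-{block_id}", data)
    return list(blocks)


def receive_blocks(backend: CommBackend, src: str, block_ids: List[int]) -> Dict[int, bytes]:
    """Collect the blocks sent by migrate_blocks on the receiving node."""
    return {bid: backend.recv(src, f"kv-block-{bid}") for bid in block_ids}


mailbox: Mailbox = {}
node_a = InMemoryBackend("node-a", mailbox)
node_b = InMemoryBackend("node-b", mailbox)
sent = migrate_blocks(node_a, "node-b", {0: b"k0", 1: b"k1"})
print(receive_blocks(node_b, "node-a", sent))  # {0: b'k0', 1: b'k1'}
```

Because the transfer functions depend only on the Protocol, swapping the toy backend for a real transport would not change them, which is the point of the layering in the diagram.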

Installation

pip install isagellm-kv-cache

The distribution is published as isagellm-kv-cache; the Python import package is sagellm_kv_cache.

Quick Start (CPU-first)

KV Pool

from sagellm_kv_cache.pool import KVPool

# Create a KV pool with budget control
pool = KVPool(max_tokens=1024)

# Allocate KV cache block
handle = pool.alloc(num_tokens=128, device="cpu")
print(f"Allocated handle: {handle.handle_id}, Tokens: {handle.num_tokens}")

# Free the handle
pool.free(handle)
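
The pool call above does not show how the token budget is enforced. One common approach is sketched below. It assumes the budget is carved into fixed-size blocks and that each request is rounded up to whole blocks. ToyKVPool, ToyHandle and block_size are hypothetical and do not describe the actual KVPool implementation.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToyHandle:
    handle_id: int
    num_tokens: int
    block_ids: Tuple[int, ...]  # physical blocks backing this allocation


class ToyKVPool:
    """Hypothetical block pool: a token budget split into fixed-size blocks."""

    def __init__(self, max_tokens: int, block_size: int = 16) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(max_tokens // block_size))
        self.handles: Dict[int, ToyHandle] = {}
        self._ids = itertools.count()

    def alloc(self, num_tokens: int) -> ToyHandle:
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        # Round up: a partially filled block still occupies a whole block.
        needed = -(-num_tokens // self.block_size)
        if needed > len(self.free_blocks):
            raise MemoryError(
                f"budget exceeded: need {needed} blocks, {len(self.free_blocks)} free"
            )
        blocks = tuple(self.free_blocks[:needed])
        del self.free_blocks[:needed]
        handle = ToyHandle(next(self._ids), num_tokens, blocks)
        self.handles[handle.handle_id] = handle
        return handle

    def free(self, handle: ToyHandle) -> None:
        # Return the handle's blocks to the free list.
        self.handles.pop(handle.handle_id)
        self.free_blocks.extend(handle.block_ids)


pool = ToyKVPool(max_tokens=1024, block_size=16)  # 64 blocks
handle = pool.alloc(num_tokens=100)  # rounds up to 7 blocks
print(handle.handle_id, len(handle.block_ids), len(pool.free_blocks))  # 0 7 57
pool.free(handle)
```

Rounding up to whole blocks trades a little internal fragmentation (at most block_size - 1 tokens per allocation) for O(1) bookkeeping per block.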

Prefix Cache (Task 2.1)

from sagellm_kv_cache import PrefixCache

# Create cache with block-based hashing
cache = PrefixCache(block_size=16, max_cached_blocks=100, enable_lru=True)

# Insert prefix blocks
tokens = list(range(48))  # 3 blocks
hashes = cache.compute_block_hashes(tokens)
blocks = [{"block_id": i} for i in range(len(hashes))]
cache.insert(hashes, blocks)

# Lookup with prefix overlap
hit_blocks, num_tokens = cache.lookup(hashes)
print(f"Reused {num_tokens} tokens from cache!")

# Check hit rate
stats = cache.get_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")

See examples/prefix_cache_example.py for comprehensive usage examples.
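
Block-hash prefix caching relies on making each block's hash depend on every token before it. A matching hash then indicates, up to hash collisions, a matching prefix. The sketch below illustrates this idea and assumes chained SHA-256 hashes over full blocks only. chained_block_hashes and longest_prefix_hit are hypothetical helpers, and PrefixCache.compute_block_hashes may use a different hash function or chaining scheme.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def chained_block_hashes(tokens: Sequence[int], block_size: int) -> List[str]:
    """Hash each full block together with the previous block's hash."""
    hashes: List[str] = []
    parent = ""
    # Only complete blocks are hashed; a trailing partial block is ignored.
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        payload = f"{parent}|{','.join(map(str, block))}"
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


def longest_prefix_hit(
    cache: Dict[str, int], hashes: List[str], block_size: int
) -> Tuple[List[int], int]:
    """Return cached block ids for the longest matching prefix and its token count."""
    hit_blocks: List[int] = []
    for h in hashes:
        if h not in cache:
            break  # later hashes chain through this miss, so they cannot match
        hit_blocks.append(cache[h])
    return hit_blocks, len(hit_blocks) * block_size


# Cache three blocks for prompt A, then look up prompt B, which shares 32 tokens.
prompt_a = list(range(48))
prompt_b = list(range(32)) + [7] * 16
cache = {h: block_id for block_id, h in enumerate(chained_block_hashes(prompt_a, 16))}
print(longest_prefix_hit(cache, chained_block_hashes(prompt_b, 16), 16))  # ([0, 1], 32)
```

Chaining is what makes a plain dictionary lookup sufficient. Identical block contents at different positions, or after different prefixes, produce different hashes, so they are never reused incorrectly.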

KV Cache Access Pattern Profiling

from sagellm_kv_cache.profiling import AccessStatsCollector

# Create statistics collector
collector = AccessStatsCollector()

# Record accesses during inference
collector.record_access("block_001", is_hit=True)
collector.record_access("block_002", is_hit=False)

# Export statistics to JSON
collector.export_stats("stats.json")

# Get summary
summary = collector.get_stats_summary()
print(f"Hit rate: {summary['hit_rate']:.2%}")
print(f"Total accesses: {summary['total_accesses']}")
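
The summary fields printed above (hit_rate, total_accesses) come from counting hits and misses per access. The sketch below shows that bookkeeping and assumes a flat JSON layout. ToyAccessStats and its JSON schema are hypothetical; they will not match the file written by export_stats or read by sage-kv-stats.

```python
import json
from collections import Counter
from typing import Dict


class ToyAccessStats:
    """Hypothetical hit/miss counter; not the AccessStatsCollector implementation."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0
        self.per_block: Counter = Counter()  # access count per block id

    def record_access(self, block_id: str, is_hit: bool) -> None:
        self.per_block[block_id] += 1
        if is_hit:
            self.hits += 1
        else:
            self.misses += 1

    def summary(self) -> Dict[str, float]:
        total = self.hits + self.misses
        return {
            "total_accesses": total,
            # Guard against division by zero before any access is recorded.
            "hit_rate": self.hits / total if total else 0.0,
        }

    def export(self, path: str) -> None:
        payload = {"summary": self.summary(), "per_block": dict(self.per_block)}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2)


stats = ToyAccessStats()
stats.record_access("block_001", is_hit=True)
stats.record_access("block_002", is_hit=False)
print(stats.summary())  # {'total_accesses': 2, 'hit_rate': 0.5}
```

Keeping per-block counts alongside the totals is what makes a per-block view, such as a heatmap, possible later. A single hit-rate number would lose that information.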

CLI Tool - Generate Demo Data:

# Generate demo statistics
sage-kv-stats demo --num-accesses 1000 --output demo_stats.json

# Or use the Python script
python examples/kv_profiling_demo.py --num-accesses 500 --output stats.json

CLI Tool - Visualize Results:

# Generate heatmap
sage-kv-stats visualize --input stats.json --output heatmap.png

# Generate all visualizations with summary
sage-kv-stats visualize --input stats.json --type all --summary

# Or use the Python script
python scripts/visualize_access_pattern.py --input stats.json --type all --summary

Visualization relies on matplotlib, which is an optional dependency. Install it through the visualization extra:

pip install "isagellm-kv-cache[visualization]"

API Reference

Core Components

  • PrefixCache (sagellm_kv_cache): Block-hash-based prefix caching for reusing KV across requests. Supports LRU eviction, hit-rate tracking, and handle invalidation. See Task 2.1.
  • KVPool (sagellm_kv_cache.pool): Main entry point for memory management. Handles allocation, freeing, and budget enforcement.
  • KVHandle (sagellm_kv_cache): A reference to an allocated KV cache, carrying metadata such as handle_id, dtype, and layout.
  • KVTransferEngine (sagellm_kv_cache): Moves KV blocks between nodes using sagellm-comm.
  • EvictionManager (sagellm_kv_cache): Manages eviction policies, with LRU and FIFO strategies.
  • SchedulerBridge (sagellm_kv_cache): Bridges scheduler IR and KV pool operations.
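
EvictionManager is described as supporting LRU and FIFO strategies. The sketch below illustrates the difference between the two and assumes blocks are tracked by id in an ordered map. ToyEvictionPolicy, touch and evict are hypothetical names, not the EvictionManager API.

```python
from collections import OrderedDict
from typing import Hashable, List


class ToyEvictionPolicy:
    """Hypothetical policy; not the EvictionManager API.

    Blocks live in an OrderedDict ordered oldest-first. FIFO orders by
    insertion only; LRU also moves a block to the end whenever it is
    touched, so the front is always the least recently used block.
    """

    def __init__(self, strategy: str = "lru") -> None:
        if strategy not in ("lru", "fifo"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self.order: "OrderedDict[Hashable, None]" = OrderedDict()

    def insert(self, block_id: Hashable) -> None:
        self.order[block_id] = None
        self.order.move_to_end(block_id)

    def touch(self, block_id: Hashable) -> None:
        # Only LRU reorders on access; FIFO ignores reuse.
        if self.strategy == "lru" and block_id in self.order:
            self.order.move_to_end(block_id)

    def evict(self, n: int) -> List[Hashable]:
        # Pop from the front (oldest) until n blocks are freed or none remain.
        victims: List[Hashable] = []
        while self.order and len(victims) < n:
            block_id, _ = self.order.popitem(last=False)
            victims.append(block_id)
        return victims


for strategy in ("lru", "fifo"):
    policy = ToyEvictionPolicy(strategy)
    for block_id in ("a", "b", "c"):
        policy.insert(block_id)
    policy.touch("a")
    print(strategy, policy.evict(1))
# lru ['b']
# fifo ['a']
```

The only behavioral difference is touch. LRU protects recently reused blocks, such as shared prefixes, while FIFO evicts purely by age.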

Dependencies

  • isagellm-protocol: Common data structures and protocol definitions.
  • isagellm-backend: Backend abstraction.
  • isagellm-comm: Communication layer for transfer.

Development

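We recommend setting up the development environment with the repository script. It does not create a venv; it reuses the existing Python environment.

# Development mode (default): standard base install, with local editable packages overriding dependencies (--no-deps)
./quickstart.sh --dev

# Standard mode: dependencies are installed from PyPI first; this repository is installed in editable mode
./quickstart.sh --standard

# Show help
./quickstart.sh --help

Before installing, the script removes any already-installed isagellm-* packages so that earlier installs do not cause environment drift. If installation fails, it prints detailed logs to help with diagnosis.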

  1. Install dev dependencies:

    pip install -e ".[dev]"
    
  2. Run tests:

    pytest
    
  3. Linting:

    ruff check .
    

Version

Current version: 0.4.0.11. See CHANGELOG.md for the release history.

Download files

Download the file for your platform.

Source Distribution

isagellm_kv_cache-0.5.4.15.tar.gz (204.9 kB)

Uploaded Source

Built Distribution

isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl (254.6 kB)

Uploaded Python 2, Python 3

File details

Details for the file isagellm_kv_cache-0.5.4.15.tar.gz.

File metadata

  • Download URL: isagellm_kv_cache-0.5.4.15.tar.gz
  • Upload date:
  • Size: 204.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for isagellm_kv_cache-0.5.4.15.tar.gz
Algorithm    Hash digest
SHA256       b7737ed44b78ebc55bc945638955fb5ab5cf1833592d35ba2cfb5754864891ea
MD5          c3a72afcf811a7327c4a2cc079b2b346
BLAKE2b-256  6f3f1ab2138e395e1f60be9f1fd0405b296d1857176b00b6f9819aec9649e29e

File details

Details for the file isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       5146a9c28d5146c1f61c63bfbacec446bf00fb1a26df7b9b8f3a91003c14ac4b
MD5          7bf3d087abfb3e1801abd48c16326636
BLAKE2b-256  a0ae0fccdc829e0a201268c3776f350527ab589518584f3b3f50d8347e9d2519
