KV Cache Management Module for sageLLM

sagellm-kv-cache

KV cache management and KV transfer for the sageLLM inference engine.

Overview

This package manages the key-value (KV) cache used during LLM inference and provides primitives for moving KV blocks between nodes.

Key Features:

KV Pool: Block-based memory management with budget control.
KV Transfer: Primitives for cross-node KV block migration.
Observability: Metrics and hooks for cache monitoring.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    sagellm-control-plane                            │
│              (Scheduling: alloc/free/migrate decisions)             │
└────────────────────────────┬────────────────────────────────────────┘
                             │ KVCacheInterface
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      sagellm-kv-cache (This Package)                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │ PrefixCache │  │  KV Pool    │  │  Eviction   │  │ KV Transfer │ │
│  │  (Task2.1)  │  │  (Task2.2)  │  │  (Task2.3)  │  │  (Task1.3)  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └──────┬──────┘ │
└────────────────────────────────────────────────────────────┼────────┘
                             ┌───────────────────────────────┘
                             │ Use CommBackend for transport
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         sagellm-comm                                 │
│              (Network Layer: Topology, Collectives)                 │
└─────────────────────────────────────────────────────────────────────┘
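
The layering in the diagram (transfer logic on top, transport underneath) can be pictured with a small sketch. Everything here is an assumption made for illustration: `Transport`, `LoopbackTransport`, and `migrate_blocks` are hypothetical names, and the wire format (an 8-byte block id followed by raw bytes) is invented. It does not describe the real `CommBackend` or `KVTransferEngine` interfaces.

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical stand-in for the transport role played by CommBackend."""

    def send(self, dst: str, payload: bytes) -> None: ...


class LoopbackTransport:
    """In-memory transport that records what each destination received."""

    def __init__(self) -> None:
        self.inbox: dict[str, list[bytes]] = {}

    def send(self, dst: str, payload: bytes) -> None:
        self.inbox.setdefault(dst, []).append(payload)


def migrate_blocks(blocks: dict[int, bytes], dst: str, transport: Transport) -> int:
    """Send blocks in ascending id order; return the number of bytes sent.

    Assumed wire format: 8-byte big-endian block id, then the block's bytes.
    """
    sent = 0
    for block_id, data in sorted(blocks.items()):
        payload = block_id.to_bytes(8, "big") + data
        transport.send(dst, payload)
        sent += len(payload)
    return sent


transport = LoopbackTransport()
bytes_sent = migrate_blocks({7: b"\x00" * 4, 3: b"\x01" * 2}, "node-1", transport)
print(bytes_sent, len(transport.inbox["node-1"]))
```

Because `migrate_blocks` depends only on the `send` method, the same transfer logic could run over any transport that offers it, which is the point of keeping the network layer in a separate package.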

Installation

pip install isagellm-kv-cache

Quick Start (CPU-first)

KV Pool

from sagellm_kv_cache.pool import KVPool

# Create a KV pool with budget control
pool = KVPool(max_tokens=1024)

# Allocate KV cache block
handle = pool.alloc(num_tokens=128, device="cpu")
print(f"Allocated handle: {handle.handle_id}, Tokens: {handle.num_tokens}")

# Free the handle
pool.free(handle)
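
To make "budget control" concrete, here is a minimal, self-contained sketch of a token-budgeted pool. It is not the implementation behind `KVPool`: `ToyPool`, `ToyHandle`, and `BudgetExceeded` are hypothetical names, and the real pool may react differently when the budget is exhausted (for example by evicting instead of raising).

```python
from dataclasses import dataclass
from itertools import count


class BudgetExceeded(RuntimeError):
    """Raised when an allocation would exceed the token budget (hypothetical)."""


@dataclass(frozen=True)
class ToyHandle:
    handle_id: int
    num_tokens: int
    device: str


class ToyPool:
    """Illustrative token-budgeted pool; NOT sagellm_kv_cache.pool.KVPool."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used_tokens = 0
        self._live: dict[int, ToyHandle] = {}
        self._ids = count(1)

    def alloc(self, num_tokens: int, device: str = "cpu") -> ToyHandle:
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        if self.used_tokens + num_tokens > self.max_tokens:
            raise BudgetExceeded(
                f"need {num_tokens} tokens, "
                f"only {self.max_tokens - self.used_tokens} left"
            )
        handle = ToyHandle(next(self._ids), num_tokens, device)
        self._live[handle.handle_id] = handle
        self.used_tokens += num_tokens
        return handle

    def free(self, handle: ToyHandle) -> None:
        # Freeing an unknown or already-freed handle is treated as an error.
        live = self._live.pop(handle.handle_id, None)
        if live is None:
            raise KeyError(f"handle {handle.handle_id} is not live")
        self.used_tokens -= live.num_tokens


pool = ToyPool(max_tokens=1024)
a = pool.alloc(num_tokens=128)
b = pool.alloc(num_tokens=896)  # the budget is now fully used
try:
    pool.alloc(num_tokens=1)
    over_budget_rejected = False
except BudgetExceeded:
    over_budget_rejected = True
pool.free(a)  # returns 128 tokens to the budget
```

Tracking the budget in tokens rather than bytes keeps the accounting independent of dtype and layout, at the cost of needing a separate mapping to actual memory.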

Prefix Cache (Task 2.1)

from sagellm_kv_cache import PrefixCache

# Create cache with block-based hashing
cache = PrefixCache(block_size=16, max_cached_blocks=100, enable_lru=True)

# Insert prefix blocks
tokens = list(range(48))  # 3 blocks
hashes = cache.compute_block_hashes(tokens)
blocks = [{"block_id": i} for i in range(len(hashes))]
cache.insert(hashes, blocks)

# Lookup with prefix overlap
hit_blocks, num_tokens = cache.lookup(hashes)
print(f"Reused {num_tokens} tokens from cache!")

# Check hit rate
stats = cache.get_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")

See examples/prefix_cache_example.py for comprehensive usage examples.
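
The idea behind block-hash prefix matching can be sketched without the library. The hashing scheme below (SHA-256 over the previous block's hash plus the current block's tokens) is a common approach and an assumption here; the scheme `PrefixCache.compute_block_hashes` actually uses is not documented in this README. `block_hashes` and `longest_cached_prefix` are hypothetical helper names.

```python
import hashlib
from typing import Iterable, Sequence


def block_hashes(tokens: Sequence[int], block_size: int) -> list[str]:
    """Hash each full block, chaining in the previous block's hash.

    Chaining makes a block's hash depend on everything before it, so equal
    hashes imply an equal prefix, not just an equal block.
    """
    hashes: list[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = parent + "|" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        hashes.append(parent)
    return hashes


def longest_cached_prefix(hashes: Iterable[str], cached: set[str]) -> int:
    """Count leading block hashes present in the cache, stopping at the first miss."""
    matched = 0
    for h in hashes:
        if h not in cached:
            break
        matched += 1
    return matched


BLOCK_SIZE = 16
first = list(range(48))                # 3 full blocks
second = list(range(32)) + [999] * 16  # shares only the first 2 blocks
cached = set(block_hashes(first, BLOCK_SIZE))
hit_blocks = longest_cached_prefix(block_hashes(second, BLOCK_SIZE), cached)
reused_tokens = hit_blocks * BLOCK_SIZE
print(f"Reused {reused_tokens} tokens from {hit_blocks} cached blocks")
```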

KV Cache Access Pattern Profiling

from sagellm_kv_cache.profiling import AccessStatsCollector

# Create statistics collector
collector = AccessStatsCollector()

# Record accesses during inference
collector.record_access("block_001", is_hit=True)
collector.record_access("block_002", is_hit=False)

# Export statistics to JSON
collector.export_stats("stats.json")

# Get summary
summary = collector.get_stats_summary()
print(f"Hit rate: {summary['hit_rate']:.2%}")
print(f"Total accesses: {summary['total_accesses']}")
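
For readers who want to see how such a summary is derived, the sketch below counts hits and misses per block and computes the same two summary fields. It is an illustration only: `ToyAccessStats` is a hypothetical class, and the JSON layout it emits is invented, since the schema written by `export_stats` is not described here.

```python
import json
from collections import Counter


class ToyAccessStats:
    """Illustrative per-block hit/miss counter; NOT AccessStatsCollector."""

    def __init__(self) -> None:
        self.hits: Counter = Counter()
        self.misses: Counter = Counter()

    def record_access(self, block_id: str, is_hit: bool) -> None:
        (self.hits if is_hit else self.misses)[block_id] += 1

    def summary(self) -> dict:
        total_hits = sum(self.hits.values())
        total = total_hits + sum(self.misses.values())
        return {
            "total_accesses": total,
            # Define the hit rate as 0.0 when nothing has been recorded.
            "hit_rate": total_hits / total if total else 0.0,
        }

    def to_json(self) -> str:
        block_ids = sorted(set(self.hits) | set(self.misses))
        per_block = {
            b: {"hits": self.hits[b], "misses": self.misses[b]} for b in block_ids
        }
        return json.dumps({"summary": self.summary(), "per_block": per_block}, indent=2)


stats = ToyAccessStats()
stats.record_access("block_001", is_hit=True)
stats.record_access("block_002", is_hit=False)
stats.record_access("block_001", is_hit=True)
summary = stats.summary()
exported = json.loads(stats.to_json())
print(f"Hit rate: {summary['hit_rate']:.2%}")
```

Per-block counts like these are the raw material for a heatmap: blocks with many hits are reuse hot spots, while blocks with only misses are candidates for early eviction.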

CLI Tool - Generate Demo Data:

# Generate demo statistics
sage-kv-stats demo --num-accesses 1000 --output demo_stats.json

# Or use the Python script
python examples/kv_profiling_demo.py --num-accesses 500 --output stats.json

CLI Tool - Visualize Results:

# Generate heatmap
sage-kv-stats visualize --input stats.json --output heatmap.png

# Generate all visualizations with summary
sage-kv-stats visualize --input stats.json --type all --summary

# Or use the Python script
python scripts/visualize_access_pattern.py --input stats.json --type all --summary

Install visualization dependencies (matplotlib is optional):

pip install "isagellm-kv-cache[visualization]"

API Reference

Core Components

PrefixCache (sagellm_kv_cache): Block-hash-based prefix caching for cross-request KV reuse. Supports LRU eviction, hit-rate tracking, and handle invalidation. See Task 2.1.
KVPool (sagellm_kv_cache.pool): Main entry point for memory management. Handles allocation, freeing, and budget enforcement.
KVHandle (sagellm_kv_cache): Represents a reference to allocated KV cache. Contains metadata like handle_id, dtype, layout.
KVTransferEngine (sagellm_kv_cache): Handles moving KV blocks between nodes using sagellm-comm.
EvictionManager (sagellm_kv_cache): Eviction policy management with LRU/FIFO strategies.
SchedulerBridge (sagellm_kv_cache): Bridge between scheduler IR and KV pool operations.
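
The difference between the LRU and FIFO strategies named above shows up as soon as a cached block is accessed again. The sketch below is a generic illustration built on `collections.OrderedDict`; `ToyEvictor` and `run` are hypothetical names and do not reflect the `EvictionManager` API.

```python
from collections import OrderedDict


class ToyEvictor:
    """Illustrative fixed-capacity evictor with 'lru' or 'fifo' ordering."""

    def __init__(self, capacity: int, policy: str = "lru") -> None:
        if policy not in ("lru", "fifo"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        self._blocks: OrderedDict = OrderedDict()  # oldest entry first

    def touch(self, block_id: str) -> list[str]:
        """Record an access and return the ids evicted to make room."""
        if block_id in self._blocks:
            # Only LRU refreshes a block's position on re-access;
            # FIFO keeps the original insertion order.
            if self.policy == "lru":
                self._blocks.move_to_end(block_id)
            return []
        self._blocks[block_id] = None
        evicted = []
        while len(self._blocks) > self.capacity:
            victim, _ = self._blocks.popitem(last=False)
            evicted.append(victim)
        return evicted


def run(policy: str) -> list[str]:
    evictor = ToyEvictor(capacity=2, policy=policy)
    evicted = []
    for block_id in ["a", "b", "a", "c"]:
        evicted.extend(evictor.touch(block_id))
    return evicted


lru_evicted = run("lru")    # re-accessing "a" protects it, so "b" is evicted
fifo_evicted = run("fifo")  # "a" was inserted first, so "a" is evicted
print(lru_evicted, fifo_evicted)
```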

Dependencies

isagellm-protocol: Common data structures and protocol definitions.
isagellm-backend: Backend abstraction.
isagellm-comm: Communication layer for transfer.

Development

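We recommend using the repository script to set up the development environment (it does not create a venv and reuses your existing Python environment):

# Development mode (default): standard base plus local editable installs that override dependencies (--no-deps)
./quickstart.sh --dev

# Standard mode: dependencies are installed from PyPI first; this repository is installed in editable mode
./quickstart.sh --standard

# Show help
./quickstart.sh --help

Before installing, the script dynamically removes any already-installed isagellm-* packages to avoid environment drift caused by earlier installs. If installation fails, it prints detailed logs to aid diagnosis.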

Install dev dependencies:
```
pip install -e ".[dev]"
```
Run tests:
```
pytest
```
Linting:
```
ruff check .
```

Version

Current version: 0.4.0.11. See CHANGELOG.md for history.

Release history

Version	Release date
0.5.4.15 (this version)	Mar 12, 2026
0.5.4.14	Mar 8, 2026
0.5.4.13	Mar 7, 2026
0.5.4.12	Mar 4, 2026
0.5.4.11	Mar 3, 2026
0.5.4.8	Mar 1, 2026
0.5.4.7	Mar 1, 2026
0.5.4.4	Mar 1, 2026
0.5.4.1	Feb 28, 2026
0.5.4.0	Feb 27, 2026
0.5.2.20	Feb 26, 2026
0.5.2.17	Feb 26, 2026
0.5.2.16	Feb 26, 2026
0.5.2.13	Feb 26, 2026
0.5.2.12	Feb 26, 2026
0.5.2.11	Feb 26, 2026
0.5.2.10	Feb 25, 2026
0.5.2.9	Feb 25, 2026
0.5.2.8	Feb 25, 2026
0.5.2.7	Feb 25, 2026
0.5.2.6	Feb 25, 2026
0.5.2.5	Feb 25, 2026
0.5.2.3	Feb 25, 2026
0.5.2.2	Feb 23, 2026
0.5.2.1	Feb 23, 2026
0.5.2.0	Feb 23, 2026
0.5.1.8	Feb 23, 2026
0.5.1.7	Feb 23, 2026
0.5.1.6	Feb 20, 2026
0.5.1.5	Feb 18, 2026
0.5.1.4	Feb 17, 2026
0.5.1.3	Feb 17, 2026
0.5.1.2	Feb 17, 2026
0.5.1.1	Feb 17, 2026
0.5.1.0	Feb 17, 2026
0.4.1.6	Feb 17, 2026
0.4.1.5	Feb 17, 2026
0.4.1.4	Feb 17, 2026
0.4.1.3	Feb 17, 2026
0.4.1.2	Feb 15, 2026
0.4.1.1	Feb 14, 2026
0.4.1.0	Feb 14, 2026
0.4.0.11	Feb 8, 2026
0.4.0.7	Jan 31, 2026
0.4.0.6	Jan 31, 2026
0.4.0.5	Jan 30, 2026
0.4.0.4	Jan 30, 2026
0.4.0.3	Jan 30, 2026
0.4.0.2	Jan 30, 2026
0.4.0.1	Jan 30, 2026
0.4.0.0	Jan 30, 2026
0.3.0.7	Jan 30, 2026
0.3.0.6	Jan 29, 2026
0.3.0.5	Jan 29, 2026
0.3.0.4	Jan 29, 2026
0.3.0.3	Jan 28, 2026
0.3.0.2	Jan 28, 2026
0.3.0.1	Jan 27, 2026
0.3.0.0	Jan 27, 2026
0.1.1.6	Jan 27, 2026
0.1.1.5	Jan 26, 2026
0.1.1.4	Jan 26, 2026
0.1.1.3	Jan 26, 2026
0.1.1.2	Jan 26, 2026
0.1.1.1	Jan 25, 2026
0.1.1.0	Jan 25, 2026
0.1.0.4	Jan 21, 2026
0.1.0.3	Jan 21, 2026
0.1.0.2	Jan 18, 2026
0.1.0.1	Jan 17, 2026

Download files

Download the file for your platform.

Source Distribution

isagellm_kv_cache-0.5.4.15.tar.gz (204.9 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl (254.6 kB)

Uploaded Mar 12, 2026 Python 2, Python 3

File details

Details for the file isagellm_kv_cache-0.5.4.15.tar.gz.

File metadata

Download URL: isagellm_kv_cache-0.5.4.15.tar.gz
Upload date: Mar 12, 2026
Size: 204.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for isagellm_kv_cache-0.5.4.15.tar.gz
Algorithm	Hash digest
SHA256	`b7737ed44b78ebc55bc945638955fb5ab5cf1833592d35ba2cfb5754864891ea`
MD5	`c3a72afcf811a7327c4a2cc079b2b346`
BLAKE2b-256	`6f3f1ab2138e395e1f60be9f1fd0405b296d1857176b00b6f9819aec9649e29e`

File details

Details for the file isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl.

File metadata

Download URL: isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl
Upload date: Mar 12, 2026
Size: 254.6 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for isagellm_kv_cache-0.5.4.15-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`5146a9c28d5146c1f61c63bfbacec446bf00fb1a26df7b9b8f3a91003c14ac4b`
MD5	`7bf3d087abfb3e1801abd48c16326636`
BLAKE2b-256	`a0ae0fccdc829e0a201268c3776f350527ab589518584f3b3f50d8347e9d2519`
