Model Compression & Acceleration Module for sageLLM
sagellm-compression
Model Compression & Acceleration Module for SageLLM (Task 3).
📌 Architecture & Responsibility
sagellm-compression is the core module responsible for model compression (quantization, sparsity) and inference acceleration strategies (speculative decoding, CoT optimizations).
Dependency Graph
graph TD
Protocol[isagellm-protocol] --> Backend[isagellm-backend]
Protocol --> Compression[isagellm-compression]
Backend --> Compression
Compression --> Core[isagellm-core]
Compression --> KVCache[isagellm-kv-cache]
- Depends on:
  - isagellm-protocol: Shared schemas and definitions.
  - isagellm-backend: Hardware acceleration kernels (for Quantization).
- Used by:
  - isagellm-core: Main inference engine (integrates acceleration strategies).
  - isagellm-kv-cache: Uses compression techniques for KV storage.
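The dependency graph above implies an order in which the packages have to be available. As a rough illustration (not part of the package), the sketch below encodes the same edges in a plain dictionary and lets the standard-library `graphlib` module derive one valid order. The names `DEPENDS_ON` and `build_order` are hypothetical and chosen only for this example.

```python
from graphlib import TopologicalSorter

# Each key depends on the packages in its value set.
# Edges are copied from the dependency graph above.
DEPENDS_ON = {
    "isagellm-protocol": set(),
    "isagellm-backend": {"isagellm-protocol"},
    "isagellm-compression": {"isagellm-protocol", "isagellm-backend"},
    "isagellm-core": {"isagellm-compression"},
    "isagellm-kv-cache": {"isagellm-compression"},
}


def build_order(depends_on):
    """Return one valid order in which dependencies precede dependents."""
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

The relative order of isagellm-core and isagellm-kv-cache is not fixed by the graph, so either may come first.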
✨ Features
- Chain-of-Thought (CoT) Acceleration (Task 3.5, Implemented): CoT detection + prompt compression + chain shortening with metrics.
- Quantization (Task 3.1, Implemented): INT8/INT4 weight and activation quantization.
- Sparsity (Task 3.2, Implemented): Structured (N:M / block) and unstructured pruning support.
- Speculative Decoding (Task 3.3, Implemented): Draft-verify orchestration with acceptance-rate metrics.
- Kernel Fusion (Task 3.4, Implemented): Attention/MLP fusion interfaces with CPU stub for CI.
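To make the quantization and sparsity items above more concrete, here is a minimal pure-Python sketch of two of the techniques they name: symmetric per-tensor INT8 quantization and N:M structured pruning (shown as 2:4). It does not use the sagellm-compression API. The functions `quantize_int8`, `dequantize_int8`, and `prune_n_of_m` are hypothetical names invented for this illustration, and the package may use a different scheme (for example per-channel scales).

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


def prune_n_of_m(values, n=2, m=4):
    """Keep the n largest-magnitude entries in each group of m; zero the rest."""
    if len(values) % m != 0:
        raise ValueError(f"length {len(values)} is not a multiple of m={m}")
    pruned = []
    for start in range(0, len(values), m):
        group = values[start:start + m]
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.5, -1.27, 0.02, 0.9, -0.3, 0.0, 0.75, -0.1]

codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
sparse = prune_n_of_m(weights, n=2, m=4)

print("codes:   ", codes)
print("restored:", restored)
print("2:4 mask:", sparse)
```

With rounding to the nearest code, the reconstruction error per element is bounded by half the scale, which is the usual way to reason about the accuracy cost of this kind of quantization.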
📦 Installation
pip install isagellm-compression
Requirements
- Python >= 3.10
- isagellm-protocol >= 0.4.0.0
- isagellm-backend >= 0.4.0.0
🚀 Quick Start
Chain-of-Thought (CoT) Templates
Currently, the CoT module provides template management for reasoning tasks.
from sagellm_compression.cot import CoTTemplateManager
# Initialize the manager
# Note: Ensure you have the templates directory available
manager = CoTTemplateManager(template_dir="templates/cot")
# Load a specific strategy template (e.g., zero-shot reasoning)
try:
    template_content = manager.load_template("zero_shot")
    # Render the prompt with a question
    prompt = manager.render(template_content, question="What is the result of 25 * 14?")
    print("--- Generated Prompt ---")
    print(prompt)
except FileNotFoundError:
    print("Template not found. Please ensure 'templates/cot' exists.")
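If the templates directory is not at hand, the load-then-render pattern from the Quick Start can be tried with the standard library alone. The sketch below is a hypothetical stand-in, not the implementation of `CoTTemplateManager`: the class name `MiniTemplateManager`, the `<name>.txt` file layout, and the `$question` placeholder syntax are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from string import Template


class MiniTemplateManager:
    """Hypothetical stand-in; NOT the sagellm_compression implementation."""

    def __init__(self, template_dir):
        self.template_dir = Path(template_dir)

    def load_template(self, name):
        # Assumption: one "<name>.txt" file per strategy.
        # read_text raises FileNotFoundError when the file is missing.
        path = self.template_dir / f"{name}.txt"
        return path.read_text(encoding="utf-8")

    def render(self, template_content, **fields):
        # substitute raises KeyError if a placeholder has no value.
        return Template(template_content).substitute(**fields)


with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway zero-shot template for the demo.
    Path(tmp, "zero_shot.txt").write_text(
        "Q: $question\nA: Let's think step by step.\n", encoding="utf-8"
    )
    manager = MiniTemplateManager(tmp)
    prompt = manager.render(
        manager.load_template("zero_shot"),
        question="What is the result of 25 * 14?",
    )

print(prompt)
```

Letting the missing-file error propagate mirrors the `FileNotFoundError` handling in the Quick Start snippet.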
🛠️ Development
Setup
git clone git@github.com:intellistream/sagellm-compression.git
cd sagellm-compression
./quickstart.sh --standard
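# Development integration (on top of standard, override with local editable installs)
./quickstart.sh --dev
# Show help
./quickstart.sh --help
# Before installing, quickstart automatically removes previously installed packages with the same prefix (isagellm-*)
# It does not create venv/.venv; it reuses the current non-venv Python environment
Modes:
- --standard: Installs the stable PyPI packages first (default mode).
- --dev: Runs standard first, then overrides with local editable installs (--no-deps).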
Testing & Linting
# Run all tests
pytest -v
# Run issue #8 unit-test subset
pytest -v tests/unit/test_quantization.py tests/unit/test_sparsity.py tests/unit/test_speculative_config.py
# Check code style
ruff check .
ruff format .
Core Principles
- Protocol-First: Changes involving schemas must update isagellm-protocol first.
- CPU-First: All compression logic must reference the CPU implementation by default.
- Fail-Fast: Missing configurations must raise explicit errors.
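The CPU-First and Fail-Fast principles can be illustrated with a small configuration loader. This is a sketch under stated assumptions, not code from the package: `QuantConfig`, `MissingConfigError`, and the `bits` and `device` keys are hypothetical names used only to show the pattern of defaulting to CPU and raising an explicit error when a required setting is absent.

```python
from dataclasses import dataclass


class MissingConfigError(ValueError):
    """Raised when a required configuration key is absent (hypothetical)."""


@dataclass(frozen=True)
class QuantConfig:
    """Hypothetical config object; field names are assumptions."""

    bits: int
    device: str = "cpu"  # CPU-First: default to the CPU path

    @classmethod
    def from_dict(cls, raw):
        # Fail-Fast: never silently substitute a default for a required key.
        if "bits" not in raw:
            raise MissingConfigError("missing required key 'bits'")
        bits = raw["bits"]
        if bits not in (4, 8):
            raise ValueError(f"unsupported bits={bits!r}; expected 4 or 8")
        return cls(bits=bits, device=raw.get("device", "cpu"))


config = QuantConfig.from_dict({"bits": 8})
print(config)

try:
    QuantConfig.from_dict({})
except MissingConfigError as exc:
    print(f"rejected: {exc}")
```

Raising at load time keeps a misconfiguration from surfacing later as a hard-to-trace numerical problem during inference.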
📚 Documentation
🔄 Contributing
Please follow this workflow:
- Create an issue: describe the problem or requirement.
  gh issue create --title "[Bug] description" --label "bug,sagellm-compression"
- Develop the fix: work locally on a fix/#123-xxx branch.
  git checkout -b fix/#123-xxx origin/main-dev
  # develop, test...
  pytest -v
  ruff format . && ruff check . --fix
- Open a PR: submit it against the main-dev branch.
  gh pr create --base main-dev --title "Fix: description" --body "Closes #123"
- Merge: after approval, merge into main-dev.
For more details, see .github/copilot-instructions.md
📅 Versioning & Changelog
Current Version: 0.4.0.10
See CHANGELOG.md for full history.
License
Private - IntelliStream Research Project
Download files
File details
Details for the file isagellm_compression-0.5.4.9.tar.gz.
File metadata
- Download URL: isagellm_compression-0.5.4.9.tar.gz
- Upload date:
- Size: 110.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da0934306896415339b765f5fc38ff6ac37f485b933853a864d0e3cf8211403b |
| MD5 | 001a8543de0a35236bde602c28ea9505 |
| BLAKE2b-256 | 90b2dd72f7700594b6f5fd0c8d847d94f0c42e5661c017d3ac753113ba5a66f4 |
File details
Details for the file isagellm_compression-0.5.4.9-py2.py3-none-any.whl.
File metadata
- Download URL: isagellm_compression-0.5.4.9-py2.py3-none-any.whl
- Upload date:
- Size: 168.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3dc79b3429301a8056730db5e419d96b6de39a0c970e514a12ffa200cc7cc6b |
| MD5 | dd182304a148eaee131bdd41195ff8d0 |
| BLAKE2b-256 | 3e01a8d1bd23cfadfe8b686c2230ea0b5c67dcfffe2d654bd9b7fbcd9e59a325 |