Model Compression & Acceleration Module for sageLLM
sagellm-compression
Model Compression & Acceleration Module for SageLLM (Task 3).
📌 Architecture & Responsibility
sagellm-compression is the core module responsible for model compression (quantization, sparsity) and inference acceleration strategies (speculative decoding, CoT optimizations).
Dependency Graph
graph TD
Protocol[isagellm-protocol] --> Backend[isagellm-backend]
Protocol --> Compression[isagellm-compression]
Backend --> Compression
Compression --> Core[isagellm-core]
Compression --> KVCache[isagellm-kv-cache]
- Depends on:
  - isagellm-protocol: Shared schemas and definitions.
  - isagellm-backend: Hardware acceleration kernels (for quantization).
- Used by:
  - isagellm-core: Main inference engine (integrates acceleration strategies).
  - isagellm-kv-cache: Uses compression techniques for KV storage.
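In the graph, each arrow points from a dependency to the package that uses it. Purely as an illustration (this helper is not part of sagellm-compression), the same edges can be passed to Python's standard-library graphlib to derive an install or release order in which every package comes after its dependencies. The edge set is copied from the graph; isagellm-core and isagellm-kv-cache may have further dependencies that the graph does not show.

```python
from graphlib import TopologicalSorter

# Each package maps to the packages it depends on, mirroring the
# arrows in the dependency graph (dependency --> dependent).
DEPENDENCIES: dict[str, set[str]] = {
    "isagellm-protocol": set(),
    "isagellm-backend": {"isagellm-protocol"},
    "isagellm-compression": {"isagellm-protocol", "isagellm-backend"},
    "isagellm-core": {"isagellm-compression"},
    "isagellm-kv-cache": {"isagellm-compression"},
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the packages ordered so that each one follows all of its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in build_order(DEPENDENCIES):
        print(name)
```

With these edges, isagellm-protocol always comes first and isagellm-compression always precedes isagellm-core and isagellm-kv-cache; the relative order of those last two is unspecified.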
✨ Features
- Chain-of-Thought (CoT): Template management and prompt engineering strategies (Zero-shot, Few-shot, Self-Consistency).
- Quantization (Planned Task 3.1): INT8/INT4 weight and activation quantization.
- Sparsity (Planned Task 3.2): Structured and unstructured pruning support.
- Speculative Decoding (Planned Task 3.3): Draft model orchestration.
- Kernel Fusion (Planned Task 3.4): Operator fusion optimization.
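Quantization is still planned (Task 3.1), so there is no API to show yet. For orientation only, here is a minimal sketch of one common scheme, symmetric per-tensor INT8 weight quantization, in plain Python. The function names are hypothetical, and the eventual implementation may well differ (per-channel scales, INT4 packing, activation quantization, backend kernels).

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor maps the largest magnitude to 127;
    # an all-zero tensor gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round() rounds half to even; the clamp keeps codes in [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 1.0]
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    # Per-element error is at most scale / 2 (half a quantization step).
    print(q, scale, max(abs(a - b) for a, b in zip(w, w_hat)))
```

A single symmetric scale keeps dequantization to one multiply per element; the price is that one large outlier weight coarsens the step size for the whole tensor.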
📦 Installation
pip install isagellm-compression
Requirements
- Python >= 3.10
- isagellm-protocol >= 0.4.0.0
- isagellm-backend >= 0.4.0.0
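A small pre-flight check can confirm these minimum versions before the package is imported. This is a sketch, not a shipped tool: MIN_VERSIONS, parse and check_requirements are hypothetical names, and the simple tuple comparison assumes purely numeric versions such as 0.4.0.10.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the Requirements list.
MIN_VERSIONS = {
    "isagellm-protocol": "0.4.0.0",
    "isagellm-backend": "0.4.0.0",
}


def parse(v: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '0.4.0.10'.

    Pre-release or local versions (e.g. '0.4.0.0rc1') raise ValueError here;
    use packaging.version.Version for full PEP 440 support.
    """
    return tuple(int(part) for part in v.split("."))


def check_requirements(minimums=MIN_VERSIONS, get_version=version) -> list[str]:
    """Return human-readable problems; an empty list means all minimums are met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        # Numeric tuples compare element-wise, so (0, 4, 0, 10) > (0, 4, 0, 0).
        if parse(installed) < parse(minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All requirements satisfied.")
```

The get_version parameter defaults to importlib.metadata.version but can be swapped for a stub, which keeps the check testable without installing anything.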
🚀 Quick Start
Chain-of-Thought (CoT) Templates
Currently, the CoT module provides template management for reasoning tasks.
from sagellm_compression.cot import CoTTemplateManager
# Initialize the manager
# Note: Ensure you have the templates directory available
manager = CoTTemplateManager(template_dir="templates/cot")
# Load a specific strategy template (e.g., zero-shot reasoning)
try:
    template_content = manager.load_template("zero_shot")
    # Render the prompt with a question
    prompt = manager.render(template_content, question="What is the result of 25 * 14?")
    print("--- Generated Prompt ---")
    print(prompt)
except FileNotFoundError:
    print("Template not found. Please ensure 'templates/cot' exists.")
🛠️ Development
Setup
git clone git@github.com:intellistream/sagellm-compression.git
cd sagellm-compression
./quickstart.sh
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
Testing & Linting
# Run all tests
pytest -v
# Check code style
ruff check .
ruff format .
Core Principles
- Protocol-First: Changes involving schemas must update isagellm-protocol first.
- CPU-First: All compression logic must provide a CPU reference implementation and use it by default.
- Fail-Fast: Missing configurations must raise explicit errors.
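The CPU-First and Fail-Fast principles can be illustrated with a small configuration object. QuantConfig, ConfigError, load_config and the bits/device fields are hypothetical names chosen for this sketch (the allowed bit widths follow the INT8/INT4 plan in Features); they are not part of the package's documented API.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Raised when a compression config is missing or invalid."""


@dataclass(frozen=True)
class QuantConfig:
    bits: int
    device: str = "cpu"  # CPU-First: the CPU reference path is the default

    def __post_init__(self) -> None:
        # Fail-Fast: reject invalid values when the config is built, not mid-inference.
        if self.bits not in (4, 8):
            raise ConfigError(f"bits must be 4 or 8, got {self.bits!r}")


def load_config(raw: dict) -> QuantConfig:
    """Build a QuantConfig from a plain dict (e.g. parsed YAML or JSON)."""
    # Fail-Fast: a missing required key is an explicit error, never a silent default.
    if "bits" not in raw:
        raise ConfigError("missing required config key 'bits'")
    return QuantConfig(bits=raw["bits"], device=raw.get("device", "cpu"))


if __name__ == "__main__":
    print(load_config({"bits": 8}))  # QuantConfig(bits=8, device='cpu')
```

Only the device has a default; the required setting has none, so a forgotten key surfaces as an error at load time instead of an unexpected quantization mode later.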
📚 Documentation
🔄 Contribution Guide
Please follow this workflow:
- Create an issue: describe the bug or requirement.
  gh issue create --title "[Bug] Description" --label "bug,sagellm-compression"
- Develop the fix: work on a local fix/#123-xxx branch.
  git checkout -b fix/#123-xxx origin/main-dev
  # develop, test...
  pytest -v
  ruff format . && ruff check . --fix
- Open a PR: submit it against the main-dev branch.
  gh pr create --base main-dev --title "Fix: Description" --body "Closes #123"
- Merge: after approval, merge into main-dev.
See .github/copilot-instructions.md for more details.
📅 Versioning & Changelog
Current Version: 0.4.0.10
See CHANGELOG.md for full history.
License
Private - IntelliStream Research Project
Download files
File details
Details for the file isagellm_compression-0.4.0.11.tar.gz.
File metadata
- Download URL: isagellm_compression-0.4.0.11.tar.gz
- Upload date:
- Size: 40.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57c64e36ca9aca427272b463f566db762ee1074de23d84d213f0013da0c5900b |
| MD5 | 5df7376270204cbc1f7b1b9e76c8eec8 |
| BLAKE2b-256 | 4499409f14dd15f567f84b8fc40c925fe86b16efa841c0b6d93b40c03736d078 |
File details
Details for the file isagellm_compression-0.4.0.11-py2.py3-none-any.whl.
File metadata
- Download URL: isagellm_compression-0.4.0.11-py2.py3-none-any.whl
- Upload date:
- Size: 56.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ad3888773745c2d483556f96c470436ebd656f3308fdf3803e1179f83902754 |
| MD5 | 2551b4f1ca5b776caa0240be7072d73a |
| BLAKE2b-256 | 55d199b3aa014934c426256523099dfd5ae7bb9a16155e467e57c5ad8419c13d |