Model Compression & Acceleration Module for sageLLM

sagellm-compression

Model Compression & Acceleration Module for SageLLM (Task 3).

📌 Architecture & Responsibility

sagellm-compression is the core module responsible for model compression (quantization, sparsity) and inference acceleration strategies (speculative decoding, CoT optimizations).

Dependency Graph

graph TD
    Protocol[isagellm-protocol] --> Backend[isagellm-backend]
    Protocol --> Compression[isagellm-compression]
    Backend --> Compression
    Compression --> Core[isagellm-core]
    Compression --> KVCache[isagellm-kv-cache]
  • Depends on:
    • isagellm-protocol: Shared schemas and definitions.
    • isagellm-backend: Hardware acceleration kernels (for Quantization).
  • Used by:
    • isagellm-core: Main inference engine (integrates acceleration strategies).
    • isagellm-kv-cache: Uses compression techniques for KV storage.
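
As a rough illustration of the graph above, the sketch below encodes only the edges shown in the diagram and derives an order in which every package comes after the packages it depends on. The dictionary is transcribed from the diagram; the other packages may have further dependencies that are not listed here, and `build_order` is a hypothetical helper, not part of any sageLLM package.

```python
from graphlib import TopologicalSorter

# Each package maps to the set of packages it depends on,
# transcribed from the dependency graph above (assumed complete for this sketch).
DEPENDS_ON = {
    "isagellm-protocol": set(),
    "isagellm-backend": {"isagellm-protocol"},
    "isagellm-compression": {"isagellm-protocol", "isagellm-backend"},
    "isagellm-core": {"isagellm-compression"},
    "isagellm-kv-cache": {"isagellm-compression"},
}


def build_order(graph):
    """Return packages ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

The relative order of isagellm-core and isagellm-kv-cache is not fixed by the graph, since neither depends on the other.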

✨ Features

  • Chain-of-Thought (CoT) Acceleration (Task 3.5, Implemented): CoT detection + prompt compression + chain shortening with metrics.
  • Quantization (Task 3.1, Implemented): INT8/INT4 weight and activation quantization.
  • Sparsity (Task 3.2, Implemented): Structured (N:M / block) and unstructured pruning support.
  • Speculative Decoding (Task 3.3, Implemented): Draft-verify orchestration with acceptance-rate metrics.
  • Kernel Fusion (Task 3.4, Implemented): Attention/MLP fusion interfaces with CPU stub for CI.
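
For readers unfamiliar with the techniques named above, here is a minimal pure-Python sketch of three of them: symmetric per-tensor INT8 quantization, 2:4 (N:M) structured pruning, and the acceptance rate of a greedy draft-verify step. All function names are hypothetical and chosen for illustration; this is not the sagellm_compression API, and the package may implement these ideas differently.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: return (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def prune_n_m(values, n=2, m=4):
    """N:M sparsity: keep the n largest-magnitude entries in each group of m."""
    pruned = []
    for start in range(0, len(values), m):
        group = values[start:start + m]
        ranked = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def count_accepted(draft_tokens, target_tokens):
    """Greedy verification: accept draft tokens up to the first mismatch."""
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        accepted += 1
    return accepted


def acceptance_rate(accepted, proposed):
    """Fraction of proposed draft tokens that the target model accepted."""
    return accepted / proposed if proposed else 0.0


weights = [0.5, -1.0, 0.25, 0.1, 0.0, 2.0, -0.3, 0.05]
codes, scale = quantize_int8(weights)
print(codes, scale)
print(prune_n_m(weights))
print(acceptance_rate(count_accepted([1, 2, 3, 4], [1, 2, 9, 4]), 4))
```

Rounding to the nearest code keeps the reconstruction error of each weight within half a quantization step (`scale / 2`). The 2:4 pattern zeroes exactly two of every four weights.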

📦 Installation

pip install isagellm-compression

Requirements

  • Python >= 3.10
  • isagellm-protocol >= 0.4.0.0
  • isagellm-backend >= 0.4.0.0
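
The requirements above can be checked from Python before importing the package. The sketch below is one possible way to do so using only the standard library. It assumes purely numeric dotted versions such as `0.4.0.0`; `parse_version` and `check_requirements` are hypothetical helpers, not part of the package.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)
MIN_VERSIONS = {
    "isagellm-protocol": "0.4.0.0",
    "isagellm-backend": "0.4.0.0",
}


def parse_version(text):
    """Turn a dotted numeric version such as '0.4.0.10' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def check_requirements():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    for name, minimum in MIN_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        try:
            too_old = parse_version(installed) < parse_version(minimum)
        except ValueError:
            # Non-numeric version strings are outside this sketch's assumptions.
            problems.append(f"cannot parse version {installed!r} of {name}")
            continue
        if too_old:
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


for problem in check_requirements():
    print(problem)
```

Comparing integer tuples rather than strings matters for four-part versions: as strings, `"0.4.0.10"` would sort before `"0.4.0.9"`.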

🚀 Quick Start

Chain-of-Thought (CoT) Templates

Currently, the CoT module provides template management for reasoning tasks.

from sagellm_compression.cot import CoTTemplateManager

# Initialize the manager
# Note: Ensure you have the templates directory available
manager = CoTTemplateManager(template_dir="templates/cot")

# Load a specific strategy template (e.g., zero-shot reasoning)
try:
    template_content = manager.load_template("zero_shot")
    # Render the prompt with a question
    prompt = manager.render(template_content, question="What is the result of 25 * 14?")
    print("--- Generated Prompt ---")
    print(prompt)
except FileNotFoundError:
    print("Template not found. Please ensure 'templates/cot' exists.")
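
The template file format is not specified here. Purely to illustrate the load-then-render pattern used in the example above, the following self-contained sketch uses a hypothetical stand-in class built on the standard library's `string.Template`. The `.txt` extension, the `$question` placeholder syntax, and the template text are all assumptions; the real CoTTemplateManager may work differently.

```python
import tempfile
from pathlib import Path
from string import Template


class SimpleTemplateStore:
    """Hypothetical stand-in for illustration; not the real CoTTemplateManager."""

    def __init__(self, template_dir):
        self.template_dir = Path(template_dir)

    def load_template(self, name):
        # read_text raises FileNotFoundError when the template file is missing.
        return (self.template_dir / f"{name}.txt").read_text(encoding="utf-8")

    def render(self, template_content, **variables):
        # substitute raises KeyError if a placeholder has no value.
        return Template(template_content).substitute(**variables)


with tempfile.TemporaryDirectory() as tmp:
    # Create an assumed zero-shot template so the example runs anywhere.
    Path(tmp, "zero_shot.txt").write_text(
        "Q: $question\nA: Let's think step by step.", encoding="utf-8"
    )
    store = SimpleTemplateStore(tmp)
    prompt = store.render(
        store.load_template("zero_shot"),
        question="What is the result of 25 * 14?",
    )

print(prompt)
```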

🛠️ Development

Setup

git clone git@github.com:intellistream/sagellm-compression.git
cd sagellm-compression
./quickstart.sh --standard

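# Development and integration debugging (on top of standard, overlay local editable installs)
./quickstart.sh --dev

# Show help
./quickstart.sh --help

# Before installing, quickstart automatically removes previously installed packages with the same prefix (isagellm-*).
# It does not create venv/.venv; it reuses the current non-venv Python environment.

Modes:

  • --standard: Prefer installing the stable packages from PyPI (default mode).
  • --dev: Run standard first, then overlay local editable installs (--no-deps).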

Testing & Linting

# Run all tests
pytest -v

# Run issue #8 unit-test subset
pytest -v tests/unit/test_quantization.py tests/unit/test_sparsity.py tests/unit/test_speculative_config.py

# Check code style
ruff check .
ruff format .

Core Principles

  • Protocol-First: Changes involving schemas must update isagellm-protocol first.
  • CPU-First: All compression logic must reference the CPU implementation by default.
  • Fail-Fast: Missing configurations must raise explicit errors.
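
As one reading of the CPU-First and Fail-Fast principles, the sketch below defaults the device to CPU and raises an explicit error when a required field is missing, rather than silently substituting a value. `QuantizationConfig`, `MissingConfigError`, and `load_quantization_config` are hypothetical names used for illustration only; they are not taken from the package or from isagellm-protocol.

```python
from dataclasses import dataclass


class MissingConfigError(ValueError):
    """Raised when a required configuration field is absent."""


@dataclass(frozen=True)
class QuantizationConfig:
    bits: int
    device: str = "cpu"  # CPU-first: CPU is the default unless overridden


def load_quantization_config(raw):
    """Build a config from a dict, failing fast on missing or invalid fields."""
    if "bits" not in raw:
        # Fail fast: do not guess a bit width on the caller's behalf.
        raise MissingConfigError("quantization config is missing required field 'bits'")
    bits = raw["bits"]
    if bits not in (4, 8):
        raise ValueError(f"unsupported bit width {bits!r}; expected 4 or 8")
    return QuantizationConfig(bits=bits, device=raw.get("device", "cpu"))


config = load_quantization_config({"bits": 8})
print(config)

try:
    load_quantization_config({})
except MissingConfigError as exc:
    print(f"rejected: {exc}")
```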

📚 Documentation

🔄 Contributing

Please follow this workflow:

  1. Create an issue - describe the problem or requirement

    gh issue create --title "[Bug] description" --label "bug,sagellm-compression"

  2. Develop the fix - work on a local fix/#123-xxx branch

    git checkout -b fix/#123-xxx origin/main-dev
    # develop, test...
    pytest -v
    ruff format . && ruff check . --fix

  3. Open a PR - target the main-dev branch

    gh pr create --base main-dev --title "Fix: description" --body "Closes #123"

  4. Merge - merge into main-dev after approval

See .github/copilot-instructions.md for more details.

📅 Versioning & Changelog

Current Version: 0.4.0.10

See CHANGELOG.md for full history.

License

Private - IntelliStream Research Project
