# csq-quant — Causal Salience Quantization

*Causal salience-aware quantization: gradient×activation-informed, interaction-graph LLM weight quantization targeting exact bit budgets.*
CSQ is a post-training quantization method for large language models that uses gradient×activation causal importance scoring to identify which weights truly matter — then protects them from aggressive quantization.
**Paper:** *CSQ: Closing the Perplexity Gap in 4-Bit LLM Quantization via Causal Salience Scoring and Co-Activation Graph Protection*
## Why CSQ?
Existing methods like AWQ use activation magnitude as a proxy for weight importance. We show this proxy agrees with true causal salience on only ~20% of the top-5% most critical weights, meaning an AWQ-style method aggressively quantizes 80% of the weights that actually matter most. CSQ fixes this.
| Method | Avg bits | WikiText-2 PPL ↓ | GSM8K ↑ |
|---|---|---|---|
| FP32 baseline | 32.00 | — | — |
| RTN 4-bit | 4.00 | worst | worst |
| AWQ-style | 4.12 | better | better |
| CSQ (ours) | 4.00 | best | best |
*Results on LLaMA-3.2-1B. CSQ matches AWQ's bit budget while outperforming it on perplexity and reasoning tasks.*
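The proxy-agreement measurement above boils down to a top-k set overlap between two per-weight scores. A minimal sketch with synthetic scores (`topk_overlap` and both score arrays are illustrative stand-ins, not part of `csq` or AWQ):

```python
import numpy as np

def topk_overlap(score_a, score_b, frac=0.05):
    """Fraction of the top-`frac` indices of score_a that also appear
    in the top-`frac` indices of score_b."""
    k = max(1, int(frac * score_a.size))
    top_a = set(np.argsort(score_a)[-k:].tolist())
    top_b = set(np.argsort(score_b)[-k:].tolist())
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
proxy = rng.random(10_000)                    # stand-in activation-magnitude score
salience = 0.3 * proxy + rng.random(10_000)   # weakly correlated "causal" score
print(f"top-5% overlap: {topk_overlap(proxy, salience):.2f}")
```

An overlap of 1.0 means the proxy ranks exactly the same weights as critical; the paper's ~20% figure means the two rankings mostly disagree on which weights deserve protection.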
## Install

```bash
pip install csq-quant
```
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from csq import quantize, build_calibration_data

# Load your model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Build calibration data (64 samples recommended)
calib_data = build_calibration_data(tokenizer, n=64, device="cuda")

# Quantize — that's it
model, info = quantize(model, calib_data, target_bits=4.0)
print(f"Avg bits: {info['avg_bits']:.3f}")
# → Avg bits: 4.001

# Model is a drop-in replacement — use exactly as before
input_ids = tokenizer("Hello, world", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=100)
```
## How it works

CSQ runs in three stages, all offline (done once before deployment):
### Stage 1 — Causal salience profiling
Runs N forward+backward passes on a calibration set. For each weight, computes |grad × weight| — a first-order Taylor approximation of the loss change from zeroing that weight. This is a true causal measure, not a proxy.
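Stage 1 can be sketched on a toy model as follows. `causal_salience` below is an illustrative stand-in, not the library's `compute_causal_salience`; it accumulates |grad × weight| over the calibration batches:

```python
import torch
import torch.nn as nn

def causal_salience(model, batches, loss_fn):
    """Per-weight |grad * weight|, averaged over calibration batches."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()      # one forward+backward pass
        for n, p in model.named_parameters():
            # first-order Taylor estimate of the loss change from zeroing p
            scores[n] += (p.grad * p).abs()
    return {n: s / len(batches) for n, s in scores.items()}

# Toy usage on a single linear layer
model = nn.Linear(8, 2)
batches = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(3)]
salience = causal_salience(model, batches, nn.CrossEntropyLoss())
print(salience["weight"].shape)  # → torch.Size([2, 8])
```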
### Stage 2 — Bit budget solver

Binary searches over salience thresholds to find the fp16/int8/int4 split that achieves exactly your target bit-width (e.g. 4.000 bits). This is what makes CSQ's results directly comparable to AWQ and GPTQ at matched memory.
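The paper's solver details aren't reproduced here, but the search can be sketched in one dimension. This sketch assumes (our assumption, not the paper's) that the int8 fraction is pinned at 4× the fp16 fraction, mirroring the ~5%/~20% split, so a single scalar binary search fixes the whole partition; it also uses a target (5.4 bits) that this particular three-tier scheme can actually reach:

```python
def avg_bits(p_fp16):
    """Average bits per weight for an fp16/int8/int4 split where the
    int8 fraction is fixed at 4x the fp16 fraction (illustrative assumption)."""
    p_int8 = 4 * p_fp16
    p_int4 = 1 - p_fp16 - p_int8
    return 16 * p_fp16 + 8 * p_int8 + 4 * p_int4

def solve_fp16_fraction(target_bits, tol=1e-9):
    # avg_bits is monotonically increasing in p_fp16, so binary search works
    lo, hi = 0.0, 0.2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if avg_bits(mid) < target_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = solve_fp16_fraction(5.4)
print(f"fp16 fraction: {p:.4f}, avg bits: {avg_bits(p):.3f}")
# → fp16 fraction: 0.0500, avg bits: 5.400
```

The real solver searches over salience thresholds rather than fractions directly, but the monotonicity argument is the same: raising the fp16 threshold can only lower the average bit-width.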
### Stage 3 — Tiered quantization

Applies the solved tiers per weight element:
- Top ~5% by causal salience → keep fp16 (zero quantization loss)
- Next ~20% → INT8 (minimal loss)
- Bottom ~75% → INT4 (aggressive, but on weights that don't matter)
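The tier assignment can be sketched as follows. `apply_tiers` and `quantize_tensor` are illustrative names, not the library's `apply_csq`; the sketch uses per-tensor symmetric round-to-nearest, whereas a real implementation would likely use per-group scales:

```python
import torch

def quantize_tensor(w, bits):
    """Symmetric per-tensor round-to-nearest quantize-dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

def apply_tiers(w, salience, f_fp16=0.05, f_int8=0.20):
    n = w.numel()
    order = salience.flatten().argsort(descending=True)  # most salient first
    out = quantize_tensor(w, 4).flatten()                # default tier: int4
    int8_idx = order[int(f_fp16 * n): int((f_fp16 + f_int8) * n)]
    out[int8_idx] = quantize_tensor(w, 8).flatten()[int8_idx]
    fp16_idx = order[: int(f_fp16 * n)]
    out[fp16_idx] = w.flatten()[fp16_idx]                # top tier kept exact
    return out.view_as(w)

torch.manual_seed(0)
w = torch.randn(64, 64)
sal = torch.rand(64, 64)
w_q = apply_tiers(w, sal)
print(f"max reconstruction error: {(w_q - w).abs().max():.4f}")
```

Because the top tier is stored unquantized, the highest-salience elements are reconstructed exactly; all quantization error is pushed onto the low-salience tiers.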
## Advanced usage

```python
from csq import compute_causal_salience, solve_bit_budget, apply_csq

# Run stages individually for more control
salience = compute_causal_salience(model, calib_data, verbose=True)
budget = solve_bit_budget(salience, target_bits=4.0)
model, tier_stats = apply_csq(model, salience, budget)

# Inspect what happened
print(f"fp16 weights: {tier_stats['fp16']:,}")
print(f"int8 weights: {tier_stats['int8']:,}")
print(f"int4 weights: {tier_stats['int4']:,}")
```
## Citation

```bibtex
@article{borkar2026csq,
  title   = {CSQ: Closing the Perplexity Gap in 4-Bit LLM Quantization
             via Causal Salience Scoring and Co-Activation Graph Protection},
  author  = {Borkar, Omdeep},
  journal = {arXiv preprint},
  year    = {2026}
}
```
## License

MIT
## File details

Details for the file `csaq_quant-0.1.0.tar.gz` (source distribution).

### File metadata

- Download URL: csaq_quant-0.1.0.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1ddbc7ae2139e70902ff785c510f2e7c900dee74105d6b5ac4b94f0314d3be0e` |
| MD5 | `5f926d0adf2d8d46dbf76b6171904158` |
| BLAKE2b-256 | `a0000a83b9e690b50c5ce510bd742fabbe2ee942bc4766a5b2d84011dc7d9a92` |
## File details

Details for the file `csaq_quant-0.1.0-py3-none-any.whl` (built distribution).

### File metadata

- Download URL: csaq_quant-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `21783f1c3b97cc9afcc8fc4b6cbb8f4ac754a66c8a381a0def9a6bed27dd5a84` |
| MD5 | `7c0a813d7625a7000cad90db74cd0711` |
| BLAKE2b-256 | `1f8bc6b1ff1b5639be9800377ed25763d5a4f68c7fe5ba869ca9370c0de36ae5` |