
Causal Salience-Aware Quantization — gradient×activation-informed interaction-graph LLM weight quantization targeting exact bit budgets


csq-quant — Causal Salience Quantization


CSQ is a post-training quantization method for large language models that uses gradient×activation causal importance scoring to identify which weights truly matter — then protects them from aggressive quantization.

Paper: CSQ: Closing the Perplexity Gap in 4-Bit LLM Quantization via Causal Salience Scoring and Co-Activation Graph Protection

Why CSQ?

Existing methods like AWQ use activation magnitude as a proxy for weight importance. We show this proxy agrees with true causal salience on only ~20% of the top-5% most critical weights — meaning AWQ aggressively quantizes the other ~80% of the weights that actually matter most. CSQ fixes this.
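The ~20% figure is a top-k set-overlap measurement. A minimal sketch of how such agreement can be computed, on synthetic scores rather than the paper's data (`topk_overlap` is a name invented here for illustration, not part of the package):

```python
import numpy as np

def topk_overlap(score_a, score_b, frac=0.05):
    """Fraction of score_a's top-`frac` indices that also land in
    score_b's top-`frac` indices."""
    k = max(1, int(frac * score_a.size))
    top_a = set(np.argsort(score_a)[-k:].tolist())
    top_b = set(np.argsort(score_b)[-k:].tolist())
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
proxy  = rng.random(10_000)                  # stand-in for activation magnitude
causal = 0.3 * proxy + rng.random(10_000)    # weakly correlated "true" salience
agreement = topk_overlap(proxy, causal)
```

With weakly correlated scores like these, the top-5% sets overlap far less than intuition suggests — the same effect the paper reports for activation magnitude versus causal salience.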

| Method        | Avg bits | WikiText-2 PPL ↓ | GSM8K ↑ |
|---------------|----------|------------------|---------|
| FP32 baseline | 32.00    |                  |         |
| RTN 4-bit     | 4.00     | worst            | worst   |
| AWQ-style     | 4.12     | better           | better  |
| CSQ (ours)    | 4.00     | best             | best    |

Results on LLaMA-3.2-1B. CSQ matches AWQ's bit budget while outperforming on perplexity and reasoning tasks.

Install

pip install csq-quant

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from csq import quantize, build_calibration_data

# Load your model
model     = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Build calibration data (64 samples recommended)
calib_data = build_calibration_data(tokenizer, n=64, device="cuda")

# Quantize — that's it
model, info = quantize(model, calib_data, target_bits=4.0)

print(f"Avg bits: {info['avg_bits']:.3f}")
# → Avg bits: 4.001

# Model is a drop-in replacement — use exactly as before
outputs = model.generate(input_ids, max_new_tokens=100)
```

How it works

CSQ runs in three stages, all offline (done once before deployment):

Stage 1 — Causal salience profiling. Runs N forward+backward passes on a calibration set. For each weight, it computes |grad × weight|, a first-order Taylor approximation of the loss change from zeroing that weight. This is a true causal measure, not a proxy.
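The accumulation loop can be sketched in a few lines of PyTorch. This is an illustration of the |grad × weight| idea on a toy model, not the csq-quant internals — `causal_salience` and the loss-function argument are assumptions made here for clarity:

```python
import torch
import torch.nn as nn

def causal_salience(model, calib_batches, loss_fn):
    # Accumulate |grad * weight| per parameter: a first-order Taylor
    # estimate of the loss change from zeroing each weight.
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for inputs, targets in calib_batches:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += (p.grad * p.detach()).abs()
    return {n: s / len(calib_batches) for n, s in scores.items()}

# Toy demo on a linear layer (stand-in for a transformer weight matrix)
torch.manual_seed(0)
toy = nn.Linear(4, 2)
batches = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(4)]
scores = causal_salience(toy, batches, nn.MSELoss())
```

For a causal LM the loss would be the usual next-token cross-entropy from the model's forward pass rather than `MSELoss`.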

Stage 2 — Bit-budget solver. Binary-searches over salience thresholds to find the fp16/int8/int4 split that hits exactly your target bit-width (e.g. 4.000 bits). This is what makes CSQ's results directly comparable to AWQ and GPTQ at matched memory.
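A simplified two-tier version of the solver idea (the shipped solver splits across three tiers; `solve_two_tier` is illustrative, not the package API). For a linear average like this the answer is closed-form, but binary search earns its keep once per-group scale overheads make the average non-linear in the threshold:

```python
import numpy as np

def solve_two_tier(salience, target_bits, hi_bits=8.0, lo_bits=4.0, iters=50):
    # Binary-search the protected fraction so that
    # frac*hi_bits + (1-frac)*lo_bits == target_bits,
    # then convert that fraction into a salience threshold.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        frac = (lo + hi) / 2
        avg = frac * hi_bits + (1 - frac) * lo_bits
        if avg > target_bits:
            hi = frac          # protecting too many weights: tighten
        else:
            lo = frac
    k = max(1, int(round(frac * salience.size)))
    threshold = np.sort(salience)[::-1][k - 1]  # k-th largest salience
    return threshold, frac

sal = np.random.default_rng(0).random(100_000)
thr, frac = solve_two_tier(sal, target_bits=4.5)
```

Here frac converges to 0.125, since 0.125 × 8 + 0.875 × 4 = 4.5 bits on average.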

Stage 3 — Tiered quantization. Applies the solved tiers per weight element:

  • Top ~5% by causal salience → keep fp16 (zero quantization loss)
  • Next ~20% → INT8 (minimal loss)
  • Bottom ~75% → INT4 (aggressive, but on weights that don't matter)
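The tier assignment can be sketched with fake quantization in NumPy. The fractions below are the illustrative ones from the list above — in practice the split comes out of the Stage 2 solver — and the per-tensor symmetric scale is an assumption for this sketch, not the package's grouping scheme:

```python
import numpy as np

def fake_quant(x, bits):
    # Symmetric round-to-nearest at a single per-tensor scale (assumed).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax if x.size else 1.0
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

def quantize_tiered(w, salience, fp16_frac=0.05, int8_frac=0.20):
    flat_w, flat_s = w.ravel().copy(), salience.ravel()
    order = np.argsort(flat_s)[::-1]            # most salient first
    n16 = int(fp16_frac * flat_w.size)
    n8 = int(int8_frac * flat_w.size)
    mid, low = order[n16:n16 + n8], order[n16 + n8:]
    flat_w[mid] = fake_quant(flat_w[mid], 8)    # next ~20% -> INT8
    flat_w[low] = fake_quant(flat_w[low], 4)    # bottom ~75% -> INT4
    return flat_w.reshape(w.shape)              # top ~5% untouched (fp16 tier)

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))
s = rng.random((64, 64))
q = quantize_tiered(w, s)
```

The most salient elements pass through unchanged while the bottom tier absorbs nearly all of the rounding error — the core trade the method makes.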

Advanced usage

```python
from csq import compute_causal_salience, solve_bit_budget, apply_csq

# Run stages individually for more control
salience = compute_causal_salience(model, calib_data, verbose=True)
budget   = solve_bit_budget(salience, target_bits=4.0)
model, tier_stats = apply_csq(model, salience, budget)

# Inspect what happened
print(f"fp16 weights: {tier_stats['fp16']:,}")
print(f"int8 weights: {tier_stats['int8']:,}")
print(f"int4 weights: {tier_stats['int4']:,}")
```

Citation

@article{borkar2026csq,
  title   = {CSQ: Closing the Perplexity Gap in 4-Bit LLM Quantization
             via Causal Salience Scoring and Co-Activation Graph Protection},
  author  = {Borkar, Omdeep},
  journal = {arXiv preprint},
  year    = {2026}
}

License

MIT
