
Causal Salience-Aware Quantization — gradient×activation-informed interaction-graph LLM weight quantization targeting exact bit budgets


csq-quant — Causal Salience Quantization


CSQ is a post-training quantization method for large language models that uses gradient×activation causal importance scoring to identify which weights truly matter — then protects them from aggressive quantization.

Paper: CSQ: Closing the Perplexity Gap in 4-Bit LLM Quantization via Causal Salience Scoring and Co-Activation Graph Protection

Why CSQ?

Existing methods like AWQ use activation magnitude as a proxy for weight importance. We show this proxy agrees with true causal salience on only ~20% of the top-5% most critical weights — meaning AWQ aggressively quantizes the other ~80% of the weights that actually matter most. CSQ fixes this.
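The ~20% figure is a top-k set-overlap measurement. A minimal sketch of how such agreement can be computed, on synthetic scores rather than the paper's data (`topk_overlap` is a name invented here for illustration, not part of the package):

```python
import numpy as np

def topk_overlap(score_a, score_b, frac=0.05):
    """Fraction of score_a's top-`frac` indices that also land in
    score_b's top-`frac` indices."""
    k = max(1, int(frac * score_a.size))
    top_a = set(np.argsort(score_a)[-k:].tolist())
    top_b = set(np.argsort(score_b)[-k:].tolist())
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
proxy  = rng.random(10_000)                  # stand-in for activation magnitude
causal = 0.3 * proxy + rng.random(10_000)    # weakly correlated "true" salience
agreement = topk_overlap(proxy, causal)
```

With weakly correlated scores like these, the top-5% sets overlap far less than intuition suggests — the same effect the paper reports for activation magnitude versus causal salience.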

| Method        | Avg bits | WikiText-2 PPL ↓ | GSM8K ↑ |
|---------------|----------|------------------|---------|
| FP32 baseline | 32.00    |                  |         |
| RTN 4-bit     | 4.00     | worst            | worst   |
| AWQ-style     | 4.12     | better           | better  |
| CSQ (ours)    | 4.00     | best             | best    |

Results on LLaMA-3.2-1B. CSQ matches AWQ's bit budget while outperforming on perplexity and reasoning tasks.

Install

pip install csq-quant

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from csq import quantize, build_calibration_data

# Load your model
model     = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Build calibration data (64 samples recommended)
calib_data = build_calibration_data(tokenizer, n=64, device="cuda")

# Quantize — that's it
model, info = quantize(model, calib_data, target_bits=4.0)

print(f"Avg bits: {info['avg_bits']:.3f}")
# → Avg bits: 4.001

# Model is a drop-in replacement — use exactly as before
outputs = model.generate(input_ids, max_new_tokens=100)
```

How it works

CSQ runs in three stages, all offline (done once before deployment):

Stage 1 — Causal salience profiling. Runs N forward+backward passes on a calibration set. For each weight, it computes |grad × weight|, a first-order Taylor approximation of the loss change from zeroing that weight. This is a true causal measure, not a proxy.
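The accumulation loop can be sketched in a few lines of PyTorch. This is an illustration of the |grad × weight| idea on a toy model, not the csq-quant internals — `causal_salience` and the loss-function argument are assumptions made here for clarity:

```python
import torch
import torch.nn as nn

def causal_salience(model, calib_batches, loss_fn):
    # Accumulate |grad * weight| per parameter: a first-order Taylor
    # estimate of the loss change from zeroing each weight.
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for inputs, targets in calib_batches:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += (p.grad * p.detach()).abs()
    return {n: s / len(calib_batches) for n, s in scores.items()}

# Toy demo on a linear layer (stand-in for a transformer weight matrix)
torch.manual_seed(0)
toy = nn.Linear(4, 2)
batches = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(4)]
scores = causal_salience(toy, batches, nn.MSELoss())
```

For a causal LM the loss would be the usual next-token cross-entropy from the model's forward pass rather than `MSELoss`.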

Stage 2 — Bit-budget solver. Binary-searches over salience thresholds to find the fp16/int8/int4 split that hits exactly your target bit-width (e.g. 4.000 bits). This is what makes CSQ's results directly comparable to AWQ and GPTQ at matched memory.
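A simplified two-tier version of the solver idea (the shipped solver splits across three tiers; `solve_two_tier` is illustrative, not the package API). For a linear average like this the answer is closed-form, but binary search earns its keep once per-group scale overheads make the average non-linear in the threshold:

```python
import numpy as np

def solve_two_tier(salience, target_bits, hi_bits=8.0, lo_bits=4.0, iters=50):
    # Binary-search the protected fraction so that
    # frac*hi_bits + (1-frac)*lo_bits == target_bits,
    # then convert that fraction into a salience threshold.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        frac = (lo + hi) / 2
        avg = frac * hi_bits + (1 - frac) * lo_bits
        if avg > target_bits:
            hi = frac          # protecting too many weights: tighten
        else:
            lo = frac
    k = max(1, int(round(frac * salience.size)))
    threshold = np.sort(salience)[::-1][k - 1]  # k-th largest salience
    return threshold, frac

sal = np.random.default_rng(0).random(100_000)
thr, frac = solve_two_tier(sal, target_bits=4.5)
```

Here frac converges to 0.125, since 0.125 × 8 + 0.875 × 4 = 4.5 bits on average.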

Stage 3 — Tiered quantization. Applies the solved tiers per weight element:

  • Top ~5% by causal salience → keep fp16 (zero quantization loss)
  • Next ~20% → INT8 (minimal loss)
  • Bottom ~75% → INT4 (aggressive, but on weights that don't matter)
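The tier assignment can be sketched with fake quantization in NumPy. The fractions below are the illustrative ones from the list above — in practice the split comes out of the Stage 2 solver — and the per-tensor symmetric scale is an assumption for this sketch, not the package's grouping scheme:

```python
import numpy as np

def fake_quant(x, bits):
    # Symmetric round-to-nearest at a single per-tensor scale (assumed).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax if x.size else 1.0
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

def quantize_tiered(w, salience, fp16_frac=0.05, int8_frac=0.20):
    flat_w, flat_s = w.ravel().copy(), salience.ravel()
    order = np.argsort(flat_s)[::-1]            # most salient first
    n16 = int(fp16_frac * flat_w.size)
    n8 = int(int8_frac * flat_w.size)
    mid, low = order[n16:n16 + n8], order[n16 + n8:]
    flat_w[mid] = fake_quant(flat_w[mid], 8)    # next ~20% -> INT8
    flat_w[low] = fake_quant(flat_w[low], 4)    # bottom ~75% -> INT4
    return flat_w.reshape(w.shape)              # top ~5% untouched (fp16 tier)

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))
s = rng.random((64, 64))
q = quantize_tiered(w, s)
```

The most salient elements pass through unchanged while the bottom tier absorbs nearly all of the rounding error — the core trade the method makes.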

Advanced usage

```python
from csq import compute_causal_salience, solve_bit_budget, apply_csq

# Run stages individually for more control
salience = compute_causal_salience(model, calib_data, verbose=True)
budget   = solve_bit_budget(salience, target_bits=4.0)
model, tier_stats = apply_csq(model, salience, budget)

# Inspect what happened
print(f"fp16 weights: {tier_stats['fp16']:,}")
print(f"int8 weights: {tier_stats['int8']:,}")
print(f"int4 weights: {tier_stats['int4']:,}")
```

Citation

@article{borkar2026csq,
  title   = {CSQ: Closing the Perplexity Gap in 4-Bit LLM Quantization
             via Causal Salience Scoring and Co-Activation Graph Protection},
  author  = {Borkar, Omdeep},
  journal = {arXiv preprint},
  year    = {2026}
}

License

MIT
