E8 lattice codebook quantization for LLM weights
GLQ
Post-training weight quantization for LLMs using E8 lattice codebooks.
GLQ encodes weights into 8-dimensional E8 lattice points via nearest-neighbor lookup. A Randomized Hadamard Transform (RHT) makes the Hessian approximately diagonal so that Euclidean nearest-neighbor is near-optimal under the proxy loss.
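As a rough illustration of the RHT mentioned above, the NumPy sketch below applies random sign flips followed by an orthonormal fast Walsh-Hadamard transform. It is not GLQ's implementation: the helper names `fwht`, `rht`, and `rht_inverse`, the dimension 64, and the synthetic Hessian are assumptions made for this example.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The last dimension must be a power of two.
    """
    n = x.shape[-1]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    lead = x.shape[:-1]
    y = np.asarray(x, dtype=np.float64).reshape(-1, n)
    h = 1
    while h < n:
        # Butterfly: combine the two halves of every block of size 2*h.
        y = y.reshape(-1, n // (2 * h), 2, h)
        a, b = y[:, :, 0, :], y[:, :, 1, :]
        y = np.stack([a + b, a - b], axis=2)
        h *= 2
    return y.reshape(*lead, n) / np.sqrt(n)

def rht(x: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Apply Q = H @ diag(signs) along the last axis of x."""
    return fwht(x * signs)

def rht_inverse(y: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Undo rht: Q^T = diag(signs) @ H, because H is symmetric and orthonormal."""
    return fwht(y) * signs

rng = np.random.default_rng(0)
n = 64
signs = rng.choice([-1.0, 1.0], size=n)

# A weight row with one large outlier is spread over all 64 coordinates.
w = np.zeros(n)
w[3] = 8.0
w_rot = rht(w, signs)  # every entry has magnitude 8 / sqrt(64) = 1

# Rotating weights and activations together leaves the layer output unchanged.
x = rng.normal(size=n)
assert np.isclose(w @ x, rht(w, signs) @ rht(x, signs))

# A synthetic Hessian proxy H = E[x x^T] with uneven feature scales
# transforms as Q H Q^T; its diagonal becomes much more uniform.
X = rng.normal(size=(256, n)) * np.linspace(0.1, 3.0, n)
H = X.T @ X / len(X)
H_rot = rht(rht(H, signs).T, signs).T
print("diag spread before:", np.std(np.diag(H)))
print("diag spread after: ", np.std(np.diag(H_rot)))
```

Because the transform is orthogonal, it can be undone exactly with `rht_inverse`, and the trace of the Hessian is preserved; only how that energy is distributed across coordinates changes.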
Results
SmolLM3-3B-Base on WikiText-2 (NVIDIA A10G):
| Method | Eff. BPW | Size (MB) | Perplexity | vs bf16 |
|---|---|---|---|---|
| bf16 | 16.00 | 6150 | 7.90 | 1.00x |
| GLQ 4-bit | 4.00 | 1538 | 8.11 | 1.03x |
| AWQ 4-bit | 5.60 | 2152 | 8.15 | 1.03x |
| QuIP+GPTQ 4-bit | 4.76 | 1829 | 8.17 | 1.03x |
| GLQ 3-bit | 3.00 | 1153 | 8.91 | 1.13x |
| QuIP+GPTQ 3-bit | 3.70 | 1423 | 9.30 | 1.18x |
| GLQ 2-bit | 2.00 | 769 | 11.35 | 1.44x |
GLQ uses a single global scale per layer rather than per-group scales, so effective bit widths match the nominal rate. These are early results on a single model; more benchmarks are needed.
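The size and ratio columns can be sanity-checked with simple arithmetic. The sketch below assumes that model size scales linearly with effective bits per weight relative to the bf16 row; that assumption, and the helper names, are mine rather than something the benchmark script is known to do. Under it, the estimates land within about 1 MB of the table.

```python
BF16_BITS = 16.0
BF16_SIZE_MB = 6150.0  # bf16 row of the table above
BF16_PPL = 7.90        # bf16 perplexity from the table above

# (method, effective bpw, reported size in MB, reported perplexity)
ROWS = [
    ("GLQ 4-bit", 4.00, 1538, 8.11),
    ("AWQ 4-bit", 5.60, 2152, 8.15),
    ("QuIP+GPTQ 4-bit", 4.76, 1829, 8.17),
    ("GLQ 3-bit", 3.00, 1153, 8.91),
    ("QuIP+GPTQ 3-bit", 3.70, 1423, 9.30),
    ("GLQ 2-bit", 2.00, 769, 11.35),
]

def estimated_size_mb(bpw: float) -> float:
    """Size estimate, assuming size is proportional to bits per weight."""
    return BF16_SIZE_MB * bpw / BF16_BITS

def perplexity_ratio(ppl: float) -> float:
    """Perplexity relative to the bf16 baseline (the 'vs bf16' column)."""
    return ppl / BF16_PPL

for name, bpw, size_mb, ppl in ROWS:
    print(f"{name:16s} est. {estimated_size_mb(bpw):7.1f} MB "
          f"(table: {size_mb}), ratio {perplexity_ratio(ppl):.2f}x")
```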
How it works
- E8 lattice codebook: 65536 vectors from the first 7 shells of the E8 lattice. Each 8-weight group maps to a 16-bit index (2 bpw). For 3 or 4 bpw, a second-stage residual codebook adds 8 or 16 more bits per group.
- Randomized Hadamard Transform (RHT): random sign flips followed by a Fast Walsh-Hadamard Transform, applied to both the weights and the Hessian. This spreads weight magnitude evenly across dimensions and makes the Hessian's diagonal blocks approximately proportional to the identity, so that after RHT the Euclidean nearest neighbor in the codebook is close to Hessian-optimal.
- LDLQ error feedback: a block-LDL decomposition of the Hessian drives a sequential quantization sweep (like GPTQ, but over 8-dimensional blocks instead of scalar columns). The quantization error from each block propagates forward to correct subsequent blocks.
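The error-feedback sweep in the last item can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not GLQ's code: `block_ldl` and `ldlq` are hypothetical names, the Hessian is synthetic, and plain `np.round` (nearest integer grid point) stands in for the E8 codebook lookup.

```python
import numpy as np

def block_ldl(H: np.ndarray, b: int = 8):
    """Factor H = (U + I) @ D @ (U + I).T for symmetric positive definite H.

    U is strictly block-upper-triangular and D is block-diagonal (block size b).
    """
    n = H.shape[0]
    assert H.shape == (n, n) and n % b == 0
    # Cholesky of the index-reversed matrix yields an upper-triangular M
    # with H = M @ M.T.
    L = np.linalg.cholesky(H[::-1, ::-1].copy())
    M = L[::-1, ::-1]
    U = np.zeros_like(H)
    D = np.zeros_like(H)
    for k in range(0, n, b):
        Mkk = M[k:k + b, k:k + b]
        D[k:k + b, k:k + b] = Mkk @ Mkk.T
        if k > 0:
            # Rows above block k of M[:, block k] @ inv(Mkk).
            U[:k, k:k + b] = np.linalg.solve(Mkk.T, M[:k, k:k + b].T).T
    return U, D

def ldlq(W: np.ndarray, H: np.ndarray, quantize, b: int = 8) -> np.ndarray:
    """Quantize W (rows = output channels) block by block with error feedback."""
    U, _ = block_ldl(H, b)
    W_hat = np.zeros_like(W)
    for k in range(0, W.shape[1], b):
        # Error already committed on earlier blocks, projected onto block k.
        correction = (W[:, :k] - W_hat[:, :k]) @ U[:k, k:k + b]
        W_hat[:, k:k + b] = quantize(W[:, k:k + b] + correction)
    return W_hat

rng = np.random.default_rng(0)
n, m = 32, 64
# Synthetic, strongly correlated calibration activations (an assumption).
X = rng.normal(size=(512, n)) @ rng.normal(size=(n, n))
H = X.T @ X / len(X) + 1e-3 * np.eye(n)
W = 2.0 * rng.normal(size=(m, n))

def proxy_loss(W_hat: np.ndarray) -> float:
    """tr((W_hat - W) H (W_hat - W)^T), the proxy objective."""
    E = W_hat - W
    return float(np.trace(E @ H @ E.T))

W_rtn = np.round(W)            # round-to-nearest, no feedback
W_ldlq = ldlq(W, H, np.round)  # same quantizer, with error feedback
print("proxy loss without feedback:", proxy_loss(W_rtn))
print("proxy loss with feedback:   ", proxy_loss(W_ldlq))
```

With this factorization the proxy loss reduces to a sum over the per-block rounding errors weighted by `D`, which is why the feedback version typically scores lower when the Hessian has strong off-diagonal structure.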
Install
Requires Python 3.10+ and PyTorch 2.0+. Install PyTorch first (pytorch.org), then:
# Core package (codebook + quantization):
pip install 'glq[quantize] @ git+https://github.com/cnygaard/glq.git'
# Or minimal install (no transformers/datasets):
pip install git+https://github.com/cnygaard/glq.git
Triton (for the fused codebook kernel) is bundled with PyTorch on CUDA and will be used automatically.
Quickstart
Command line
glq-quantize \
    --model HuggingFaceTB/SmolLM2-360M \
    --output ./smollm2-glq-2bpw \
    --bpw 2 \
    --nsamples 128 \
    --device cuda
Python API
from glq import quantize
quantize(
    model_name="HuggingFaceTB/SmolLM2-360M",
    output_dir="./smollm2-glq-2bpw",
    bpw=2,
    nsamples=128,
    device="cuda",
)
Loading a quantized model
import glq.hf_integration # registers GLQ with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("./smollm2-glq-2bpw", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./smollm2-glq-2bpw")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Bit widths
| BPW | Encoding | Overhead |
|---|---|---|
| 2 | 16-bit index per 8 weights | Global scale only |
| 3 | 16-bit + 8-bit residual index per 8 weights | Global scale only |
| 4 | 16-bit + 16-bit residual index per 8 weights | Global scale only |
All bit widths use a single global scale per layer (no group-size parameter).
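To make the encodings in the table concrete, here is a toy two-stage lookup together with the bit accounting. Everything in it is a stand-in chosen for illustration: the primary codebook is only the origin plus the 240 minimal vectors of E8 (not the 65536-entry codebook), the residual codebook is simply that set scaled by 0.5, and the function names are hypothetical.

```python
import itertools
import numpy as np

def e8_first_shell() -> np.ndarray:
    """Return the 240 minimal vectors (squared norm 2) of the E8 lattice."""
    vecs = []
    # Two nonzero coordinates, each +1 or -1: 112 vectors.
    for i, j in itertools.combinations(range(8), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = np.zeros(8)
            v[i], v[j] = si, sj
            vecs.append(v)
    # All coordinates +-1/2 with an even number of minus signs: 128 vectors.
    for signs in itertools.product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            vecs.append(np.array(signs))
    return np.stack(vecs)

def nearest(groups: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the Euclidean nearest codebook vector for each 8-weight group."""
    d2 = ((groups[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def bits_per_weight(primary_bits: int, residual_bits: int = 0, group: int = 8) -> float:
    """Index bits spent per weight for one group of `group` weights."""
    return (primary_bits + residual_bits) / group

print(bits_per_weight(16))      # 2.0
print(bits_per_weight(16, 8))   # 3.0
print(bits_per_weight(16, 16))  # 4.0

primary = np.vstack([np.zeros((1, 8)), e8_first_shell()])  # toy: 241 entries
residual = 0.5 * primary                                   # hypothetical residual codebook

rng = np.random.default_rng(0)
groups = rng.normal(size=(1000, 8))

i1 = nearest(groups, primary)            # first-stage index
stage1 = primary[i1]
i2 = nearest(groups - stage1, residual)  # second-stage index on the residual
stage2 = stage1 + residual[i2]

mse1 = np.mean((groups - stage1) ** 2)
mse2 = np.mean((groups - stage2) ** 2)
print("MSE after stage 1:", mse1)
print("MSE after stage 2:", mse2)
```

Because the toy residual codebook contains the zero vector, the second stage can never increase the error of any group; it can only keep or reduce it.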
Acknowledgments
- The RHT incoherence approach follows QuIP# (Tseng et al., 2024)
- E8 lattice geometry from Conway & Sloane, Sphere Packings, Lattices and Groups
- LDLQ error feedback from GPTQ (Frantar et al., 2022)
License
Apache 2.0