E8 lattice codebook quantization for LLM weights

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

GLQ

Post-training weight quantization for LLMs using E8 lattice codebooks.

GLQ encodes weights into 8-dimensional E8 lattice points via nearest-neighbor lookup. A Randomized Hadamard Transform (RHT) makes the Hessian approximately diagonal so that Euclidean nearest-neighbor is near-optimal under the proxy loss.
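To make "nearest-neighbor lookup on E8" concrete, here is a minimal sketch of the classic Conway and Sloane decoder for the infinite E8 lattice, written as E8 = D8 ∪ (D8 + 1/2). It is an illustration under stated assumptions, not GLQ's implementation: GLQ performs its lookup in a finite codebook, and the function names (`nearest_d8`, `nearest_e8`, `count_minimal_vectors`) and the unit lattice scaling are hypothetical.

```python
from itertools import product


def nearest_d8(x):
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    rounded = [round(v) for v in x]
    if sum(rounded) % 2 == 0:
        return rounded
    # Odd sum: re-round the coordinate with the largest rounding error
    # in the other direction, which restores parity at the smallest cost.
    k = max(range(len(x)), key=lambda i: abs(x[i] - rounded[i]))
    rounded[k] += 1 if x[k] > rounded[k] else -1
    return rounded


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_e8(x):
    """Nearest point of E8, trying the integer coset and the half-integer coset."""
    integer_candidate = [float(v) for v in nearest_d8(x)]
    shifted = nearest_d8([v - 0.5 for v in x])
    half_candidate = [v + 0.5 for v in shifted]
    if squared_distance(x, integer_candidate) <= squared_distance(x, half_candidate):
        return integer_candidate
    return half_candidate


def count_minimal_vectors():
    """Count the E8 vectors of squared norm 2 (the innermost nonzero shell)."""
    # Integer type: entries in {-1, 0, 1} with an even coordinate sum.
    ints = sum(
        1
        for v in product((-1, 0, 1), repeat=8)
        if sum(v) % 2 == 0 and sum(c * c for c in v) == 2
    )
    # Half-integer type: entries are +/- 1/2, stored doubled as +/- 1.
    # The true sum is even exactly when the doubled sum is divisible by 4.
    halves = sum(1 for v in product((-1, 1), repeat=8) if sum(v) % 4 == 0)
    return ints + halves


x = [0.9, 0.1, -0.2, 0.0, 0.1, 0.0, 0.0, 0.1]
print(nearest_e8(x))
print(count_minimal_vectors())  # 240
```

A finite codebook restricts the candidates to a fixed set of lattice points, so a real lookup also has to handle inputs whose nearest lattice point falls outside that set.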

Results

SmolLM3-3B-Base on WikiText-2 (NVIDIA A10G):

Method	Eff. BPW	Size (MB)	Perplexity	Perplexity vs bf16
bf16	16.00	6150	7.90	1.00x
GLQ 4-bit	4.00	1538	8.11	1.03x
AWQ 4-bit	5.60	2152	8.15	1.03x
QuIP+GPTQ 4-bit	4.76	1829	8.17	1.03x
GLQ 3-bit	3.00	1153	8.91	1.13x
QuIP+GPTQ 3-bit	3.70	1423	9.30	1.18x
GLQ 2-bit	2.00	769	11.35	1.44x

GLQ uses a single global scale per layer rather than per-group scales, so its effective bits per weight (Eff. BPW) equal the nominal rate. These are early results on a single model; more benchmarks are needed.
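The Size and ratio columns can be cross-checked with simple arithmetic. The sketch below assumes, as the table's numbers suggest but the text does not state, that size scales linearly with effective bits per weight relative to the bf16 row; the helper names `size_mb` and `ppl_ratio` are hypothetical.

```python
BF16_BITS = 16.0
BF16_SIZE_MB = 6150.0   # bf16 row of the table
BF16_PPL = 7.90         # bf16 perplexity on WikiText-2

# (effective bits per weight, perplexity), copied from the table above
rows = {
    "GLQ 4-bit": (4.00, 8.11),
    "AWQ 4-bit": (5.60, 8.15),
    "QuIP+GPTQ 4-bit": (4.76, 8.17),
    "GLQ 3-bit": (3.00, 8.91),
    "QuIP+GPTQ 3-bit": (3.70, 9.30),
    "GLQ 2-bit": (2.00, 11.35),
}


def size_mb(bpw):
    """Estimated size, assuming it scales linearly with bits per weight."""
    return BF16_SIZE_MB * bpw / BF16_BITS


def ppl_ratio(ppl):
    """Perplexity relative to the bf16 baseline."""
    return ppl / BF16_PPL


for name, (bpw, ppl) in rows.items():
    print(f"{name:16s} {size_mb(bpw):7.1f} MB  {ppl_ratio(ppl):.2f}x")
```

Under this assumption the estimates land within about 1 MB of every Size entry in the table.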

How it works

E8 lattice codebook: 65536 vectors from the first 7 shells of the E8 lattice. Each group of 8 weights maps to a 16-bit index (2 bpw). For 3 or 4 bpw, a second-stage residual codebook adds 8 or 16 more bits per group, respectively.
Randomized Hadamard Transform (RHT): Random sign flips followed by a Fast Walsh-Hadamard Transform, applied to both the weights and the Hessian. This spreads weight magnitude evenly across dimensions and makes the diagonal blocks of the Hessian approximately proportional to the identity. After the RHT, Euclidean nearest-neighbor search in the codebook is close to Hessian-optimal.
LDLQ error feedback: Block-LDL decomposition of the Hessian drives a sequential quantization sweep (like GPTQ but over 8-dim blocks instead of scalar columns). Quantization error from each block propagates forward to correct subsequent blocks.
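The RHT step can be sketched in a few lines of plain Python. This is a minimal illustration on a single vector, assuming an orthonormal Sylvester-ordered Hadamard matrix and one random sign per coordinate; the function names are hypothetical, and GLQ's actual kernels, which operate on weight matrices and the Hessian, may differ.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; the length must be a power of two."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly pass: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def random_signs(n, seed=0):
    """One random +/-1 per coordinate, reproducible from the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(vec, signs):
    """Flip signs, then apply the orthonormal Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, vec)])


def inverse_rht(vec, signs):
    """The orthonormal Hadamard matrix is its own inverse; undo the signs last."""
    return [s * v for s, v in zip(signs, fwht(vec))]


w = [0.0] * 16
w[3] = 8.0                      # a single outlier weight
signs = random_signs(len(w))
t = rht(w, signs)
print(max(abs(v) for v in t))   # the outlier is spread out: every entry has magnitude 2.0
print(inverse_rht(t, signs))    # recovers w up to rounding
```

Because the transform is orthogonal, the vector's norm is preserved while a single large entry is spread across all coordinates, which is the effect the description above relies on.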

Install

Requires Python 3.10+ and PyTorch 2.0+. Install PyTorch first (pytorch.org), then:

# Full install (includes transformers, datasets, etc. for glq-quantize CLI):
pip install 'glq[quantize]'

# Or minimal install (inference only, no quantization dependencies):
pip install glq

Triton (for the fused codebook kernel) is bundled with PyTorch on CUDA and will be used automatically.
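To see whether Triton is present in the current environment, a generic check such as the following can help. It only tests whether the `triton` package is importable; how GLQ itself detects Triton or falls back without it is not described here, and the function name `triton_available` is hypothetical.

```python
import importlib.util


def triton_available():
    """Return True if the `triton` package can be imported in this environment."""
    return importlib.util.find_spec("triton") is not None


print("Triton importable:", triton_available())
```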

Quickstart

Command line

glq-quantize \
    --model HuggingFaceTB/SmolLM2-360M \
    --output ./smollm2-glq-2bpw \
    --bpw 2 \
    --nsamples 128 \
    --device cuda

Python API

from glq import quantize

quantize(
    model_name="HuggingFaceTB/SmolLM2-360M",
    output_dir="./smollm2-glq-2bpw",
    bpw=2,
    nsamples=128,
    device="cuda",
)

Loading a quantized model

import glq.hf_integration  # registers GLQ with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./smollm2-glq-2bpw", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./smollm2-glq-2bpw")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Bit widths

BPW	Encoding	Overhead
2	16-bit index per 8 weights	Global scale only
3	16-bit + 8-bit residual index per 8 weights	Global scale only
4	16-bit + 16-bit residual index per 8 weights	Global scale only

All bit widths use a single global scale per layer (no group-size parameter).
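The rates in the table follow directly from the index sizes: (16 + residual bits) / 8 weights. The sketch below works through that arithmetic and shows one way a group's indices could be packed into bytes. The little-endian layout and the names `bits_per_weight`, `pack_group`, and `unpack_group` are assumptions for illustration, not GLQ's storage format.

```python
def bits_per_weight(residual_bits, primary_bits=16, group_size=8):
    """Index bits spent per weight for one group of `group_size` weights."""
    return (primary_bits + residual_bits) / group_size


def pack_group(primary_index, residual_index=0, residual_bits=0):
    """Pack one group's indices into bytes (hypothetical little-endian layout)."""
    if not 0 <= primary_index < (1 << 16):
        raise ValueError("primary index must fit in 16 bits")
    if not 0 <= residual_index < (1 << residual_bits):
        raise ValueError("residual index does not fit in residual_bits")
    word = primary_index | (residual_index << 16)
    return word.to_bytes((16 + residual_bits) // 8, "little")


def unpack_group(data):
    """Inverse of pack_group: returns (primary_index, residual_index)."""
    word = int.from_bytes(data, "little")
    return word & 0xFFFF, word >> 16


for residual_bits in (0, 8, 16):
    print(residual_bits, bits_per_weight(residual_bits))  # 2.0, 3.0, 4.0 bpw

packed = pack_group(0xBEEF, residual_index=0x12, residual_bits=8)
print(len(packed), unpack_group(packed))  # 3 bytes for 8 weights at 3 bpw
```

The single per-layer scale adds a fixed number of bits per layer, which is negligible once divided over all the weights in that layer.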

Acknowledgments

The RHT incoherence approach follows QuIP# (Tseng et al., 2024)
E8 lattice geometry from Conway & Sloane, Sphere Packings, Lattices and Groups
LDLQ error feedback from GPTQ (Frantar et al., 2022)

License

Apache 2.0


Release history

- 0.5.0: May 31, 2026
- 0.3.5: May 18, 2026
- 0.3.4: May 18, 2026
- 0.3.3: May 17, 2026
- 0.3.2: May 17, 2026
- 0.3.1: May 17, 2026
- 0.3.0: May 16, 2026
- 0.2.20: May 13, 2026
- 0.2.19: May 12, 2026
- 0.2.18: May 6, 2026
- 0.2.17: May 6, 2026
- 0.2.16: May 5, 2026
- 0.2.15: May 2, 2026
- 0.2.14: May 1, 2026
- 0.2.13: Apr 25, 2026
- 0.2.12: Apr 25, 2026
- 0.2.11: Apr 18, 2026
- 0.2.10: Apr 16, 2026
- 0.2.9: Apr 15, 2026
- 0.2.8: Apr 3, 2026
- 0.2.7: Mar 22, 2026
- 0.2.6: Mar 21, 2026
- 0.2.5: Mar 21, 2026
- 0.2.2: Mar 21, 2026
- 0.2.1: Mar 19, 2026
- 0.2.0: Mar 18, 2026
- 0.1.9: Mar 16, 2026
- 0.1.8: Mar 15, 2026
- 0.1.7: Mar 14, 2026
- 0.1.6: Mar 14, 2026
- 0.1.5: Mar 14, 2026
- 0.1.4: Mar 14, 2026
- 0.1.3: Mar 12, 2026
- 0.1.2: Mar 10, 2026
- 0.1.1: Mar 9, 2026 (this version)
- 0.1.0: Mar 8, 2026

Download files

Source Distribution

glq-0.1.1.tar.gz (247.4 kB)

Uploaded Mar 9, 2026 Source

Built Distribution

glq-0.1.1-py3-none-any.whl (264.4 kB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file glq-0.1.1.tar.gz.

File metadata

Download URL: glq-0.1.1.tar.gz
Upload date: Mar 9, 2026
Size: 247.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for glq-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b77627563f5e5d8d53ea3e87166ddaee218b1e42108beb21c11a8ea712a78da0`
MD5	`ba5db45d9088a9e4489bc4fb15c924ab`
BLAKE2b-256	`b7db1acc7635cb3f6ae20080b5b089117c7e30e0c09bae36c18bbdfaceaf5617`

Provenance

The following attestation bundles were made for glq-0.1.1.tar.gz:

Publisher: publish.yml on cnygaard/glq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: glq-0.1.1.tar.gz
- Subject digest: b77627563f5e5d8d53ea3e87166ddaee218b1e42108beb21c11a8ea712a78da0
- Sigstore transparency entry: 1066620993
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: cnygaard/glq@669266f6b22801fc1485eba1a7e7360eef277a63
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/cnygaard
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@669266f6b22801fc1485eba1a7e7360eef277a63
- Trigger Event: push

File details

Details for the file glq-0.1.1-py3-none-any.whl.

File metadata

Download URL: glq-0.1.1-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 264.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for glq-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80d76944d8db8e5ea51a369e7305a4a2fbfc4e9c1d3fcede361aaf865766130e`
MD5	`1b44629ae7de8acd421a65453627093e`
BLAKE2b-256	`302ebe661f40465d0a364b9a2863cff8316450b8e7ba5d5581a6f038aba6db50`

Provenance

The following attestation bundles were made for glq-0.1.1-py3-none-any.whl:

Publisher: publish.yml on cnygaard/glq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: glq-0.1.1-py3-none-any.whl
- Subject digest: 80d76944d8db8e5ea51a369e7305a4a2fbfc4e9c1d3fcede361aaf865766130e
- Sigstore transparency entry: 1066621001
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: cnygaard/glq@669266f6b22801fc1485eba1a7e7360eef277a63
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/cnygaard
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@669266f6b22801fc1485eba1a7e7360eef277a63
- Trigger Event: push
