Library for utilization of compressed safetensors of neural network models

Project description

compressed-tensors

The compressed-tensors library extends the safetensors format to store and manage compressed tensor data efficiently. It supports a range of quantization and sparsity schemes, providing one unified format for models produced by different optimizations such as GPTQ, AWQ, SmoothQuant, INT8, FP8, SparseGPT, and more.

Why `compressed-tensors`?

As model compression becomes more important for deploying LLMs efficiently, the landscape of quantization and compression techniques has grown increasingly fragmented. Each method often comes with its own storage format and loading procedure, which makes it hard to work with several techniques or to switch between them. compressed-tensors addresses this with a single, extensible format that can represent a wide variety of compression schemes.

Unified Checkpoint Format: Supports various compression schemes in a single, consistent format.
Wide Compatibility: Works with popular quantization methods such as GPTQ, SmoothQuant, and FP8; see llm-compressor.
Flexible Quantization Support:
- Weight-only quantization (e.g., W4A16, W8A16, WnA16)
- Activation quantization (e.g., W8A8)
- KV cache quantization
- Non-uniform schemes (different layers can be quantized in different ways!)
Sparsity Support: Handles both unstructured and semi-structured (e.g., 2:4) sparsity patterns.
Open-Source Integration: Designed to work seamlessly with Hugging Face models and PyTorch.
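Semi-structured 2:4 sparsity means that every contiguous group of four values contains at most two nonzeros. The check below is a minimal pure-Python sketch of that rule applied to a single row; the helper name `is_2_4_sparse` is hypothetical and not part of the compressed-tensors API, and real checks operate on tensors rather than Python lists.

```python
from typing import Sequence


def is_2_4_sparse(row: Sequence[float]) -> bool:
    """Return True if every contiguous group of 4 values has at most 2 nonzeros.

    Hypothetical helper for illustration; assumes len(row) is a multiple of 4.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Count nonzero entries in this group of four.
        if sum(1 for v in group if v != 0) > 2:
            return False
    return True


print(is_2_4_sparse([0.0, 1.5, 0.0, -2.0, 3.0, 0.0, 0.0, 0.0]))  # True
print(is_2_4_sparse([1.0, 1.0, 1.0, 0.0]))  # False
```

Because the pattern is fixed per group of four, hardware and kernels can exploit it predictably, unlike unstructured sparsity where zeros may appear anywhere.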

This allows developers and researchers to easily experiment with composing different quantization methods, simplify model deployment pipelines, and reduce the overhead of supporting multiple compression formats in inference engines.

Installation

From PyPI

Stable release:

pip install compressed-tensors

Nightly release:

pip install compressed-tensors-nightly

From Source

git clone https://github.com/neuralmagic/compressed-tensors
cd compressed-tensors
pip install -e .

Getting started

Saving/Loading Compressed Tensors (Bitmask Compression)

The function save_compressed uses the compression_format argument to choose how tensors are compressed before they are written to disk. The function load_compressed reverses the process, converting the compressed weights on disk back into decompressed tensors in device memory.

from compressed_tensors import save_compressed, load_compressed, BitmaskConfig
from torch import Tensor
from typing import Dict

# BitmaskConfig selects bitmask compression, which efficiently stores
# tensors with a large number of zero entries
compression_config = BitmaskConfig()

tensors: Dict[str, Tensor] = {"tensor_1": Tensor(
    [[0.0, 0.0, 0.0], 
     [1.0, 1.0, 1.0]]
)}
# compress tensors using BitmaskConfig compression format (save them efficiently on disk)
save_compressed(tensors, "model.safetensors", compression_format=compression_config.format)

# decompress tensors (load_compressed returns a generator for memory efficiency)
decompressed_tensors = {}
for tensor_name, tensor in load_compressed("model.safetensors", compression_config=compression_config):
    decompressed_tensors[tensor_name] = tensor
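Conceptually, a bitmask format keeps one bit per element to mark which entries are nonzero, plus only the nonzero values themselves. The sketch below illustrates that idea in pure Python on a flat list. It is an illustration under stated assumptions, not the library's actual on-disk layout: the function names are hypothetical, and the real BitmaskConfig compressor works on torch tensors with its own storage details.

```python
from typing import List, Tuple


def bitmask_compress(values: List[float]) -> Tuple[bytes, List[float], int]:
    """Split a flat list into a packed bitmask of nonzero positions and the nonzero values."""
    mask = bytearray((len(values) + 7) // 8)
    nonzeros: List[float] = []
    for i, v in enumerate(values):
        if v != 0:
            mask[i // 8] |= 1 << (i % 8)  # set bit i (lowest bit first within each byte)
            nonzeros.append(v)
    return bytes(mask), nonzeros, len(values)


def bitmask_decompress(mask: bytes, nonzeros: List[float], length: int) -> List[float]:
    """Rebuild the dense list by walking the bitmask and consuming nonzero values in order."""
    out = [0.0] * length
    it = iter(nonzeros)
    for i in range(length):
        if mask[i // 8] & (1 << (i % 8)):
            out[i] = next(it)
    return out


# Same values as the example tensor above, flattened row by row.
dense = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
mask, nz, n = bitmask_compress(dense)
print(list(mask), nz)  # [56] [1.0, 1.0, 1.0]
assert bitmask_decompress(mask, nz, n) == dense
```

Only the positions with set bits consume storage for values, which is why the format pays off for tensors with many zeros.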

Saving/Loading Compressed Models (Bitmask Compression)

Bitmask compression can also be applied to a whole model. For a more detailed example, see the examples directory.

from compressed_tensors import save_compressed_model, load_compressed, BitmaskConfig
from transformers import AutoModelForCausalLM

model_name = "neuralmagic/llama2.c-stories110M-pruned50"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

original_state_dict = model.state_dict()

compression_config = BitmaskConfig()

# save compressed model weights
save_compressed_model(model, "compressed_model.safetensors", compression_format=compression_config.format)

# load compressed model weights (`dict` turns generator into a dictionary)
state_dict = dict(load_compressed("compressed_model.safetensors", compression_config))
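Whether bitmask compression pays off for a pruned model can be estimated with simple arithmetic. Assuming a dense 16-bit tensor with n elements and sparsity s (the fraction of zeros), the dense size is 2n bytes, while a bitmask layout needs n/8 bytes for the mask plus 2(1 - s)n bytes for the nonzero values. The helper below is a hypothetical back-of-the-envelope estimate that ignores metadata such as shapes or offsets; for s = 0.5, as in a 50%-pruned model, it gives 1.125n / 2n = 0.5625.

```python
def bitmask_size_ratio(sparsity: float, bytes_per_value: int = 2) -> float:
    """Estimated compressed/dense size ratio for a bitmask layout.

    Assumes 1 bit of mask per element plus storage for the nonzero values only;
    metadata (shapes, offsets) is ignored.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    mask_bytes_per_element = 1 / 8
    value_bytes_per_element = (1.0 - sparsity) * bytes_per_value
    return (mask_bytes_per_element + value_bytes_per_element) / bytes_per_value


print(bitmask_size_ratio(0.5))  # 0.5625
```

Under these assumptions the mask alone costs 1/16 of the dense size for 16-bit weights, so the bitmask layout only becomes smaller than dense storage above 6.25% sparsity.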

For a more in-depth tutorial on bitmask compression, refer to the notebook.

Saving a Compressed Model with PTQ

We can also use compressed-tensors to run basic post-training quantization (PTQ) and save the quantized model to disk in compressed form.

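import torch
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, DefaultDataCollator

from compressed_tensors.compressors import ModelCompressor
from compressed_tensors.quantization import (
    QuantizationConfig,
    QuantizationStatus,
    apply_quantization_config,
    compress_quantized_weights,
    freeze_module_quantization,
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda:0", torch_dtype="auto")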

config = QuantizationConfig.parse_file("./examples/bit_packing/int4_config.json")
config.quantization_status = QuantizationStatus.CALIBRATION
apply_quantization_config(model, config)

dataset = load_dataset("ptb_text_only")["train"]
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["sentence"], padding=False, truncation=True, max_length=1024)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
data_loader = DataLoader(tokenized_dataset, batch_size=1, collate_fn=DefaultDataCollator())

with torch.no_grad():
    for idx, sample in tqdm(enumerate(data_loader), desc="Running calibration"):
        # move each batch to the same device as the model
        sample = {key: value.to(model.device) for key, value in sample.items()}
        _ = model(**sample)

        if idx >= 512:
            break

model.apply(freeze_module_quantization)
model.apply(compress_quantized_weights)

output_dir = "./ex_llama1.1b_w4a16_packed_quantize"
compressor = ModelCompressor(quantization_config=config)
compressed_state_dict = compressor.compress(model)
model.save_pretrained(output_dir, state_dict=compressed_state_dict)
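Roughly speaking, the calibration loop above runs sample inputs through the model while its status is CALIBRATION, freeze_module_quantization then fixes the resulting quantization parameters, and compress_quantized_weights converts the weights to their quantized form. The arithmetic can be illustrated with a minimal pure-Python sketch of symmetric min-max quantization to signed 4-bit integers. This is an assumption for illustration only: the function names are hypothetical, and the actual scheme (strategy, group size, symmetry, observer) is whatever int4_config.json specifies.

```python
from typing import List

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit range


def symmetric_scale(values: List[float], qmax: int = INT4_MAX) -> float:
    """Min-max style observer: map the largest absolute value onto qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to the nearest step (Python's round is half-to-even) and clamp to int4."""
    return [max(INT4_MIN, min(INT4_MAX, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [x * scale for x in q]


group = [0.7, -0.26, 0.1, -0.7]  # one hypothetical group of weights
scale = symmetric_scale(group)
q = quantize(group, scale)
print(q)  # [7, -3, 1, -7]
```

Each dequantized value lies within half a quantization step of the original (here -0.26 comes back as roughly -0.3), which is the rounding error that calibration-aware methods try to keep small.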

For a more in-depth tutorial on quantization compression, refer to the notebook.
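The example's paths (bit_packing, w4a16_packed_quantize) indicate that the 4-bit weights are packed into wider integers on disk. The sketch below shows the general idea of packing eight signed 4-bit values into one 32-bit word and unpacking them again. The nibble order (lowest nibble first) and the helper names are assumptions for illustration and are not guaranteed to match the library's packed layout.

```python
from typing import List

NIBBLES_PER_WORD = 8  # 8 x 4 bits = 32 bits


def pack_int4(values: List[int]) -> List[int]:
    """Pack signed int4 values (-8..7) into unsigned 32-bit words, lowest nibble first."""
    if len(values) % NIBBLES_PER_WORD != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), NIBBLES_PER_WORD):
        word = 0
        for i, v in enumerate(values[start:start + NIBBLES_PER_WORD]):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in int4")
            word |= (v & 0xF) << (4 * i)  # store the two's-complement nibble
        words.append(word)
    return words


def unpack_int4(words: List[int]) -> List[int]:
    """Inverse of pack_int4: extract each nibble and sign-extend it."""
    out = []
    for word in words:
        for i in range(NIBBLES_PER_WORD):
            nibble = (word >> (4 * i)) & 0xF
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


q = [7, -3, 1, -7, 0, -8, 2, 5]
packed = pack_int4(q)
print([hex(w) for w in packed])  # ['0x528091d7']
```

Packing eight values per 32-bit word stores weights at a quarter of their 16-bit size, before accounting for the scales that must be stored alongside them.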

Project details

Release history

0.9.3.20250408

Apr 8, 2025

0.9.3.20250404

Apr 4, 2025

0.9.3.20250403

Apr 3, 2025

0.9.2.20250402

Apr 2, 2025

0.9.2.20250401

Apr 1, 2025

0.9.2.20250331

Mar 31, 2025

0.9.2.20250330

Mar 30, 2025

0.9.2.20250329

Mar 29, 2025

0.9.2.20250328

Mar 28, 2025

0.9.2.20250326

Mar 26, 2025

0.9.2.20250325

Mar 25, 2025

0.9.2.20250324

Mar 24, 2025

0.9.2.20250322

Mar 22, 2025

0.9.2.20250321

Mar 21, 2025

0.9.2.20250320

Mar 20, 2025

0.9.2.20250319

Mar 19, 2025

0.9.2.20250318

Mar 18, 2025

0.9.2.20250317

Mar 17, 2025

0.9.2.20250316

Mar 16, 2025

0.9.2.20250315

Mar 15, 2025

0.9.2.20250314

Mar 14, 2025

0.9.2.20250313

Mar 13, 2025

0.9.2.20250312

Mar 12, 2025

0.9.2.20250311

Mar 11, 2025

0.9.2.20250309

Mar 10, 2025

0.9.2.20250307

Mar 7, 2025

0.9.2.20250306

Mar 6, 2025

0.9.2.20250305

Mar 5, 2025

0.9.2.20250304

Mar 4, 2025

0.9.2.20250303

Mar 3, 2025

0.9.2.20250302

Mar 2, 2025

0.9.2.20250301

Mar 1, 2025

0.9.2.20250228

Feb 28, 2025

0.9.2.20250227

Feb 27, 2025

0.9.2.20250226

Feb 26, 2025

0.9.2.20250225

Feb 25, 2025

0.9.2.20250224

Feb 24, 2025

0.9.2.20250220

Feb 20, 2025

0.9.2.20250219

Feb 19, 2025

0.9.1.20250218

Feb 18, 2025

0.9.1.20250217

Feb 17, 2025

0.9.1.20250216

Feb 16, 2025

0.9.1.20250215

Feb 15, 2025

0.9.1.20250214

Feb 14, 2025

0.9.1.20250213

Feb 13, 2025

0.9.1.20250212

Feb 12, 2025

0.9.1.20250211

Feb 11, 2025

0.9.1.20250210

Feb 10, 2025

0.9.1.20250209

Feb 9, 2025

0.9.1.20250208

Feb 8, 2025

0.9.1.20250207

Feb 7, 2025

0.9.1.20250206

Feb 6, 2025

0.9.1.20250205

Feb 5, 2025

0.9.1.20250204

Feb 4, 2025

0.9.1.20250203

Feb 3, 2025

0.9.1.20250202

Feb 2, 2025

0.9.1.20250201

Feb 1, 2025

0.9.1.20250129

Jan 29, 2025

0.9.1.20250128

Jan 28, 2025

0.9.1.20250127

Jan 27, 2025

0.9.1.20250126

Jan 26, 2025

0.9.1.20250124

Jan 24, 2025

0.9.1.20250123

Jan 23, 2025

0.9.0.20250123

Jan 23, 2025

0.9.0.20250122

Jan 22, 2025

0.9.0.20250121

Jan 21, 2025

0.9.0.20250120

Jan 20, 2025

0.9.0.20250119

Jan 19, 2025

0.9.0.20250117

Jan 17, 2025

0.9.0.20250116

Jan 16, 2025

0.9.0.20250115

Jan 15, 2025

0.8.1.20250112

Jan 12, 2025

0.8.1.20250111

Jan 11, 2025

0.8.1.20250110

Jan 10, 2025

0.8.1.20250109

Jan 9, 2025

0.8.1.20250108

Jan 8, 2025

0.8.1.20250107

Jan 7, 2025

0.8.1.20250106

Jan 6, 2025

0.8.1.20250105

Jan 5, 2025

0.8.1.20250104

Jan 4, 2025

0.8.1.20250103

Jan 3, 2025

0.8.1.20250102

Jan 2, 2025

0.8.1.20250101

Jan 1, 2025

0.8.1.20241231

Dec 31, 2024

0.8.1.20241230

Dec 30, 2024

0.8.1.20241229 (this version)

Dec 29, 2024

0.8.1.20241228

Dec 28, 2024

0.8.1.20241226

Dec 26, 2024

0.8.1.20241225

Dec 25, 2024

0.8.1.20241223

Dec 23, 2024

0.8.1.20241220

Dec 20, 2024

0.8.1.20241219

Dec 19, 2024

0.8.1.20241218

Dec 18, 2024

0.8.1.20241217

Dec 17, 2024

0.8.1.20241216

Dec 16, 2024

0.8.1.20241215

Dec 15, 2024

0.8.1.20241214

Dec 14, 2024

0.8.1.20241213

Dec 13, 2024

0.8.1.20241212

Dec 12, 2024

0.8.1.20241211

Dec 11, 2024

0.8.0.20241211

Dec 11, 2024

0.8.0.20241210

Dec 10, 2024

0.8.0.20241209

Dec 9, 2024

0.8.0.20241208

Dec 8, 2024

0.8.0.20241207

Dec 7, 2024

0.8.0.20241206

Dec 6, 2024

0.8.0.20241205

Dec 5, 2024

0.8.0.20241204

Dec 4, 2024

0.8.0.20241203

Dec 3, 2024

0.8.0.20241202

Dec 2, 2024

0.8.0.20241201

Dec 1, 2024

0.8.0.20241130

Nov 30, 2024

0.8.0.20241129

Nov 29, 2024

0.8.0.20241128

Nov 28, 2024

0.8.0.20241127

Nov 27, 2024

0.8.0.20241126

Nov 26, 2024

0.8.0.20241125

Nov 25, 2024

0.8.0.20241124

Nov 24, 2024

0.8.0.20241123

Nov 23, 2024

0.8.0.20241122

Nov 22, 2024

0.8.0.20241121

Nov 21, 2024

0.8.0.20241120

Nov 20, 2024

0.8.0.20241119

Nov 19, 2024

0.8.0.20241118

Nov 18, 2024

0.8.0.20241117

Nov 17, 2024

0.8.0.20241116

Nov 16, 2024

0.8.0.20241115

Nov 15, 2024

0.8.0.20241114

Nov 14, 2024

0.8.0.20241113

Nov 13, 2024

0.8.0.20241112

Nov 12, 2024

0.7.1.20241112

Nov 12, 2024

0.7.1.20241111

Nov 11, 2024

0.7.1.20241110

Nov 10, 2024

0.7.1.20241109

Nov 9, 2024

0.7.1.20241108

Nov 8, 2024

0.7.1.20241107

Nov 7, 2024

0.7.1.20241106

Nov 6, 2024

0.7.1.20241104

Nov 4, 2024

0.7.1.20241103

Nov 3, 2024

0.7.1.20241102

Nov 2, 2024

0.7.1.20241101

Nov 1, 2024

0.7.1.20241031

Oct 31, 2024

0.7.1.20241030

Oct 30, 2024

0.7.1.20241029

Oct 29, 2024

0.7.1.20241028

Oct 28, 2024

0.7.1.20241027

Oct 27, 2024

0.7.1.20241026

Oct 26, 2024

0.7.1.20241025

Oct 25, 2024

0.7.1.20241024

Oct 24, 2024

0.7.1.20241023

Oct 23, 2024

0.7.1.20241022

Oct 22, 2024

0.7.1.20241021

Oct 21, 2024

0.7.1.20241020

Oct 20, 2024

0.7.1.20241018

Oct 18, 2024

0.7.1.20241017

Oct 17, 2024

0.7.0.20241016

Oct 16, 2024

0.7.0.20241015

Oct 15, 2024

0.7.0.20241014

Oct 14, 2024

0.7.0.20241013

Oct 13, 2024

0.7.0.20241012

Oct 12, 2024

0.7.0.20241011

Oct 11, 2024

0.7.0.20241010

Oct 10, 2024

0.7.0.20241009

Oct 9, 2024

0.6.0.20241008

Oct 8, 2024

0.6.0.20241007

Oct 7, 2024

0.6.0.20241006

Oct 6, 2024

0.6.0.20241005

Oct 5, 2024

0.6.0.20241004

Oct 4, 2024

0.6.0.20240930

Sep 30, 2024

0.6.0.20240929

Sep 29, 2024

0.6.0.20240928

Sep 28, 2024

0.6.0.20240926

Sep 26, 2024

0.6.0.20240925

Sep 25, 2024

0.6.0.20240924

Sep 24, 2024

0.6.0.20240923

Sep 23, 2024

0.6.0.20240922

Sep 22, 2024

0.6.0.20240921

Sep 21, 2024

0.6.0.20240920

Sep 20, 2024

0.6.0.20240919

Sep 19, 2024

0.6.0.20240918

Sep 18, 2024

0.5.0.20240912

Sep 12, 2024

0.5.0.20240911

Sep 11, 2024

0.5.0.20240910

Sep 10, 2024

0.5.0.20240909

Sep 9, 2024

0.5.0.20240908

Sep 8, 2024

0.5.0.20240907

Sep 7, 2024

0.5.0.20240906

Sep 6, 2024

0.5.0.20240905

Sep 5, 2024

0.5.0.20240904

Sep 4, 2024

0.5.0.20240903

Sep 3, 2024

0.5.0.20240902

Sep 2, 2024

0.5.0.20240901

Sep 1, 2024

0.5.0.20240831

Aug 31, 2024

0.5.0.20240830

Aug 30, 2024

0.5.0.20240829

Aug 29, 2024

0.5.0.20240814

Aug 14, 2024

0.5.0.20240813

Aug 13, 2024

0.5.0.20240812

Aug 12, 2024

0.5.0.20240811

Aug 11, 2024

0.5.0.20240810

Aug 10, 2024

0.5.0.20240809

Aug 9, 2024

0.5.0.20240808

Aug 8, 2024

0.5.0.20240807

Aug 7, 2024

0.5.0.20240806

Aug 6, 2024

0.5.0.20240805

Aug 5, 2024

0.5.0.20240804

Aug 4, 2024

0.5.0.20240803

Aug 3, 2024

0.5.0.20240802

Aug 2, 2024

0.5.0.20240801

Aug 1, 2024

0.4.0.20240731

Jul 31, 2024

0.4.0.20240722

Jul 22, 2024

0.4.0.20240721

Jul 21, 2024

0.4.0.20240720

Jul 20, 2024

0.4.0.20240719

Jul 19, 2024

0.4.0.20240718

Jul 18, 2024

0.4.0.20240717

Jul 17, 2024

0.4.0.20240716

Jul 16, 2024

0.4.0.20240715

Jul 15, 2024

0.4.0.20240714

Jul 14, 2024

0.4.0.20240713

Jul 13, 2024

0.4.0.20240712

Jul 12, 2024

0.4.0.20240711

Jul 11, 2024

0.4.0.20240710

Jul 10, 2024

0.4.0.20240709

Jul 9, 2024

0.4.0.20240708

Jul 8, 2024

0.4.0.20240707

Jul 7, 2024

0.4.0.20240706

Jul 6, 2024

0.4.0.20240705

Jul 5, 2024

0.4.0.20240704

Jul 4, 2024

0.4.0.20240703

Jul 3, 2024

0.4.0.20240702

Jul 2, 2024

0.4.0.20240701

Jul 1, 2024

0.4.0.20240630

Jun 30, 2024

0.4.0.20240629

Jun 29, 2024

0.4.0.20240628

Jun 28, 2024

0.4.0.20240627

Jun 27, 2024

0.4.0.20240626

Jun 26, 2024

0.4.0.20240623

Jun 23, 2024

0.4.0.20240622

Jun 22, 2024

0.4.0.20240621

Jun 21, 2024

0.4.0.20240620

Jun 20, 2024

0.4.0.20240619

Jun 19, 2024

0.4.0.20240618

Jun 18, 2024

0.4.0.20240617

Jun 17, 2024

0.4.0.20240616

Jun 16, 2024

0.4.0.20240615

Jun 15, 2024

0.4.0.20240614

Jun 14, 2024

0.4.0.20240613

Jun 13, 2024

0.3.3.20240612

Jun 12, 2024

0.3.3.20240611

Jun 11, 2024

0.3.3.20240610

Jun 10, 2024

0.3.3.20240609

Jun 9, 2024

0.3.3.20240608

Jun 8, 2024

0.3.3.20240607

Jun 7, 2024

0.3.3.20240606

Jun 6, 2024

0.3.3.20240605

Jun 5, 2024

0.3.3.20240604

Jun 4, 2024

0.3.3.20240603

Jun 3, 2024

0.3.3.20240602

Jun 2, 2024

0.3.3.20240601

Jun 1, 2024

0.3.3.20240531

May 31, 2024

0.3.3.20240530

May 30, 2024

0.3.3.20240529

May 29, 2024

0.3.3.20240528

May 28, 2024

0.3.3.20240527

May 27, 2024

0.3.3.20240526

May 26, 2024

0.3.3.20240525

May 25, 2024

0.3.3.20240524

May 24, 2024

0.3.3.20240523

May 23, 2024

0.3.3.20240522

May 22, 2024

0.3.3.20240521

May 21, 2024

0.3.3.20240520

May 20, 2024

0.3.3.20240519

May 19, 2024

0.3.3.20240518

May 18, 2024

0.3.3.20240517

May 17, 2024

0.3.3.20240516

May 16, 2024

0.3.3.20240514

May 14, 2024

Download files

Source Distribution

compressed-tensors-nightly-0.8.1.20241229.tar.gz (58.8 kB)

Uploaded Dec 29, 2024 Source

Built Distribution

compressed_tensors_nightly-0.8.1.20241229-py3-none-any.whl (90.0 kB)

Uploaded Dec 29, 2024 Python 3

File details

Details for the file compressed-tensors-nightly-0.8.1.20241229.tar.gz.

File metadata

Download URL: compressed-tensors-nightly-0.8.1.20241229.tar.gz
Upload date: Dec 29, 2024
Size: 58.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for compressed-tensors-nightly-0.8.1.20241229.tar.gz
Algorithm	Hash digest
SHA256	`57e3a9ab93862cc2b5fba413088a9620e5a3c473a191974851a2f7ce56079e80`
MD5	`9ae003fa1689a1f39f58e278b973dea2`
BLAKE2b-256	`cda6f759e167776654c2c09e7ae613e91da940e8a4274cebf69a79b3991c90f2`

File details

Details for the file compressed_tensors_nightly-0.8.1.20241229-py3-none-any.whl.

File metadata

Download URL: compressed_tensors_nightly-0.8.1.20241229-py3-none-any.whl
Upload date: Dec 29, 2024
Size: 90.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for compressed_tensors_nightly-0.8.1.20241229-py3-none-any.whl
Algorithm	Hash digest
SHA256	`649d3c8836aee8ab39ac4a6fcf4fa42fc25af0742e85cbdd32cb503059522157`
MD5	`9f272b6638b73e69d0a22611a226f1cc`
BLAKE2b-256	`3ee432f1d74d104d5b4b7451c31bd04141ddfb5d7c7e99549d347812895b5a14`
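The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch using Python's standard hashlib module follows; the helper names are hypothetical, and the commented usage assumes the source distribution has been downloaded into the current directory. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


# Example usage (assumes the sdist listed above is in the current directory):
# verify_download(
#     Path("compressed-tensors-nightly-0.8.1.20241229.tar.gz"),
#     "57e3a9ab93862cc2b5fba413088a9620e5a3c473a191974851a2f7ce56079e80",
# )
```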

compressed-tensors-nightly 0.8.1.20241229
