Inference-time adaptive token vocabularies for LLMs

zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression

zip2zip enables inference-time adaptive token vocabularies for large language models (LLMs). The vocabulary is dynamically augmented at inference time, which reduces the number of decoding steps and speeds up inference.

(Figure: zip2zip decoding)

Features

  • Dynamic vocabulary adaptation during inference
  • LZW-based token compression
  • Support for various encoder configurations
  • Integration with Hugging Face's transformers library
  • Compatible with PEFT (Parameter-Efficient Fine-Tuning) models
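
The LZW-based compression can be illustrated with a small sketch. This is a hypothetical, simplified illustration (function and variable names are not from the zip2zip codebase): repeated base-token sequences are assigned new ids appended after the base vocabulary, so later occurrences of the same sequence cost a single step.

```python
def _code_of(seq, table):
    """A single base token encodes itself; longer sequences use their merged id."""
    return seq[0] if len(seq) == 1 else table[seq]

def lzw_compress(token_ids, base_vocab_size):
    """LZW-style compression over a list of base token ids.

    Returns the compressed id sequence and the merge table mapping
    each discovered base-token tuple to its new (merged) id.
    """
    table = {}                      # tuple of base ids -> merged id
    next_id = base_vocab_size       # new ids start right after the base vocabulary
    out = []
    current = (token_ids[0],)
    for tok in token_ids[1:]:
        candidate = current + (tok,)
        if candidate in table:
            current = candidate     # keep extending an already-known sequence
        else:
            out.append(_code_of(current, table))
            table[candidate] = next_id
            next_id += 1
            current = (tok,)
    out.append(_code_of(current, table))
    return out, table
```

For example, compressing `[1, 2, 1, 2, 1, 2]` with a base vocabulary of 100 yields `[1, 2, 100, 100]`, where id 100 stands for the pair `(1, 2)`: six decoding steps become four.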

Installation

You can install zip2zip using pip:

pip install zip2zip

Usage

Same API as Hugging Face

| zip2zip | Corresponding HF class |
| --- | --- |
| `Zip2ZipModel` | `AutoModelForCausalLM` |
| `Zip2ZipTokenizer` | `AutoTokenizer` |
| `Zip2ZipConfig` | `AutoConfig` |
| `Zip2ZipModel.from_pretrained` | `AutoModelForCausalLM.from_pretrained` |
| `Zip2ZipTokenizer.from_pretrained` | `AutoTokenizer.from_pretrained` |
| `Zip2ZipConfig.from_pretrained` | `AutoConfig.from_pretrained` |

Pretrained model weights

| Size | Model | HF Hub |
| --- | --- | --- |
| 3.8B | Phi-3.5-mini-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1 |
| 14B | Phi-3-medium-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3-medium-instruct-v0.1 |
| ... | ... | epfl-dlab/zip2zip-models |

Run a pretrained model

import torch
from zip2zip import Zip2ZipModel, Zip2ZipTokenizer

pretrained_model_url = "epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize tokenizer
tokenizer = Zip2ZipTokenizer.from_pretrained(pretrained_model_url)

# Initialize model
model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map=device)

# Generate text
inputs = tokenizer("Write a MultiHeadAttention layer in PyTorch", return_tensors="pt").to(device)
outputs = model.generate(**inputs)

# Decode and print the generated text, with merged tokens color-highlighted
generated_text = tokenizer.color_decode(outputs)
print(generated_text)
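
During generation the model can emit ids beyond the base vocabulary; the tokenizer expands them back to base token ids before detokenizing. A minimal sketch of that expansion, assuming a merge table mapping each merged id to its tuple of base token ids (hypothetical names, not the zip2zip API):

```python
def expand_merged_tokens(codes, merges, base_vocab_size):
    """Replace each merged id with its underlying base token ids.

    codes: generated id sequence (may contain merged ids >= base_vocab_size)
    merges: dict mapping merged id -> tuple of base token ids
    """
    out = []
    for c in codes:
        # Merged ids live above the base vocabulary; base ids pass through unchanged
        out.extend(merges[c] if c >= base_vocab_size else (c,))
    return out
```

With `merges = {100: (1, 2)}` and a base vocabulary of 100, the sequence `[1, 2, 100, 100]` expands back to `[1, 2, 1, 2, 1, 2]`.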

You can quantize the model to reduce memory usage, just as you would with standard HF models.

model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map="auto", load_in_8bit=True)

Examples

We provide some examples in the examples folder.

Evaluation

We provide a script to evaluate the performance of the model, compatible with lm-evaluation-harness.

To run the evaluation, you need to install the zip2zip fork of lm-evaluation-harness (the original one is not compatible with zip2zip).

pip install git+https://github.com/epfl-dlab/zip2zip_lm_eval.git

Then, you can run the evaluation:

python bench/run_lm_eval.py

Citation

@misc{geng2025zip2zipinferencetimeadaptivevocabularies,
      title={zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression},
      author={Saibo Geng and Nathan Ranchin and Yunzhen Yao and Maxime Peyrard and Chris Wendler and Michael Gastpar and Robert West},
      year={2025},
      eprint={2506.01084},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.01084},
}
