# zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression
zip2zip enables inference-time adaptive token vocabularies for large language models (LLMs). It allows vocabularies to be dynamically augmented at inference time, leading to reduced decoding steps and faster inference.
## Features
- Dynamic vocabulary adaptation during inference
- LZW-based token compression
- Support for various encoder configurations
- Integration with Hugging Face's transformers library
- Compatible with PEFT (Parameter-Efficient Fine-Tuning) models
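The LZW-based compression can be illustrated with a toy sketch (plain Python; this is not zip2zip's actual implementation, and the vocabulary size and IDs are made up). Repeated token subsequences are merged into "hyper-tokens" whose IDs sit above the base vocabulary, so a generation needs fewer decoding steps:

```python
# Illustrative LZW-style compression over token IDs. Hyper-token IDs are
# assumed to start right after the base vocabulary (an assumption for
# this sketch, not zip2zip's real codebook layout).

BASE_VOCAB_SIZE = 1000  # pretend the base tokenizer has 1000 token IDs


def lzw_compress(token_ids, base_vocab_size=BASE_VOCAB_SIZE):
    """Compress a token-ID sequence; repeated subsequences become hyper-tokens."""
    codebook = {}  # maps tuple of base token IDs -> hyper-token ID
    next_id = base_vocab_size
    out = []
    current = ()
    for tok in token_ids:
        candidate = current + (tok,)
        if len(candidate) == 1 or candidate in codebook:
            # keep extending the longest already-known subsequence
            current = candidate
        else:
            # emit the longest known prefix, then register the new subsequence
            out.append(current[0] if len(current) == 1 else codebook[current])
            codebook[candidate] = next_id
            next_id += 1
            current = (tok,)
    if current:
        out.append(current[0] if len(current) == 1 else codebook[current])
    return out, codebook


def lzw_decompress(codes, codebook, base_vocab_size=BASE_VOCAB_SIZE):
    """Expand hyper-tokens back into the original base-token sequence."""
    inverse = {hyper_id: seq for seq, hyper_id in codebook.items()}
    out = []
    for code in codes:
        out.extend(inverse[code] if code >= base_vocab_size else (code,))
    return out
```

For example, the sequence `[1, 2, 1, 2, 1, 2]` compresses to four codes, two of which are hyper-tokens covering the repeated `(1, 2)` pattern; a model decoding over the augmented vocabulary would need four steps instead of six.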
## Installation

You can install zip2zip using pip:

```bash
pip install zip2zip
```
## Usage

### Same API as Hugging Face

zip2zip classes mirror their Hugging Face `transformers` counterparts:
| zip2zip | Corresponding HF class |
|---|---|
| Zip2ZipModel | AutoModelForCausalLM |
| Zip2ZipTokenizer | AutoTokenizer |
| Zip2ZipConfig | AutoConfig |
| Zip2ZipModel.from_pretrained | AutoModelForCausalLM.from_pretrained |
| Zip2ZipTokenizer.from_pretrained | AutoTokenizer.from_pretrained |
| Zip2ZipConfig.from_pretrained | AutoConfig.from_pretrained |
### Pretrained model weights

| Size | Model | HF Hub |
|---|---|---|
| 3.8B | Phi-3.5-mini-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1 |
| 14B | Phi-3-medium-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3-medium-instruct-v0.1 |
| ... | ... | epfl-dlab/zip2zip-models |
### Run a pretrained model

```python
import torch
from zip2zip import Zip2ZipModel, Zip2ZipTokenizer

pretrained_model_url = "epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize tokenizer
tokenizer = Zip2ZipTokenizer.from_pretrained(pretrained_model_url)

# Initialize model
model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map=device)

# Generate text
inputs = tokenizer("Write a MultiHeadAttention layer in PyTorch", return_tensors="pt").to(device)
outputs = model.generate(**inputs)

# Decode and print the output, with hyper-tokens color-highlighted
generated_text = tokenizer.color_decode(outputs)
print(generated_text)
```
You can quantize the model to reduce memory usage, just as you would with standard HF models:

```python
model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map="auto", load_in_8bit=True)
```
## Examples

We provide some examples in the examples folder.
## Evaluation

We provide a script, compatible with lm-evaluation-harness, to evaluate model performance.

To run the evaluation, first install the zip2zip fork of lm-evaluation-harness (the upstream version is not compatible with zip2zip):

```bash
pip install git+https://github.com/epfl-dlab/zip2zip_lm_eval.git
```

Then run the evaluation:

```bash
python bench/run_lm_eval.py
```
## Citation

```bibtex
@misc{geng2025zip2zipinferencetimeadaptivevocabularies,
  title={zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression},
  author={Saibo Geng and Nathan Ranchin and Yunzhen Yao and Maxime Peyrard and Chris Wendler and Michael Gastpar and Robert West},
  year={2025},
  eprint={2506.01084},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.01084},
}
```