Inference-time adaptive token vocabularies for LLMs

zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression

zip2zip enables inference-time adaptive token vocabularies for large language models (LLMs). The vocabulary is dynamically augmented at inference time, which reduces the number of decoding steps and speeds up inference.

(Figure: zip2zip decoding)

Features

  • Dynamic vocabulary adaptation during inference
  • LZW-based token compression
  • Support for various encoder configurations
  • Integration with Hugging Face's transformers library
  • Compatible with PEFT (Parameter-Efficient Fine-Tuning) models
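
The LZW-based compression can be illustrated with a small sketch. This is a hypothetical, simplified illustration (function and variable names are not from the zip2zip codebase): repeated base-token sequences are assigned new ids appended after the base vocabulary, so later occurrences of the same sequence cost a single step.

```python
def _code_of(seq, table):
    """A single base token encodes itself; longer sequences use their merged id."""
    return seq[0] if len(seq) == 1 else table[seq]

def lzw_compress(token_ids, base_vocab_size):
    """LZW-style compression over a list of base token ids.

    Returns the compressed id sequence and the merge table mapping
    each discovered base-token tuple to its new (merged) id.
    """
    table = {}                      # tuple of base ids -> merged id
    next_id = base_vocab_size       # new ids start right after the base vocabulary
    out = []
    current = (token_ids[0],)
    for tok in token_ids[1:]:
        candidate = current + (tok,)
        if candidate in table:
            current = candidate     # keep extending an already-known sequence
        else:
            out.append(_code_of(current, table))
            table[candidate] = next_id
            next_id += 1
            current = (tok,)
    out.append(_code_of(current, table))
    return out, table
```

For example, compressing `[1, 2, 1, 2, 1, 2]` with a base vocabulary of 100 yields `[1, 2, 100, 100]`, where id 100 stands for the pair `(1, 2)`: six decoding steps become four.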

Installation

You can install zip2zip using pip:

pip install zip2zip

Usage

Same API as Hugging Face

| zip2zip | Corresponding HF class |
| --- | --- |
| `Zip2ZipModel` | `AutoModelForCausalLM` |
| `Zip2ZipTokenizer` | `AutoTokenizer` |
| `Zip2ZipConfig` | `AutoConfig` |
| `Zip2ZipModel.from_pretrained` | `AutoModelForCausalLM.from_pretrained` |
| `Zip2ZipTokenizer.from_pretrained` | `AutoTokenizer.from_pretrained` |
| `Zip2ZipConfig.from_pretrained` | `AutoConfig.from_pretrained` |

Pretrained model weights

| Size | Model | HF Hub |
| --- | --- | --- |
| 3.8B | Phi-3.5-mini-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1 |
| 14B | Phi-3-medium-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3-medium-instruct-v0.1 |
| ... | ... | epfl-dlab/zip2zip-models |

Run a pretrained model

import torch
from zip2zip import Zip2ZipModel, Zip2ZipTokenizer

pretrained_model_url = "epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize tokenizer
tokenizer = Zip2ZipTokenizer.from_pretrained(pretrained_model_url)

# Initialize model
model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map=device)

# Generate text
inputs = tokenizer("Write a MultiHeadAttention layer in PyTorch", return_tensors="pt").to(device)
outputs = model.generate(**inputs)

# Decode and print the generated text, with merged tokens color-highlighted
generated_text = tokenizer.color_decode(outputs)
print(generated_text)
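
During generation the model can emit ids beyond the base vocabulary; the tokenizer expands them back to base token ids before detokenizing. A minimal sketch of that expansion, assuming a merge table mapping each merged id to its tuple of base token ids (hypothetical names, not the zip2zip API):

```python
def expand_merged_tokens(codes, merges, base_vocab_size):
    """Replace each merged id with its underlying base token ids.

    codes: generated id sequence (may contain merged ids >= base_vocab_size)
    merges: dict mapping merged id -> tuple of base token ids
    """
    out = []
    for c in codes:
        # Merged ids live above the base vocabulary; base ids pass through unchanged
        out.extend(merges[c] if c >= base_vocab_size else (c,))
    return out
```

With `merges = {100: (1, 2)}` and a base vocabulary of 100, the sequence `[1, 2, 100, 100]` expands back to `[1, 2, 1, 2, 1, 2]`.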

You can quantize the model to reduce memory usage, just as you would with standard HF models.

model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map="auto", load_in_8bit=True)

Examples

We provide some examples in the examples folder.

Evaluation

We provide a script to evaluate the performance of the model, compatible with lm-evaluation-harness.

To run the evaluation, you need to install the zip2zip fork of lm-evaluation-harness (the original one is not compatible with zip2zip).

pip install git+https://github.com/epfl-dlab/zip2zip_lm_eval.git

Then, you can run the evaluation:

python bench/run_lm_eval.py

Citation

@misc{geng2025zip2zipinferencetimeadaptivevocabularies,
      title={zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression},
      author={Saibo Geng and Nathan Ranchin and Yunzhen Yao and Maxime Peyrard and Chris Wendler and Michael Gastpar and Robert West},
      year={2025},
      eprint={2506.01084},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.01084},
}
