
Calculate the LongPPL of long-context LLMs

Project description

LongPPL

This repository is the official implementation of the ICLR 2025 paper What is Wrong with Perplexity for Long-context Language Modeling?

Introduction

Handling long-context inputs is crucial for large language models (LLMs). While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. We find that by averaging over all tokens, PPL overlooks the key tokens that are essential for long-context understanding, thereby obscuring models' true performance in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens, identified through a long-short context contrastive method. Additionally, we introduce the LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens.
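To make the idea concrete, below is a minimal conceptual sketch (in PyTorch) of the long-short context contrastive selection and the two resulting quantities; the threshold, the key-token weight, and the function names are illustrative assumptions, not the exact formulation used in the paper or in this package. Here, logp_long and logp_short are per-token log-probabilities from the evaluator model with and without the long context, and target_logp comes from the model being evaluated (for LongPPL) or fine-tuned (for LongCE).

import torch

def key_token_mask(logp_long, logp_short, threshold=2.0):
    """Mark tokens whose log-likelihood improves most when the long context is visible."""
    gain = logp_long - logp_short          # long-short contrastive score per token
    return gain > threshold                # boolean mask of key tokens (threshold is assumed)

def long_ppl(target_logp, mask):
    """Perplexity of the evaluated model, restricted to key tokens."""
    return torch.exp(-target_logp[mask].mean())

def long_ce(target_logp, mask, key_weight=2.0):
    """Cross-entropy that up-weights key tokens (plug-and-play re-weighting; weight is assumed)."""
    weights = 1.0 + (key_weight - 1.0) * mask.float()
    return -(weights * target_logp).sum() / weights.sum()

# Toy example with dummy per-token log-probabilities:
logp_long = torch.tensor([-0.5, -4.0, -0.2, -3.0])    # evaluator, full long context
logp_short = torch.tensor([-0.6, -7.5, -0.3, -6.0])   # evaluator, truncated short context
target_logp = torch.tensor([-0.4, -3.5, -0.3, -2.5])  # model being evaluated
mask = key_token_mask(logp_long, logp_short)           # tokens 1 and 3 are key tokens
print(long_ppl(target_logp, mask).item(), long_ce(target_logp, mask).item())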


Our experiments demonstrate that LongPPL correlates strongly with performance on various long-context benchmarks (e.g., a Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. In addition, experimental results show that LongCE yields consistent improvements as a plug-and-play solution.

Requirements

Python 3.10 + PyTorch 2.3 + Transformers 4.45

pip install -r requirements.txt

LongPPL

The code supports computing LongPPL on custom LLMs and datasets. Please run:

pip install longppl

or

git clone https://github.com/PKU-ML/LongPPL.git
cd LongPPL
pip install -e .

and use the following code to calculate LongPPL:

from longppl import compute_longppl

output = compute_longppl(text, model, evaluator_model, tokenizer, evaluator_tokenizer)
print(output['longppl'])
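For context, here is a hedged end-to-end sketch of preparing the arguments; the use of Hugging Face transformers loaders, the specific model names, and the input file path are illustrative assumptions rather than requirements of the package.

from transformers import AutoModelForCausalLM, AutoTokenizer
from longppl import compute_longppl

# Model under evaluation and evaluator model used to identify key tokens
# (both repository names are assumed here for illustration only).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
evaluator_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-72B-Instruct", device_map="auto")
evaluator_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B-Instruct")

# Any sufficiently long text; the file name is a placeholder.
with open("long_document.txt") as f:
    text = f.read()

output = compute_longppl(text, model, evaluator_model, tokenizer, evaluator_tokenizer)
print(output['longppl'])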

Reproduce the paper

LongPPL

To reproduce the LongPPL experiments in our paper, please run:

cd perplexity
sh run_ppl.sh

The evaluation data can be downloaded from GovReport (tokenized). Here are our main results.

| Models | LongPPL (Qwen-72B-Instruct) | LongPPL (Mistral Large 2) | LongPPL (Llama-3.1-8B) | PPL |
| --- | --- | --- | --- | --- |
| Mixtral-8x7B | 1.99 | 2.33 | 1.70 | 3.59 |
| FILM-7B | 2.28 | 2.81 | 1.95 | 4.35 |
| Mistral-7B | 2.48 | 3.10 | 2.11 | 4.14 |
| Qwen1.5-14B | 2.67 | 2.57 | 2.19 | 5.07 |
| Qwen2-7B | 2.66 | 2.48 | 2.16 | 4.82 |
| Phi-3-small | 2.66 | 2.58 | 2.28 | 5.29 |
| CLEX-7B | 3.28 | 3.95 | 2.74 | 4.04 |
| Yi-6B | 3.19 | 3.38 | 2.65 | 4.96 |
| Yarn-7B | 3.47 | 4.51 | 2.98 | 4.06 |
  • While standard perplexity shows almost no correlation with the models' long-context performance as measured by the benchmarks (please refer to our paper), LongPPL demonstrates a strong correlation.

LongCE

To perform long-context fine-tuning with LongCE, run accelerate config and enable DeepSpeed acceleration; deepspeed/zero3.json is the configuration file we used for training.

cd finetune
sh train.sh

The training data can be downloaded from PG19 and Pile-arxiv. To run models with EABF, please downgrade transformers to version 4.37.0.
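If needed, the downgrade can be done with pip:

pip install transformers==4.37.0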

Evaluation on Long-context Benchmarks

In the paper, we evaluate models on LongBench, LongEval, and RULER. Please refer to their respective code repositories.

Citation

If you use our code, please cite:

@article{fang2024wrong,
      title={What is Wrong with Perplexity for Long-context Language Modeling?}, 
      author={Lizhe Fang and Yifei Wang and Zhaoyang Liu and Chenheng Zhang and Stefanie Jegelka and Jinyang Gao and Bolin Ding and Yisen Wang},
      year={2024},
      journal={arXiv preprint arXiv:2410.23771}
}

Download files

Download the file for your platform.

Source Distribution

longppl-0.3.0.tar.gz (9.6 kB)

Uploaded Source

Built Distribution


longppl-0.3.0-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file longppl-0.3.0.tar.gz.

File metadata

  • Download URL: longppl-0.3.0.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for longppl-0.3.0.tar.gz
Algorithm Hash digest
SHA256 186cb4d6ca888b971702c9d6d2762b78355d949b5d9c8eebf28b774bcd7476b6
MD5 1ff1bc2ba523a8ebc3cdcdf920f49417
BLAKE2b-256 93a391a50a461babd20c617d147da97091a20b6ea9e0a5912f49cd696fc90fc5


File details

Details for the file longppl-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: longppl-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for longppl-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4764842acb4508378b2dd03d49c67e9d6ef5ca5c34308ca824e0da33a9a69db5
MD5 9d1351181b5078e19ac2cb844432cec3
BLAKE2b-256 2d5d7ad89fc82fdb315d00a8113b99e077b47a3fbc1d6e538cc7d54b3ee6d8e9

