A package for calculating perplexity using various language models
llmppl
llmppl is a Python package for calculating text perplexity using various language models, including GPT-3.5, Llama 2, RWKV, and Mixtral.
Installation
You can install this package via pip:
```shell
pip install llmppl
```
You need to install PyTorch first, matching your system configuration. For example, to install PyTorch with CUDA 12.4 via conda:

```shell
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
```

See pytorch.org for the pip or conda command matching your OS and CUDA version.
Usage
Here are some examples of how to use llmppl to calculate text perplexity.
GPT-3.5 Turbo
```shell
export OPENAI_API_KEY='YOUROPENAIAPIKEY'
```

```python
from llmppl import GPTLogProb

model = GPTLogProb(model='gpt-3.5-turbo-instruct')
text = "The quick brown fox jumps over the lazy dog."
logprobs, tokens = model.get_logprobs(text)
print("Log probabilities:", logprobs)
```
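Once you have per-token log probabilities, perplexity follows directly: it is the exponential of the negative mean log probability. A minimal sketch, assuming `get_logprobs` returns natural-log probabilities as a list of floats (the values below are made up for illustration):

```python
import math

# Hypothetical per-token natural-log probabilities, standing in for
# the logprobs list returned above.
logprobs = [-0.5, -1.2, -0.3, -0.9]

# Perplexity = exp(-mean log probability)
ppl = math.exp(-sum(logprobs) / len(logprobs))
print(f"Perplexity: {ppl:.3f}")
```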
Llama 2
```python
from llmppl import Llama2PPL

llama = Llama2PPL(model_name="meta-llama/Llama-2-7b-hf")
text = "The quick brown fox jumps over the lazy dog."
ppl = llama.calculate_ppl(text)
print(f"Perplexity: {ppl}")
```
RWKV
```python
from llmppl import RWKVPPL

rwkv = RWKVPPL(model_name="RWKV/rwkv-raven-7b")
text = "The quick brown fox jumps over the lazy dog."
ppl = rwkv.calculate_ppl(text)
print(f"Perplexity: {ppl}")
```
Perplexity Calculation for Core Models
The llmppl package supports perplexity (PPL) calculation using various language model types including Masked Language Models (MLM), Causal Language Models (CLM), and Encoder-Decoder models. Here are examples of how to use these models for perplexity calculation.
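All three variants reduce to the same quantity: perplexity is the exponential of the negative average log probability the model assigns to each token. They differ only in how those probabilities are obtained — left-to-right context for CLMs, masking one token at a time for MLMs, and decoder probabilities conditioned on the encoder input for encoder-decoder models. A model-free sketch of the shared computation, using made-up token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token probabilities p(token | context).

    The formula is the same regardless of how the probabilities were
    produced (CLM, MLM pseudo-likelihood, or encoder-decoder scoring).
    """
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)

# Made-up probabilities a model might assign to each token of a sentence.
print(perplexity([0.25, 0.5, 0.1, 0.4]))

# A model that assigns probability 0.5 to every token has perplexity 2:
# it is as uncertain as a fair coin flip at each step.
print(perplexity([0.5, 0.5, 0.5]))
```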
Masked Language Model (MLM) Perplexity
```python
from llmppl import MLMPPL

mlm = MLMPPL(model_name='bert-base-uncased')
text = "The quick brown fox jumps over the lazy dog."
ppl = mlm.calculate_ppl(text)
print(f"Perplexity (MLM): {ppl}")
```
Causal Language Model (CLM) Perplexity
```python
from llmppl import DecoderPPL

decoder = DecoderPPL(model_name='gpt2')
text = "The quick brown fox jumps over the lazy dog."
ppl = decoder.calculate_ppl(text)
print(f"Perplexity (CLM): {ppl}")
```
Encoder-Decoder Model Perplexity
```python
from llmppl import EncoderDecoderPPL

enc_dec = EncoderDecoderPPL(model_name='t5-small')
input_text = "translate English to German: The quick brown fox jumps over the lazy dog."
output_text = "Der schnelle braune Fuchs springt über den faulen Hund."
ppl = enc_dec.calculate_ppl(input_text, output_text)
print(f"Perplexity (Encoder-Decoder): {ppl}")
```
General Perplexity Calculation via LLMPPL
```python
from llmppl import LLMPPL

# Masked Language Model (MLM)
text = "The quick brown fox jumps over the lazy dog."
ppl = LLMPPL.get_perplexity(text, model_type='mlm', model_name='bert-base-uncased')
print(f"Perplexity (MLM): {ppl}")

# Causal Language Model (CLM)
ppl = LLMPPL.get_perplexity(text, model_type='clm', model_name='gpt2')
print(f"Perplexity (CLM): {ppl}")

# Encoder-Decoder Model
input_text = "translate English to German: The quick brown fox jumps over the lazy dog."
output_text = "Der schnelle braune Fuchs springt über den faulen Hund."
ppl = LLMPPL.get_perplexity(input_text, output_text, model_type='enc-dec', model_name='t5-small')
print(f"Perplexity (Encoder-Decoder): {ppl}")
```
Dependencies
This project requires the following Python packages:
- torch==2.5.0
- transformers==4.45.2
- openai==0.28.0
- tqdm==4.66.5
- sentencepiece==0.2.0
- bitsandbytes==0.44.1
- accelerate==1.0.1
- protobuf==5.28.2
- tiktoken==0.8.0
Contributing
Contributions are welcome! If you have ideas or find bugs, feel free to submit an issue or a pull request.
Citation
If this package is helpful to your work, please consider citing the following paper:
Xu, Z., & Sheng, V. S. (2024, March). Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, No. 21, pp. 23155-23162).
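If you use BibTeX, an equivalent entry built from the fields above might look like this (the citation key is arbitrary):

```bibtex
@inproceedings{xu2024detecting,
  title     = {Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models},
  author    = {Xu, Z. and Sheng, V. S.},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {38},
  number    = {21},
  pages     = {23155--23162},
  month     = {March},
  year      = {2024}
}
```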
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file llmppl-0.1.3.tar.gz.
File metadata
- Download URL: llmppl-0.1.3.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | ccb6275dbf3be0c56379a4bd71e4b6c1910ed553e888d32cf49cfeae3a2cb3bc |
| MD5 | 42412c2aa4de4613d5fef6cc236fbf86 |
| BLAKE2b-256 | a12aa544c60ce0de1b1c24ce816371f402af5aa3806c9ac54f0f8558ed18a950 |
File details
Details for the file llmppl-0.1.3-py3-none-any.whl.
File metadata
- Download URL: llmppl-0.1.3-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 51b11ebb03352ce92fa6d0eb80b1b868956c778ff8ac018d7df95a2cdc171267 |
| MD5 | 0ccfe29261226ae15e2751f178564c55 |
| BLAKE2b-256 | 14466ffa2a00ca6a4418d59f384eef1acc95a5fd0e8f684d44980b9ea5b70f82 |