A package for calculating perplexity using various language models

Project description

llmppl

llmppl is a Python package for calculating text perplexity using various language models, including GPT-3.5, Llama 2, RWKV, and Mixtral.

Installation

You can install this package via pip:

pip install llmppl

PyTorch should be installed first, matched to your system configuration (CPU-only or a specific CUDA version). For example, to install PyTorch with CUDA 12.4 using conda:

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
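
If you prefer pip, PyTorch also publishes CUDA-specific wheels; check pytorch.org for the command matching your setup. For CUDA 12.4 it is typically:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124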

Usage

Here are some examples of how to use llmppl to calculate text perplexity.

GPT-3.5 Turbo

Set your OpenAI API key as an environment variable before running the example:

export OPENAI_API_KEY='YOUR_OPENAI_API_KEY'

from llmppl import GPTLogProb

model = GPTLogProb(model='gpt-3.5-turbo-instruct')
text = "The quick brown fox jumps over the lazy dog."
logprobs, tokens = model.get_logprobs(text)  # per-token log probabilities and the corresponding tokens
print("Tokens:", tokens)
print("Log probabilities:", logprobs)

Llama 2

from llmppl import Llama2PPL

llama = Llama2PPL(model_name="meta-llama/Llama-2-7b-hf")
text = "The quick brown fox jumps over the lazy dog."
ppl = llama.calculate_ppl(text)
print(f"Perplexity: {ppl}")

RWKV

from llmppl import RWKVPPL

rwkv = RWKVPPL(model_name="RWKV/rwkv-raven-7b")
text = "The quick brown fox jumps over the lazy dog."
ppl = rwkv.calculate_ppl(text)
print(f"Perplexity: {ppl}")

Perplexity Calculation for Core Models

The llmppl package supports perplexity (PPL) calculation using various language model types including Masked Language Models (MLM), Causal Language Models (CLM), and Encoder-Decoder models. Here are examples of how to use these models for perplexity calculation.

Masked Language Model (MLM) Perplexity

from llmppl import MLMPPL

mlm = MLMPPL(model_name='bert-base-uncased')
text = "The quick brown fox jumps over the lazy dog."
ppl = mlm.calculate_ppl(text)
print(f"Perplexity (MLM): {ppl}")

Causal Language Model (CLM) Perplexity

from llmppl import DecoderPPL

decoder = DecoderPPL(model_name='gpt2')
text = "The quick brown fox jumps over the lazy dog."
ppl = decoder.calculate_ppl(text)
print(f"Perplexity (CLM): {ppl}")

Encoder-Decoder Model Perplexity

from llmppl import EncoderDecoderPPL

enc_dec = EncoderDecoderPPL(model_name='t5-small')
input_text = "translate English to German: The quick brown fox jumps over the lazy dog."
output_text = "Der schnelle braune Fuchs springt über den faulen Hund."
ppl = enc_dec.calculate_ppl(input_text, output_text)
print(f"Perplexity (Encoder-Decoder): {ppl}")

General Perplexity Calculation via LLMPPL

from llmppl import LLMPPL

# Masked Language Model (MLM)
text = "The quick brown fox jumps over the lazy dog."
ppl = LLMPPL.get_perplexity(text, model_type='mlm', model_name='bert-base-uncased')
print(f"Perplexity (MLM): {ppl}")

# Causal Language Model (CLM)
ppl = LLMPPL.get_perplexity(text, model_type='clm', model_name='gpt2')
print(f"Perplexity (CLM): {ppl}")

# Encoder-Decoder Model
input_text = "translate English to German: The quick brown fox jumps over the lazy dog."
output_text = "Der schnelle braune Fuchs springt über den faulen Hund."
ppl = LLMPPL.get_perplexity(input_text, output_text, model_type='enc-dec', model_name='t5-small')
print(f"Perplexity (Encoder-Decoder): {ppl}")

Dependencies

This project requires the following Python packages:

  • torch==2.5.0
  • transformers==4.45.2
  • openai==0.28.0
  • tqdm==4.66.5
  • sentencepiece==0.2.0
  • bitsandbytes==0.44.1
  • accelerate==1.0.1
  • protobuf==5.28.2
  • tiktoken==0.8.0

Contributing

Contributions are welcome! If you have ideas or find bugs, feel free to submit an issue or a pull request.

Citation

If this package is helpful to your work, please consider citing the following paper:

Xu, Z., & Sheng, V. S. (2024, March). Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, No. 21, pp. 23155-23162).

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

llmppl-0.1.3.tar.gz (10.5 kB)

Built Distribution


llmppl-0.1.3-py3-none-any.whl (12.2 kB)

File details

Details for the file llmppl-0.1.3.tar.gz.

File metadata

  • Download URL: llmppl-0.1.3.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for llmppl-0.1.3.tar.gz:

  • SHA256: ccb6275dbf3be0c56379a4bd71e4b6c1910ed553e888d32cf49cfeae3a2cb3bc
  • MD5: 42412c2aa4de4613d5fef6cc236fbf86
  • BLAKE2b-256: a12aa544c60ce0de1b1c24ce816371f402af5aa3806c9ac54f0f8558ed18a950


File details

Details for the file llmppl-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: llmppl-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for llmppl-0.1.3-py3-none-any.whl:

  • SHA256: 51b11ebb03352ce92fa6d0eb80b1b868956c778ff8ac018d7df95a2cdc171267
  • MD5: 0ccfe29261226ae15e2751f178564c55
  • BLAKE2b-256: 14466ffa2a00ca6a4418d59f384eef1acc95a5fd0e8f684d44980b9ea5b70f82

