
Find out where your model is perplexed!

Project description

perplexed

This library is based on an idea from Andrej Karpathy: understand a model's failure cases by looking at its worst predictions. Specifically, perplexed computes the perplexity of Large Language Models (LLMs) such as GPT-2 and BERT on a dataset at the per-token level. This makes it easy to see exactly where the model is perplexed and where it is not, which is useful for debugging and understanding the model.
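For intuition, per-token perplexity is just the exponentiated cross-entropy of each token under the model. Below is a minimal sketch of that computation using a plain transformers causal LM; it is illustrative only and not the library's own implementation (model name and example text are arbitrary):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Shift so that position i predicts token i+1.
shift_logits = logits[:, :-1, :]
shift_labels = inputs["input_ids"][:, 1:]

# Per-token negative log-likelihood, then exponentiate to get perplexity.
nll = torch.nn.functional.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    reduction="none",
)
per_token_ppl = torch.exp(nll)

for tok_id, ppl in zip(shift_labels[0], per_token_ppl):
    print(repr(tokenizer.decode(tok_id)), float(ppl))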

Install

pip install perplexed

How to use

Using the API

perplexed is built on top of the HuggingFace transformers and datasets libraries and is designed to be simple to use. The main entry point is the perplexed function, which takes a model, a dataset, and a tokenizer and returns a Counter mapping each token in the dataset to its perplexity. Here is an example:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from perplexed.core import perplexed

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test").select(range(100))
# filter out empty strings
dataset = dataset.filter(lambda x: len(x["text"]) > 0)

perplexity_cnt = perplexed(model, dataset, tokenizer=tokenizer, column="text", batch_size=1, device="cpu")
perplexity_cnt.most_common(10)

[(' wired', 60983688.0),
 (' 768', 21569838.0),
 (' shatter', 12281687.0),
 (' unsett', 8289435.0),
 (' ignited', 6605209.0),
 (' Tanz', 4834899.0),
 (' Influence', 4153321.75),
 (' Career', 4064189.0),
 (' Television', 2325870.75),
 (' Moral', 2243574.5)]
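Because the return value is a standard collections.Counter, the usual Counter methods apply. For example, slicing the full most_common() list from the end shows the tokens the model found least perplexing:

perplexity_cnt.most_common()[:-11:-1]  # 10 lowest-perplexity tokens, ascending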

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

perplexed-0.0.1.tar.gz (9.9 kB)

Uploaded Source

Built Distribution

perplexed-0.0.1-py3-none-any.whl (9.4 kB)

Uploaded Python 3

File details

Details for the file perplexed-0.0.1.tar.gz.

File metadata

  • Download URL: perplexed-0.0.1.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for perplexed-0.0.1.tar.gz
Algorithm Hash digest
SHA256 7a282f893c2bf8460eb557d7cedabdfa896d29c50cf2db0813e22bb0b2a5feec
MD5 3b05307bad5ac46f25253bf36ad8a73e
BLAKE2b-256 86e12cb56e0fa6c2462ac4a0f37a8514f017189e70fcf8d23df48b19fa88c2db

See more details on using hashes here.
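If you want to check a downloaded file against the published SHA256 yourself, a small hashlib snippet is enough (a generic sketch; adjust the filename to wherever the archive was saved):

import hashlib

expected = "7a282f893c2bf8460eb557d7cedabdfa896d29c50cf2db0813e22bb0b2a5feec"
with open("perplexed-0.0.1.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, "SHA256 mismatch: file may be corrupted or tampered with"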

File details

Details for the file perplexed-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: perplexed-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for perplexed-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4830eddfcb2b32257a944f247052593c6c68ab51f41cf63809e8638badde4255
MD5 189b71e9dd0e0311e86c1b5b190d82ce
BLAKE2b-256 518564795628a4c6b2736d72c79eabe024d2bce5d06b8e77e2d257f05347677f

See more details on using hashes here.
