Method to compute a word's probability, with the fixes from "How to Compute the Probability of a Word".
probability-of-a-word
Code to compute a word's probability using the fixes from "How to Compute the Probability of a Word"
Installation
You can install WordsProbability directly from PyPI:
pip install wordsprobability
Or from source:
git clone git@github.com:tpimentelms/probability-of-a-word.git
cd probability-of-a-word
pip install -e .
Dependencies
WordsProbability has the following requirements:
Usage
Basic Usage
Install this repository. Then run:
$ wordsprobability --model pythia-70m --input examples/abstract.txt --output temp.tsv
The input must be a txt file, with one sequence per line.
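If your text lives in Python rather than in a file, the sketch below shows one way to produce an input file in the expected one-sequence-per-line layout. The helper name `write_sequences` is hypothetical and not part of wordsprobability. It assumes that a line break inside a sequence would otherwise be read as the start of a new sequence, so it replaces those breaks with spaces.

```python
from pathlib import Path


def write_sequences(sequences, path):
    """Write one sequence per line (hypothetical helper, not part of the package)."""
    lines = []
    for seq in sequences:
        # Replace line breaks inside a sequence with single spaces so that
        # each sequence occupies exactly one line of the input file.
        flat = " ".join(seq.splitlines()).strip()
        if flat:  # skip sequences that are empty after flattening
            lines.append(flat)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The resulting file can then be passed to the command above through --input.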
The output will be a tsv file with one word per row and its respective computed surprisal value.
To also get the computed surprisal_buggy values (without our paper's correction), use the optional flag --return-buggy-surprisals.
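As a rough illustration of consuming the output, the sketch below loads the tsv file with Python's standard csv module. It assumes the file starts with a header row and that the surprisal column is named surprisal; check both against your own output file. The function name `read_surprisals` is hypothetical and not part of wordsprobability.

```python
import csv


def read_surprisals(path, column="surprisal"):
    """Load the output tsv as a list of dicts, parsing `column` as float.

    Assumes a header row and a column called `column`; both are
    assumptions about the file layout, not documented guarantees.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    for row in rows:
        row[column] = float(row[column])
    return rows
```

If the file was produced with --return-buggy-surprisals, calling the function with column="surprisal_buggy" would parse that column instead, under the same assumptions.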
Currently supported models are: pythia-70m, pythia-160m, pythia-410m, pythia-14b, pythia-28b, pythia-69b, pythia-120b, gpt2-small, gpt2-medium, gpt2-large, gpt2-xl.
The code
Using in other Applications
Import wordsprobability in your application and get surprisals with:
from wordsprobability import get_surprisal_per_word
df = get_surprisal_per_word(text='Hello world! Who are you???\nWho am I?', model_name='pythia-70m')
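Since surprisal is the negative log of a probability, the values can be mapped back to probabilities when that is what you need. The sketch below is a minimal illustration; the function names are hypothetical and not part of wordsprobability. The logarithm base is left as a parameter because the unit of the returned surprisals (nats or bits) should be checked before converting.

```python
import math


def surprisal_to_probability(surprisal, base=math.e):
    """Invert surprisal = -log_base(p) to recover p.

    The default base assumes surprisals in nats; pass base=2 for bits.
    """
    return base ** (-surprisal)


def sequence_surprisal(surprisals):
    """Sum per-word surprisals.

    By the chain rule, the sum equals the negative log of the product
    of the per-word conditional probabilities.
    """
    return sum(surprisals)
```

Assuming df is a pandas DataFrame with a surprisal column, something like df['surprisal'].map(surprisal_to_probability) would give one probability per word.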
Extra Information
Citation
If this code or the paper was useful to you, consider citing it:
@article{pimentel-etal-2024-howto,
title = "How to Compute the Probability of a Word",
author = "Pimentel, Tiago and
Meister, Clara",
year = "2024",
eprint = {2406.14561},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2406.14561},
journal = "arXiv preprint arXiv:2406.14561",
}
Contact
To ask questions or report problems, please open an issue.