Method to compute a word's probability with the fixes from "How to Compute the Probability of a Word".
Project description
probability-of-a-word
Code to compute a word's probability using the fixes from "How to Compute the Probability of a Word"
Installation
You can install WordsProbability directly from PyPI:
pip install wordsprobability
Or from source:
git clone git@github.com:tpimentelms/probability-of-a-word.git
cd probability-of-a-word
pip install -e .
Dependencies
WordsProbability has the following requirements:
Usage
Basic Usage
Install this repository. Then run:
$ wordsprobability --model pythia-70m --input examples/abstract.txt --output temp.tsv
The input must be a txt file, with one sequence per line.
The output is a tsv file with one word per row, together with its computed surprisal value.
To also get surprisal_buggy values (computed without our paper's correction), use the optional flag --return-buggy-surprisals.
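To make the difference between the two values concrete, here is a toy sketch of the correction as we read it from the paper for beginning-of-word-marking tokenisers (such as those of GPT-2 and Pythia). The naive value sums the subword surprisals of a word. The corrected value additionally accounts for the probability that a new word (or the end of the sequence) starts right after the word, relative to right before it. This is not the package's implementation: the function names and the numbers are made up for illustration, surprisals are assumed to be in nats, and edge cases such as the first word of a sequence are ignored.

```python
import math


def buggy_surprisal(subword_probs):
    """Naive word surprisal (nats): sum of the surprisals of the word's subwords."""
    return -sum(math.log(p) for p in subword_probs)


def corrected_surprisal(subword_probs, p_bow_before, p_bow_after):
    """Toy version of the correction for beginning-of-word-marking tokenisers.

    p_bow_before: total probability of beginning-of-word (or end-of-sequence)
                  tokens given the context alone.
    p_bow_after:  the same quantity given the context followed by the word's
                  subwords, i.e. the probability that the word ends here.
    """
    return (
        buggy_surprisal(subword_probs)
        - math.log(p_bow_after)
        + math.log(p_bow_before)
    )


# Made-up probabilities for a word split into two subwords.
subword_probs = [0.2, 0.5]
buggy = buggy_surprisal(subword_probs)
fixed = corrected_surprisal(subword_probs, p_bow_before=0.8, p_bow_after=0.4)
print(f"buggy: {buggy:.4f} nats, corrected: {fixed:.4f} nats")
```

In this toy case the corrected surprisal is higher than the naive one, because the model assigns only 0.4 probability to the word ending after its second subword.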
The currently supported models are: pythia-70m, pythia-160m, pythia-410m, pythia-14b, pythia-28b, pythia-69b, pythia-120b, gpt2-small, gpt2-medium, gpt2-large, gpt2-xl.
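If you want to drive the command-line tool from a script, a minimal sketch could look like the following. It writes an input file with one sequence per line and assembles the command using only the flags documented above. The helper names (`write_sequences`, `build_command`, `run_wordsprobability`) and the file names are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path


def write_sequences(path, sequences):
    """Write one sequence per line, replacing internal line breaks with spaces."""
    lines = [" ".join(seq.splitlines()) for seq in sequences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_command(model, input_path, output_path, return_buggy=False):
    """Assemble the CLI call from the documented flags."""
    cmd = [
        "wordsprobability",
        "--model", model,
        "--input", str(input_path),
        "--output", str(output_path),
    ]
    if return_buggy:
        cmd.append("--return-buggy-surprisals")
    return cmd


def run_wordsprobability(model, input_path, output_path, return_buggy=False):
    """Run the tool; requires wordsprobability to be installed."""
    subprocess.run(
        build_command(model, input_path, output_path, return_buggy),
        check=True,
    )


# Example call (not executed here, since it downloads and runs a model):
# write_sequences("my_input.txt", ["Hello world!", "Who am I?"])
# run_wordsprobability("pythia-70m", "my_input.txt", "my_output.tsv")
```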
Using in other Applications
Import wordsprobability in your application and get surprisals with:
from wordsprobability import get_surprisal_per_word
df = get_surprisal_per_word(text='Hello world! Who are you???\nWho am I?', model_name='pythia-70m')
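Since the package reports surprisals rather than probabilities, you may want to convert back. Here is a minimal sketch, assuming the surprisal is the negative log probability of the word. The logarithm base (nats versus bits) is an assumption to check against the package, so it is exposed as a parameter. The function name `surprisal_to_probability` is hypothetical.

```python
import math


def surprisal_to_probability(surprisal, base=math.e):
    """Invert surprisal = -log_base(p), i.e. p = base ** (-surprisal)."""
    return base ** (-surprisal)


# Hypothetical surprisal values, assumed to be in nats.
surprisals = [0.0, math.log(2), math.log(10)]
probabilities = [surprisal_to_probability(s) for s in surprisals]
print(probabilities)
```

If the returned `df` is a pandas DataFrame with a `surprisal` column (an assumption based on the tsv output described above), the same function could be applied with `df['surprisal'].map(surprisal_to_probability)`.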
Extra Information
Citation
If this code or the paper was useful to you, consider citing it:
@article{pimentel-etal-2024-howto,
    title = "How to Compute the Probability of a Word",
    author = "Pimentel, Tiago and Meister, Clara",
    year = "2024",
    eprint = {2406.14561},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2406.14561},
    journal = "arXiv preprint arXiv:2406.14561",
}
Contact
To ask questions or report problems, please open an issue.
File details
Details for the file wordsprobability-0.17.tar.gz.
File metadata
- Download URL: wordsprobability-0.17.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9554a8d84e98c414acd44465b699ccd91bf6f3e20de7ef074a35232b3fffa9e9 |
| MD5 | 2fdc9b9ad8c1d43fb639da821dfe6031 |
| BLAKE2b-256 | 72622a99ec06452435a991282fe83d551035111c8a13d2b9400fd494f146da55 |
File details
Details for the file wordsprobability-0.17-py3-none-any.whl.
File metadata
- Download URL: wordsprobability-0.17-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a4bc6ad27160bc5795d1e17aa379c26d0760691101a2517f4eb5b4cb46f928b |
| MD5 | 096909062635a4dfafc8ed12fe1b7b0b |
| BLAKE2b-256 | 9b5f43a3c191154aa19ec08acfe53db14ff4c866c0f9774091cea97376ce6fb4 |