
Method to compute a word's probability, with the fixes from "How to Compute the Probability of a Word".


probability-of-a-word


Code to compute a word's probability using the fixes from "How to Compute the Probability of a Word"

Installation

You can install WordsProbability directly from PyPI:

pip install wordsprobability

Or from source:

git clone git@github.com:tpimentelms/probability-of-a-word.git
cd probability-of-a-word
pip install -e .

Dependencies

WordsProbability has the following requirements:

Usage

Basic Usage

Install this repository. Then run:

$ wordsprobability --model pythia-70m --input examples/abstract.txt --output temp.tsv

The input must be a plain-text (.txt) file with one sequence per line. The output is a TSV file with one word per row, together with its computed surprisal. To also get surprisal_buggy values (computed without our paper's correction), pass the optional flag --return-buggy-surprisals. Currently supported models are: pythia-70m, pythia-160m, pythia-410m, pythia-14b, pythia-28b, pythia-69b, pythia-120b, gpt2-small, gpt2-medium, gpt2-large, gpt2-xl.
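
The file handling around the command can be sketched in Python. This is a minimal sketch, not part of the package: the helper names `write_input_file`, `build_command` and `read_surprisals` are hypothetical, and the TSV column names `word` and `surprisal` are assumptions that should be checked against the header of your own output file. Only the flags described above are used.

```python
import csv
from pathlib import Path


def write_input_file(sequences, path):
    """Write one sequence per line, the layout the CLI expects."""
    # Collapse internal whitespace (including newlines) so that each
    # sequence occupies exactly one line of the input file.
    lines = [" ".join(seq.split()) for seq in sequences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_command(model, input_path, output_path, return_buggy=False):
    """Assemble the wordsprobability call from the flags named above."""
    cmd = [
        "wordsprobability",
        "--model", model,
        "--input", str(input_path),
        "--output", str(output_path),
    ]
    if return_buggy:
        cmd.append("--return-buggy-surprisals")
    return cmd


def read_surprisals(path, word_col="word", surprisal_col="surprisal"):
    """Read the output TSV into a list of (word, surprisal) pairs.

    The default column names are assumptions, not documented behaviour.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(row[word_col], float(row[surprisal_col])) for row in reader]
```

With the package installed, the command list can be passed to `subprocess.run(build_command("pythia-70m", "input.txt", "temp.tsv"), check=True)`, and the resulting file read back with `read_surprisals("temp.tsv")`.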

Using in other Applications

Import wordsprobability in your application and get surprisals with:

    from wordsprobability import get_surprisal_per_word
    df = get_surprisal_per_word(text='Hello world! Who are you???\nWho am I?', model_name='pythia-70m')
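
Since the function returns surprisals rather than probabilities, a small conversion step may be useful. The sketch below assumes that a surprisal is the negative log-probability of a word given its context, and that it is measured in nats; if your values are in bits, pass `base=2`. The helper names are hypothetical and are not part of wordsprobability.

```python
import math


def surprisal_to_probability(surprisal, base=math.e):
    """Invert surprisal = -log_base(p) to recover the probability p."""
    return base ** (-surprisal)


def sequence_surprisal(word_surprisals):
    """Sum per-word surprisals.

    If each value is conditioned on the preceding context, the sum is the
    surprisal of the whole sequence (chain rule of probability).
    """
    return math.fsum(word_surprisals)
```

For example, `surprisal_to_probability(sequence_surprisal(values))` gives the probability of a full sequence under the same assumptions, where `values` is the list of per-word surprisals for that sequence.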

Extra Information

Citation

If this code or the paper was useful to you, consider citing it:

@article{pimentel-etal-2024-howto,
    title = "How to Compute the Probability of a Word",
    author = "Pimentel, Tiago and
    Meister, Clara",
    year = "2024",
    eprint = {2406.14561},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2406.14561},
    journal = "arXiv preprint arXiv:2406.14561",
}

Contact

To ask questions or report problems, please open an issue.
