
Method to get a word's probability, with the fixes from "How to Compute the Probability of a Word".

probability-of-a-word

Code to compute a word's probability using the fixes from "How to Compute the Probability of a Word"

Installation

You can install WordsProbability directly from PyPI:

pip install wordsprobability

Or from source:

git clone git@github.com:tpimentelms/probability-of-a-word.git
cd probability-of-a-word
pip install -e .

Dependencies

WordsProbability has the following requirements:

Usage

Basic Usage

After installing the package, run:

$ wordsprobability --model pythia-70m --input examples/abstract.txt --output temp.tsv

The input must be a plain-text (.txt) file with one sequence per line. The output is a TSV file with one word per row, together with its computed surprisal. To also get surprisal_buggy values (computed without our paper's correction), pass the optional flag --return-buggy-surprisals. Currently supported models are: pythia-70m, pythia-160m, pythia-410m, pythia-14b, pythia-28b, pythia-69b, pythia-120b, gpt2-small, gpt2-medium, gpt2-large, gpt2-xl. The code
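As a rough illustration of the one-sequence-per-line input format, the sketch below writes a list of sequences to a text file and assembles the command shown above. The helper names write_sequences and build_command are hypothetical and not part of wordsprobability; only the flags --model, --input, --output and --return-buggy-surprisals come from this README.

```python
from pathlib import Path


def write_sequences(sequences, path):
    """Write one sequence per line (hypothetical helper, not part of the package).

    Line breaks inside a sequence are replaced by single spaces so that each
    sequence stays on exactly one line of the input file.
    """
    lines = [" ".join(seq.splitlines()) for seq in sequences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def build_command(model, input_path, output_path, return_buggy=False):
    """Assemble the CLI call as an argument list (hypothetical helper)."""
    cmd = [
        "wordsprobability",
        "--model", model,
        "--input", str(input_path),
        "--output", str(output_path),
    ]
    if return_buggy:
        # Optional flag described above: also report uncorrected surprisals.
        cmd.append("--return-buggy-surprisals")
    return cmd


# With the package installed, the command could then be run with, e.g.:
#   import subprocess
#   subprocess.run(build_command("pythia-70m", "input.txt", "temp.tsv"), check=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems when file paths contain spaces.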

Using in other Applications

Import wordsprobability in your application and get surprisals with:

    from wordsprobability import get_surprisal_per_word
    df = get_surprisal_per_word(text='Hello world! Who are you???\nWho am I?', model_name='pythia-70m')
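Since the package reports surprisals, a probability can be recovered by inverting the definition surprisal = -log(p). The sketch below is a minimal illustration with hypothetical helper names. The logarithm base used by wordsprobability is not stated here, so the natural-log default is an assumption; pass log_base=2 if the values turn out to be in bits.

```python
import math


def surprisal_to_probability(surprisal, log_base=math.e):
    """Invert surprisal = -log_base(p) to recover p (hypothetical helper).

    The default base e (nats) is an assumption; use log_base=2 for bits.
    """
    return log_base ** (-surprisal)


def total_surprisal(word_surprisals):
    """Sum per-word surprisals of consecutive words.

    By the chain rule, if each value is conditioned on the preceding context,
    the sum is the surprisal of the whole run of words given that context.
    """
    return math.fsum(word_surprisals)


# Example with made-up surprisal values (not output of any model):
example = [0.5, 1.25, 2.0]
joint_probability = surprisal_to_probability(total_surprisal(example))
```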

Extra Information

Citation

If this code or the paper was useful to you, consider citing it:

@article{pimentel-etal-2024-howto,
    title = "How to Compute the Probability of a Word",
    author = "Pimentel, Tiago and
    Meister, Clara",
    year = "2024",
    eprint = {2406.14561},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2406.14561},
    journal = "arXiv preprint arXiv:2406.14561",
}

Contact

To ask questions or report problems, please open an issue.

