A package of useful functions for analyzing transformer-based language models.

minicons

Helper functions for analyzing Transformer-based representations of language

This repo is a wrapper around the transformers library from Hugging Face.

Installation

Install from PyPI using:

pip install minicons

Supported Functionality

  • Extract word representations from Contextualized Word Embeddings
  • Score sequences using language model scoring techniques, including masked language models following Salazar et al. (2020).
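
To make the masked language model scoring idea concrete, here is a minimal sketch of a pseudo-log-likelihood: each token is masked in turn, and the log-probabilities of the original tokens are summed. Everything in it is a toy assumption rather than the minicons implementation: `TOY_PROBS`, `mask_position`, `toy_masked_log_prob`, and `pseudo_log_likelihood` are hypothetical names, the "model" is a lookup table that ignores context, and base-2 logs are used only to keep the numbers readable.

```python
import math

# Hypothetical toy "model": a fixed probability per token that ignores context.
# A real masked LM (e.g. BERT) would condition on the masked sentence instead.
TOY_PROBS = {"the": 0.5, "keys": 0.25, "are": 0.25, "is": 0.125}


def mask_position(tokens, position):
    """Return a copy of `tokens` with the token at `position` replaced by [MASK]."""
    return tokens[:position] + ["[MASK]"] + tokens[position + 1:]


def toy_masked_log_prob(masked_tokens, target):
    """Stand-in for a model call: log2-probability of `target` at the masked slot."""
    # `masked_tokens` is unused here because the toy lookup has no notion of context.
    return math.log2(TOY_PROBS[target])


def pseudo_log_likelihood(tokens):
    """Sum the log-probability of every token, each predicted with itself masked."""
    total = 0.0
    for position, target in enumerate(tokens):
        masked = mask_position(tokens, position)
        total += toy_masked_log_prob(masked, target)
    return total


print(pseudo_log_likelihood(["the", "keys", "are"]))  # -5.0
print(pseudo_log_likelihood(["the", "keys", "is"]))   # -6.0
```

A higher (less negative) score means the toy model finds the sequence more probable; negating it gives a surprisal.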

Examples

  1. Extract word representations from contextualized word embeddings:
from minicons import cwe

model = cwe.CWE('bert-base-uncased')

context_words = [("I went to the bank to withdraw money.", "bank"), 
                 ("i was at the bank of the river ganga!", "bank")]

print(model.extract_representation(context_words, layer = 12))

''' 
tensor([[ 0.5399, -0.2461, -0.0968,  ..., -0.4670, -0.5312, -0.0549],
        [-0.8258, -0.4308,  0.2744,  ..., -0.5987, -0.6984,  0.2087]],
       grad_fn=<MeanBackward1>)
'''
  2. Compute sentence acceptability measures (surprisals) using Word Prediction Models:
from minicons import scorer

mlm_model = scorer.MaskedLMScorer('bert-base-uncased', 'cpu')
ilm_model = scorer.IncrementalLMScorer('distilgpt2', 'cpu')

stimuli = ["The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table."]

# use sequence_score with different reduction options:
# Sequence Surprisal - lambda x: -x.sum(0).item()
# Sequence Log-probability - lambda x: x.sum(0).item()
# Sequence Surprisal, normalized by number of tokens - lambda x: -x.mean(0).item()
# Sequence Log-probability, normalized by number of tokens - lambda x: x.mean(0).item()
# and so on...

print(ilm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))

'''
[39.879737854003906, 42.75846481323242]
'''

# MLM scoring, inspired by Salazar et al., 2020
print(mlm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))
'''
[13.962685585021973, 23.415111541748047]
'''
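
The reduction options can be tried out on their own, without loading a model. The sketch below assumes, based on the `x.sum(0).item()` calls above, that each reduction receives a 1-D tensor of per-token log-probabilities for a single sequence. The tensor values and the `reductions` dictionary are made up for illustration.

```python
import torch

# Made-up per-token log-probabilities for one three-token sequence.
token_log_probs = torch.tensor([-0.5, -1.5, -4.0])

# Hypothetical collection of reductions, each mapping a 1-D tensor to a float.
reductions = {
    "surprisal": lambda x: -x.sum(0).item(),
    "log_prob": lambda x: x.sum(0).item(),
    "surprisal_per_token": lambda x: -x.mean(0).item(),
    "log_prob_per_token": lambda x: x.mean(0).item(),
}

results = {name: fn(token_log_probs) for name, fn in reductions.items()}

for name, value in results.items():
    print(f"{name}: {value}")
# surprisal: 6.0
# log_prob: -6.0
# surprisal_per_token: 2.0
# log_prob_per_token: -2.0
```

Normalizing by the number of tokens keeps longer sequences from scoring worse only because they contain more tokens.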

Tutorials

Recent Updates

  • November 6, 2021: MLM scoring has been fixed! You can now use model.token_score() and model.sequence_score() with MaskedLMScorers as well!
