A package of useful functions to analyze transformer-based language models.

minicons

Helper functions for analyzing Transformer-based representations of language

This repo is a wrapper around the transformers library from Hugging Face.

Installation

Install from PyPI using:

pip install minicons

Supported Functionality

  • Extract word representations from contextualized word embeddings
  • Score sequences using language model scoring techniques, including masked language models following Salazar et al. (2020).
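
To make the masked language model scoring idea concrete, here is a minimal sketch of the pseudo-log-likelihood described by Salazar et al. (2020): each token is masked in turn and the log-probabilities of the hidden tokens are summed. This is not the minicons implementation. The names `pseudo_log_likelihood`, `log_prob_fn`, `uniform_log_prob`, and `VOCAB` are hypothetical, and a uniform toy distribution stands in for a real masked language model.

```python
import math
from typing import Callable, Sequence

MASK = "[MASK]"  # placeholder mask symbol used by this sketch


def pseudo_log_likelihood(
    tokens: Sequence[str],
    log_prob_fn: Callable[[Sequence[str], int, str], float],
) -> float:
    """Sum log P(token_i | all other tokens), masking one position at a time."""
    total = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK  # hide the token currently being scored
        total += log_prob_fn(masked, i, target)
    return total


# Hypothetical stand-in for a masked LM: uniform over a tiny vocabulary.
VOCAB = ["the", "keys", "are", "is", "here"]


def uniform_log_prob(masked_tokens: Sequence[str], position: int, target: str) -> float:
    # A real model would condition on masked_tokens; this toy one ignores them.
    return math.log(1.0 / len(VOCAB))


pll = pseudo_log_likelihood(["the", "keys", "are", "here"], uniform_log_prob)
surprisal = -pll  # sentence-level surprisal (in nats) is the negated sum
```

In practice `log_prob_fn` would wrap a forward pass of a masked language model and read off the log-probability of the original token at the masked position.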

Examples

  1. Extract word representations from contextualized word embeddings:
from minicons import cwe

model = cwe.CWE('bert-base-uncased')

context_words = [("I went to the bank to withdraw money.", "bank"), 
                 ("i was at the bank of the river ganga!", "bank")]

print(model.extract_representation(context_words, layer = 12))

''' 
tensor([[ 0.5399, -0.2461, -0.0968,  ..., -0.4670, -0.5312, -0.0549],
        [-0.8258, -0.4308,  0.2744,  ..., -0.5987, -0.6984,  0.2087]],
       grad_fn=<MeanBackward1>)
'''
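
One common use of such representations is comparing the same word across contexts. The sketch below assumes the returned value is a tensor of shape (number of inputs, hidden size), as the printed output suggests, and uses made-up stand-in values instead of loading a model.

```python
import torch
import torch.nn.functional as F

# Stand-in for the (2, hidden_size) tensor printed above; the values are made up.
reps = torch.tensor([[0.5, -0.2, -0.1, 0.4],
                     [-0.8, -0.4, 0.3, 0.2]])

# The real output carries a grad_fn; detach() drops the autograd graph.
reps = reps.detach()

# Cosine similarity between the two "bank" vectors.
similarity = F.cosine_similarity(reps[0], reps[1], dim=0)
print(similarity.item())
```

With real model output, replace `reps` with the result of `model.extract_representation(context_words, layer = 12)`.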
  2. Compute sentence acceptability measures (surprisals) using Incremental and Masked Language Models:
import torch  # needed for torch.sum, which is passed as the pooling function below
from minicons import scorer

mlm_model = scorer.MaskedLMScorer('bert-base-uncased', 'cpu')
ilm_model = scorer.IncrementalLMScorer('distilgpt2', 'cpu')

stimuli = ["The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table."]

print(mlm_model.score(stimuli, pool = torch.sum))

'''
[13.962650299072266, 23.41507911682129]
'''

print(ilm_model.score(stimuli, pool = torch.sum))

'''
[41.51601982116699, 44.497480392456055]
'''
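
Since the values above are surprisals, a lower score indicates that the model finds the sentence more acceptable. A small sketch of how a minimal pair might be compared, reusing the incremental model scores printed above (the helper name `prefer_lower_surprisal` is hypothetical and not part of minicons):

```python
from typing import Sequence


def prefer_lower_surprisal(sentences: Sequence[str], surprisals: Sequence[float]) -> str:
    """Return the sentence with the lowest surprisal."""
    if len(sentences) != len(surprisals):
        raise ValueError("sentences and surprisals must have the same length")
    best = min(range(len(sentences)), key=lambda i: surprisals[i])
    return sentences[best]


stimuli = ["The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table."]

# Scores copied from the ilm_model output shown above.
ilm_scores = [41.51601982116699, 44.497480392456055]

preferred = prefer_lower_surprisal(stimuli, ilm_scores)
print(preferred)  # The keys to the cabinet are on the table.
```

Summed surprisal grows with sentence length, so this comparison is most meaningful for pairs of similar length, such as the one here.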

Tutorials

Upcoming features:

  • Explore attention distributions extracted from transformers.
  • Contextual cosine similarities, i.e., compute a word's cosine similarity with every other word in the input context with batched computation.
  • Open to suggestions!
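
As an illustration of the contextual cosine similarity idea, here is one way the batched computation could be sketched in plain PyTorch. This is not an existing minicons function. The name `contextual_cosine`, its arguments, and the choice to mark padded positions with NaN are all assumptions, and random tensors stand in for real hidden states.

```python
import torch
import torch.nn.functional as F


def contextual_cosine(hidden: torch.Tensor,
                      target_idx: torch.Tensor,
                      attention_mask: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of one target token with every token in its context.

    hidden:         (batch, seq_len, dim) token representations
    target_idx:     (batch,) position of the target token in each sequence
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    Returns a (batch, seq_len) tensor; padded positions are set to NaN.
    """
    normed = F.normalize(hidden, dim=-1)          # unit-length token vectors
    rows = torch.arange(hidden.size(0))
    target = normed[rows, target_idx]             # (batch, dim)
    sims = torch.einsum("bsd,bd->bs", normed, target)
    return sims.masked_fill(attention_mask == 0, float("nan"))


torch.manual_seed(0)
hidden = torch.randn(2, 5, 8)                     # stand-in hidden states
target_idx = torch.tensor([1, 3])
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 1, 0]])  # last token of row 2 is padding

sims = contextual_cosine(hidden, target_idx, attention_mask)
```

Each row of `sims` contains a similarity of 1 at the target position itself, which is a quick sanity check on the indexing.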
