
A package of useful functions to analyze transformer based language models.

Project description

minicons


Helper functions for analyzing Transformer based representations of language

This repo is a wrapper around the transformers library from Hugging Face 🤗

Installation

Install from PyPI using:

pip install minicons

Supported Functionality

  • Extract word representations from Contextualized Word Embeddings
  • Score sequences using language model scoring techniques, including masked language models following Salazar et al. (2020).

Examples

  1. Extract word representations from contextualized word embeddings:
from minicons import cwe

model = cwe.CWE('bert-base-uncased')

context_words = [("I went to the bank to withdraw money.", "bank"), 
                 ("I was at the bank of the river Ganga!", "bank")]

print(model.extract_representation(context_words, layer = 12))

''' 
tensor([[ 0.5399, -0.2461, -0.0968,  ..., -0.4670, -0.5312, -0.0549],
        [-0.8258, -0.4308,  0.2744,  ..., -0.5987, -0.6984,  0.2087]],
       grad_fn=<MeanBackward1>)
'''
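The grad_fn=<MeanBackward1> in the output suggests that a word split into several subword tokens gets its vectors mean-pooled into a single representation. A minimal sketch of that pooling with plain Python lists (the subword split and the vectors below are made up for illustration, not taken from the model):

```python
def mean_pool(subword_vectors):
    """Average subword vectors dimension-wise to get one word vector."""
    n = len(subword_vectors)
    return [sum(dims) / n for dims in zip(*subword_vectors)]

# Hypothetical split of "withdraw" into two subwords, with toy 3-d vectors.
subwords = [[0.2, 0.4, -0.6],   # "with"
            [0.4, 0.0, 0.2]]    # "##draw"

word_vec = mean_pool(subwords)  # ≈ [0.3, 0.2, -0.2]
```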
  2. Compute sentence acceptability measures (surprisals) using Word Prediction Models:
from minicons import scorer

mlm_model = scorer.MaskedLMScorer('bert-base-uncased', 'cpu')
ilm_model = scorer.IncrementalLMScorer('distilgpt2', 'cpu')

stimuli = ["The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table."]

# use sequence_score with different reduction options: 
# Sequence Surprisal - lambda x: -x.sum(0).item()
# Sequence Log-probability - lambda x: x.sum(0).item()
# Sequence Surprisal, normalized by number of tokens - lambda x: -x.mean(0).item()
# Sequence Log-probability, normalized by number of tokens - lambda x: x.mean(0).item()
# and so on...

print(ilm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))

'''
[39.879737854003906, 42.75846481323242]
'''
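The reduction lambdas above just sum or average a vector of per-token log-probabilities. The same arithmetic on a plain Python list (the log-probabilities below are invented, not model output):

```python
# Hypothetical per-token log-probabilities for one sentence.
token_logprobs = [-2.1, -0.4, -3.0, -1.5]

surprisal = -sum(token_logprobs)   # sequence surprisal, ≈ 7.0
log_prob = sum(token_logprobs)     # sequence log-probability, ≈ -7.0
per_token = -sum(token_logprobs) / len(token_logprobs)  # normalized surprisal, ≈ 1.75
```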

# MLM scoring, inspired by Salazar et al., 2020
print(mlm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))
'''
[13.962685585021973, 23.415111541748047]
'''
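The masked-LM score of Salazar et al. (2020) is a pseudo-log-likelihood: mask each token in turn and sum the log-probability the model assigns to the original token at the masked position. A toy sketch of that loop, with a fixed probability table standing in for BERT (the table, tokens, and helper names here are invented for illustration):

```python
import math

def pseudo_log_likelihood(tokens, logprob_fn):
    """Sum log P(token_i | tokens with position i masked) over all positions."""
    total = 0.0
    for i, token in enumerate(tokens):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        total += logprob_fn(masked, i, token)
    return total

# Stand-in "model": a unigram probability table instead of a real masked LM.
TOY_PROBS = {"the": 0.5, "keys": 0.2, "are": 0.2, "here": 0.1}

def toy_logprob(masked_tokens, position, original_token):
    return math.log(TOY_PROBS.get(original_token, 0.01))

score = pseudo_log_likelihood(["the", "keys", "are", "here"], toy_logprob)
```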

Tutorials

Recent Updates

  • November 6, 2021: MLM scoring has been fixed! You can now use model.token_score() and model.sequence_score() with MaskedLMScorers as well!

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

minicons-0.2.1.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

minicons-0.2.1-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file minicons-0.2.1.tar.gz.

File metadata

  • Download URL: minicons-0.2.1.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.8.5 Darwin/21.2.0

File hashes

Hashes for minicons-0.2.1.tar.gz

  • SHA256: e207314b20673c25835425c1a02eee0e1c86675ebf78d5b828602d3b858df9bd
  • MD5: fd39dab8c8dbff1fc54b40af6c490c81
  • BLAKE2b-256: e2d0eafdb90354dac94b74e8bfc112c84930b54b8a3e25ce871385c5d55b122f

See more details on using hashes here.

File details

Details for the file minicons-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: minicons-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.8.5 Darwin/21.2.0

File hashes

Hashes for minicons-0.2.1-py3-none-any.whl

  • SHA256: 88117c0f6b605081b547810252f022f22607d5f037df547db63b98cecfe28840
  • MD5: 8b21bfe7e197333fd1d802dda7be16f2
  • BLAKE2b-256: 221431a1ecbbaa60258370d2ca5a7a8f9d3b2b5fc2a2429e976a4b2ce3fba5a1

