A package of useful functions to analyze transformer-based language models.
minicons
Helper functions for analyzing Transformer-based representations of language.
This repo is a wrapper around the transformers library from Hugging Face 🤗
Installation
Install from PyPI using:
pip install minicons
Supported Functionality
- Extract word representations from Contextualized Word Embeddings
- Score sequences using language model scoring techniques, including masked language models following Salazar et al. (2020).
Examples
- Extract word representations from contextualized word embeddings:
from minicons import cwe
model = cwe.CWE('bert-base-uncased')
context_words = [("I went to the bank to withdraw money.", "bank"),
                 ("i was at the bank of the river ganga!", "bank")]
print(model.extract_representation(context_words, layer = 12))
'''
tensor([[ 0.5399, -0.2461, -0.0968, ..., -0.4670, -0.5312, -0.0549],
[-0.8258, -0.4308, 0.2744, ..., -0.5987, -0.6984, 0.2087]],
grad_fn=<MeanBackward1>)
'''
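To sanity-check that the two occurrences of "bank" really receive different contextual representations, you can compare the two rows of the returned tensor. A minimal follow-up sketch (not part of minicons itself; it assumes torch is available, which it is as a dependency of transformers):
import torch

reps = model.extract_representation(context_words, layer = 12)
# Cosine similarity between the "financial bank" and "river bank" vectors;
# since the two contexts pick out different senses, the similarity should
# be noticeably below 1.0.
print(torch.nn.functional.cosine_similarity(reps[0], reps[1], dim = 0).item())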
- Compute sentence acceptability measures (surprisals) using Word Prediction Models:
from minicons import scorer
mlm_model = scorer.MaskedLMScorer('bert-base-uncased', 'cpu')
ilm_model = scorer.IncrementalLMScorer('distilgpt2', 'cpu')
stimuli = ["The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table."]
# use sequence_score with different reduction options:
# Sequence Surprisal - lambda x: -x.sum(0).item()
# Sequence Log-probability - lambda x: x.sum(0).item()
# Sequence Surprisal, normalized by number of tokens - lambda x: -x.mean(0).item()
# Sequence Log-probability, normalized by number of tokens - lambda x: x.mean(0).item()
# and so on...
print(ilm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))
'''
[39.879737854003906, 42.75846481323242]
'''
# MLM scoring, inspired by Salazar et al., 2020
print(mlm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))
'''
[13.962685585021973, 23.415111541748047]
'''
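Per-token scores are also available via model.token_score() (see Recent Updates below). A short sketch of how that might look; the surprisal keyword argument is an assumption here and may differ across minicons versions:
# Assumed usage of token_score(): it returns, for each input sentence,
# a list of (token, score) pairs; surprisal = True is assumed to request
# per-token surprisals rather than log-probabilities.
for sentence in ilm_model.token_score(stimuli, surprisal = True):
    for token, score in sentence:
        print(token, round(score, 3))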
Tutorials
- Introduction to using LM-scoring methods using minicons
- Computing sentence and token surprisals using minicons
- Extracting word/phrase representations using minicons
Recent Updates
- November 6, 2021: MLM scoring has been fixed! You can now use model.token_score() and model.sequence_score() with MaskedLMScorers as well!