A package for extracting word representations from BERT/XLNet

Project description

Embedding4BERT



This is a Python library for extracting word embeddings from pre-trained language models.

User Guide

Installation

pip install --upgrade embedding4bert

Usage

Extract word embeddings from pre-trained BERT and XLNet models. The library:

  • Sums the representations of the last four hidden layers.
  • Takes the mean of the representations of a word's subword pieces as the word representation (a from-scratch sketch of this procedure follows the examples below).
  1. Extract BERT word embeddings.
from embedding4bert import Embedding4BERT
emb4bert = Embedding4BERT("bert-base-cased") # bert-base-uncased
tokens, embeddings = emb4bert.extract_word_embeddings('This is a python library for extracting word representations from BERT.')
print(tokens)
print(embeddings.shape)

Expected output:

14 tokens: [CLS] This is a python library for extracting word representations from BERT. [SEP], 19 word-tokens: ['[CLS]', 'This', 'is', 'a', 'p', '##yt', '##hon', 'library', 'for', 'extract', '##ing', 'word', 'representations', 'from', 'B', '##ER', '##T', '.', '[SEP]']
['[CLS]', 'This', 'is', 'a', 'python', 'library', 'for', 'extracting', 'word', 'representations', 'from', 'BERT', '.', '[SEP]']
(14, 768)
  2. Extract XLNet word embeddings.
from embedding4bert import Embedding4BERT
emb4bert = Embedding4BERT("xlnet-base-cased")
tokens, embeddings = emb4bert.extract_word_embeddings('This is a python library for extracting word representations from BERT.')
print(tokens)
print(embeddings.shape)

Expected output:

11 tokens: This is a python library for extracting word representations from BERT., 16 word-tokens: ['▁This', '▁is', '▁a', '▁', 'py', 'thon', '▁library', '▁for', '▁extract', 'ing', '▁word', '▁representations', '▁from', '▁B', 'ERT', '.']
['▁This', '▁is', '▁a', '▁python', '▁library', '▁for', '▁extracting', '▁word', '▁representations', '▁from', '▁BERT.']
(11, 768)
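
For reference, the following is a minimal sketch of the procedure described above, written directly against the Hugging Face transformers API: sum the last four hidden layers, then average the subword pieces back into whole words. The merging loop and variable names here are illustrative assumptions, not the library's internal implementation.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

text = "This is a python library for extracting word representations from BERT."
encoded = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoded)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors,
# each of shape (1, seq_len, 768). Sum the last four layers.
subword_vecs = torch.stack(outputs.hidden_states[-4:]).sum(dim=0).squeeze(0)

# Merge WordPiece pieces ('p', '##yt', '##hon' -> 'python') by averaging
# the vectors of the pieces that make up each word.
subwords = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0])
words, word_vecs, pieces = [], [], []
for tok, vec in zip(subwords, subword_vecs):
    if tok.startswith("##") and pieces:
        pieces.append((tok[2:], vec))
        continue
    if pieces:
        words.append("".join(p for p, _ in pieces))
        word_vecs.append(torch.stack([v for _, v in pieces]).mean(dim=0))
    pieces = [(tok, vec)]
words.append("".join(p for p, _ in pieces))
word_vecs.append(torch.stack([v for _, v in pieces]).mean(dim=0))

embeddings = torch.stack(word_vecs)
print(words)             # 14 whole-word tokens, including [CLS] and [SEP]
print(embeddings.shape)  # torch.Size([14, 768])

XLNet uses SentencePiece rather than WordPiece, so the analogous merge for the second example keys on the '▁' word-boundary marker instead of the '##' continuation prefix.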

Citation

For attribution in academic contexts, please cite this work as:

@misc{chai2020-embedding4bert,
  author = {Chai, Yekun},
  title = {embedding4bert: A python library for extracting word embeddings from pre-trained language models},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cyk1337/embedding4bert}}
}

References

  1. Devlin et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  2. Yang et al. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding.

Download files

Source distribution: embedding4bert-0.0.4.tar.gz (4.6 kB)

Built distribution: embedding4bert-0.0.4-py3-none-any.whl (8.2 kB, Python 3)
