A package for extracting word representations from BERT/XLNet
Embedding4BERT
This is a Python library for extracting word embeddings from pre-trained language models.
User Guide
Installation
```shell
pip install --upgrade embedding4bert
```
Usage
Extract word embeddings from pre-trained BERT models. The extraction:
- sums the representations of the last four layers, and
- takes the mean of the subword-piece representations as the word representation.

- Extract BERT word embeddings:
```python
from embedding4bert import Embedding4BERT

emb4bert = Embedding4BERT("bert-base-cased")  # or "bert-base-uncased"
tokens, embeddings = emb4bert.extract_word_embeddings('This is a python library for extracting word representations from BERT.')
print(tokens)
print(embeddings.shape)
```
Expected output:

```
14 tokens: [CLS] This is a python library for extracting word representations from BERT. [SEP], 19 word-tokens: ['[CLS]', 'This', 'is', 'a', 'p', '##yt', '##hon', 'library', 'for', 'extract', '##ing', 'word', 'representations', 'from', 'B', '##ER', '##T', '.', '[SEP]']
['[CLS]', 'This', 'is', 'a', 'python', 'library', 'for', 'extracting', 'word', 'representations', 'from', 'BERT', '.', '[SEP]']
(14, 768)
```
- Extract XLNet word embeddings:
```python
from embedding4bert import Embedding4BERT

emb4bert = Embedding4BERT("xlnet-base-cased")
tokens, embeddings = emb4bert.extract_word_embeddings('This is a python library for extracting word representations from BERT.')
print(tokens)
print(embeddings.shape)
```
Expected output:

```
11 tokens: This is a python library for extracting word representations from BERT., 16 word-tokens: ['▁This', '▁is', '▁a', '▁', 'py', 'thon', '▁library', '▁for', '▁extract', 'ing', '▁word', '▁representations', '▁from', '▁B', 'ERT', '.']
['▁This', '▁is', '▁a', '▁python', '▁library', '▁for', '▁extracting', '▁word', '▁representations', '▁from', '▁BERT.']
(11, 768)
```
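The extraction strategy described above (sum the last four hidden layers, then mean-pool the subword pieces of each word) can be sketched with plain NumPy on toy hidden states. The array shapes mirror what a Hugging Face model would return, but the values and the `word_ids` mapping below are made up for illustration and are not part of this library's API:

```python
import numpy as np

# Toy setup: 5 "layers" of hidden states for 6 subword tokens with
# hidden size 8 (a real BERT-base returns 13 layers of size 768).
rng = np.random.default_rng(0)
num_layers, num_subwords, hidden = 5, 6, 8
hidden_states = rng.normal(size=(num_layers, num_subwords, hidden))

# Step 1: sum the last four layers for every subword token.
summed = hidden_states[-4:].sum(axis=0)  # shape (6, 8)

# Step 2: mean-pool subword pieces back into words.
# word_ids[i] gives the word index of subword i, e.g. the pieces
# 'p', '##yt', '##hon' all map to the same word index.
word_ids = [0, 1, 1, 1, 2, 3]
num_words = max(word_ids) + 1
word_embeddings = np.stack([
    summed[[i for i, w in enumerate(word_ids) if w == word]].mean(axis=0)
    for word in range(num_words)
])

print(word_embeddings.shape)  # (4, 8): one vector per word
```

A single-piece word keeps its summed vector unchanged, while multi-piece words get the average of their pieces, which is how 19 subword tokens collapse into 14 word vectors in the BERT example above.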
Citation
For attribution in academic contexts, please cite this work as:
```bibtex
@misc{chai2020-embedding4bert,
  author       = {Chai, Yekun},
  title        = {embedding4bert: A python library for extracting word embeddings from pre-trained language models},
  year         = {2020},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/cyk1337/embedding4bert}}
}
```
File details
Details for the file embedding4bert-0.0.4.tar.gz.
File metadata
- Download URL: embedding4bert-0.0.4.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4a78709f2be0fef5092830dd7eeb03a2f891b08ad5ba56c9bff1e98c50f05093` |
| MD5 | `0807167d1ad7e27672420f1998c9bc7e` |
| BLAKE2b-256 | `5c4781d67ab6084a3d468706b36e6dc12f42a167d20ded7c78a2769d48ceaba3` |
File details
Details for the file embedding4bert-0.0.4-py3-none-any.whl.
File metadata
- Download URL: embedding4bert-0.0.4-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fc8296038e29a6899474314a6d786b04a6645abf7e08c6c2547de28052ffc752` |
| MD5 | `2d55c48f9de32be50579c391a338f6b8` |
| BLAKE2b-256 | `0ce1a1288b5a4c0445fbb389c26d56bfd8b5e86257cc3a32dec6330d65c6677f` |