
A package for extracting word representations from BERT/XLNet

Project description

Embedding4BERT



This is a Python library for extracting word embeddings from pre-trained language models.

User Guide

Installation

pip install --upgrade embedding4bert

Usage

Extract word embeddings from pretrained BERT and XLNet models.

  • Sum the representations of the last four hidden layers.
  • Take the mean of the subword-piece representations as the word representation. (A sketch of this procedure using the transformers library directly follows the BERT example below.)
  1. Extract BERT word embeddings.
from embedding4bert import Embedding4BERT
emb4bert = Embedding4BERT("bert-base-cased") # bert-base-uncased
tokens, embeddings = emb4bert.extract_word_embeddings('This is a python library for extracting word representations from BERT.')
print(tokens)
print(embeddings.shape)

Expected output:

14 tokens: [CLS] This is a python library for extracting word representations from BERT. [SEP], 19 word-tokens: ['[CLS]', 'This', 'is', 'a', 'p', '##yt', '##hon', 'library', 'for', 'extract', '##ing', 'word', 'representations', 'from', 'B', '##ER', '##T', '.', '[SEP]']
['[CLS]', 'This', 'is', 'a', 'python', 'library', 'for', 'extracting', 'word', 'representations', 'from', 'BERT', '.', '[SEP]']
(14, 768)
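The bullets above say the package sums the last four hidden layers and averages subword pieces into word vectors. The snippet below is a minimal sketch of that procedure written directly against the Hugging Face transformers API; it is an illustrative reconstruction based on the documented behaviour, not the package's actual implementation, and the merging logic is an assumption.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

text = "This is a python library for extracting word representations from BERT."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Sum the last four hidden layers; each layer has shape (1, num_pieces, 768).
piece_vectors = torch.stack(outputs.hidden_states[-4:]).sum(dim=0).squeeze(0)

# Average WordPiece pieces back into words ('p', '##yt', '##hon' -> 'python').
pieces = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
words, word_vectors, buffer = [], [], []
for piece, vec in zip(pieces, piece_vectors):
    if piece.startswith("##") and buffer:
        words[-1] += piece[2:]
        buffer.append(vec)
    else:
        if buffer:
            word_vectors.append(torch.stack(buffer).mean(dim=0))
        words.append(piece)
        buffer = [vec]
word_vectors.append(torch.stack(buffer).mean(dim=0))

embeddings = torch.stack(word_vectors)
print(words)             # ['[CLS]', 'This', ..., 'BERT', '.', '[SEP]']
print(embeddings.shape)  # torch.Size([14, 768])

Like the package output above, the sketch keeps the [CLS] and [SEP] vectors; a downstream application would typically drop them.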
  2. Extract XLNet word embeddings.
from embedding4bert import Embedding4BERT
emb4bert = Embedding4BERT("xlnet-base-cased")
tokens, embeddings = emb4bert.extract_word_embeddings('This is a python library for extracting word representations from BERT.')
print(tokens)
print(embeddings.shape)

Expected output:

11 tokens: This is a python library for extracting word representations from BERT., 16 word-tokens: ['▁This', '▁is', '▁a', '▁', 'py', 'thon', '▁library', '▁for', '▁extract', 'ing', '▁word', '▁representations', '▁from', '▁B', 'ERT', '.']
['▁This', '▁is', '▁a', '▁python', '▁library', '▁for', '▁extracting', '▁word', '▁representations', '▁from', '▁BERT.']
(11, 768)
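For the XLNet case, the 16 SentencePiece pieces are merged into 11 words by treating the '▁' marker as a word boundary before averaging the piece vectors. The helper below is a small, self-contained sketch of that grouping; the function name and exact rule are assumptions, only the pooling-by-word idea comes from the description above.

# group_pieces is a hypothetical helper showing how SentencePiece tokens
# could be merged into words before averaging their vectors.
def group_pieces(pieces):
    groups = []  # list of (word, [piece indices])
    for i, piece in enumerate(pieces):
        if piece.startswith("▁") or not groups:
            groups.append((piece, [i]))
        else:
            word, idxs = groups[-1]
            groups[-1] = (word + piece, idxs + [i])
    return groups

pieces = ['▁This', '▁is', '▁a', '▁', 'py', 'thon', '▁library', '▁for',
          '▁extract', 'ing', '▁word', '▁representations', '▁from', '▁B', 'ERT', '.']
print([word for word, _ in group_pieces(pieces)])
# ['▁This', '▁is', '▁a', '▁python', '▁library', '▁for', '▁extracting',
#  '▁word', '▁representations', '▁from', '▁BERT.']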

Citation

For attribution in academic contexts, please cite this work as:

@misc{chai2020-embedding4bert,
  author = {Chai, Yekun},
  title = {embedding4bert: A python library for extracting word embeddings from pre-trained language models},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cyk1337/embedding4bert}}
}

References

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2019)
  2. XLNet: Generalized Autoregressive Pretraining for Language Understanding (Yang et al., 2019)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

embedding4bert-0.0.4.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

embedding4bert-0.0.4-py3-none-any.whl (8.2 kB)

Uploaded Python 3

File details

Details for the file embedding4bert-0.0.4.tar.gz.

File metadata

  • Download URL: embedding4bert-0.0.4.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11

File hashes

Hashes for embedding4bert-0.0.4.tar.gz

  • SHA256: 4a78709f2be0fef5092830dd7eeb03a2f891b08ad5ba56c9bff1e98c50f05093
  • MD5: 0807167d1ad7e27672420f1998c9bc7e
  • BLAKE2b-256: 5c4781d67ab6084a3d468706b36e6dc12f42a167d20ded7c78a2769d48ceaba3

See more details on using hashes here.
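As a quick way to use these digests, the snippet below checks a downloaded archive against the SHA256 value listed above using only the standard library (the file is assumed to sit in the current directory).

import hashlib

expected = "4a78709f2be0fef5092830dd7eeb03a2f891b08ad5ba56c9bff1e98c50f05093"
with open("embedding4bert-0.0.4.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")

pip can perform the same verification at install time via a requirements file with --hash entries and the --require-hashes flag.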

File details

Details for the file embedding4bert-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: embedding4bert-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11

File hashes

Hashes for embedding4bert-0.0.4-py3-none-any.whl

  • SHA256: fc8296038e29a6899474314a6d786b04a6645abf7e08c6c2547de28052ffc752
  • MD5: 2d55c48f9de32be50579c391a338f6b8
  • BLAKE2b-256: 0ce1a1288b5a4c0445fbb389c26d56bfd8b5e86257cc3a32dec6330d65c6677f

See more details on using hashes here.
