pytorch-fast-elmo
Introduction
A fast ELMo implementation with features:
- Lower execution overhead. The core components are reimplemented in LibTorch (the PyTorch C++ API) to reduce Python execution overhead, yielding roughly a 45% speedup.
- A more flexible design. The workflow has been redesigned so that users can easily extend or change ELMo's behavior (a sketch follows this list).
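To make the second point concrete, here is a minimal sketch of the kind of extension the redesign is meant to allow. It is not the library's prescribed extension API: the `ProjectedElmo` wrapper and its projection layer are hypothetical, the 1024-dimensional input size is that of the standard 2x4096_512 model, and it assumes `FastElmo` returns an AllenNLP-style dict with an `elmo_representations` key (see Usage below).

```python
import torch
from pytorch_fast_elmo import FastElmo, batch_to_char_ids

class ProjectedElmo(torch.nn.Module):
    """Hypothetical wrapper: ELMo followed by a learned projection."""

    def __init__(self, options_file, weight_file, output_dim=256):
        super().__init__()
        self.elmo = FastElmo(options_file, weight_file)
        # 1024 is the output size of the standard 2x4096_512 ELMo model.
        self.proj = torch.nn.Linear(1024, output_dim)

    def forward(self, character_ids):
        out = self.elmo(character_ids)
        # Assumption: AllenNLP-style output dict.
        reps = out['elmo_representations'][0]
        return self.proj(reps)

model = ProjectedElmo('/path/to/options.json', '/path/to/weights.hdf5')
embeddings = model(batch_to_char_ids([['First', 'sentence', '.']]))
```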
Benchmark
Hardware:
- CPU: i7-7800X
- GPU: 1080Ti
Options:
- Batch size: 32
- Warm up iterations: 20
- Test iterations: 1000
- Word length: [1, 20] (characters per word)
- Sentence length: [1, 30] (words per sentence)
- Random seed: 10000
| Item | Mean of durations (ms) | cumtime(synchronize) (%) |
|---|---|---|
| Fast ELMo (CUDA, no synchronize) | 31 | N/A |
| AllenNLP ELMo (CUDA, no synchronize) | 56 | N/A |
| Fast ELMo (CUDA, synchronize) | 47 | 26.13 |
| AllenNLP ELMo (CUDA, synchronize) | 57 | 0.02 |
| Fast ELMo (CPU) | 1277 | N/A |
| AllenNLP ELMo (CPU) | 1453 | N/A |
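The numbers above come from the project's own benchmark script. The loop below is only a minimal sketch of how such a measurement can be reproduced under the options listed (batch size 32, 20 warm-up iterations, 1000 timed iterations, random words of 1-20 characters and sentences of 1-30 words). The `random_batch` helper and the placeholder paths are stand-ins, and it assumes `FastElmo` is a regular `torch.nn.Module` whose input tensor can be moved to CUDA.

```python
import random
import string
import time

import torch
from pytorch_fast_elmo import FastElmo, batch_to_char_ids

random.seed(10000)

def random_batch(batch_size=32, max_sent_len=30, max_word_len=20):
    # Random sentences: 1-30 words of 1-20 lowercase characters each.
    return [
        [
            ''.join(random.choices(string.ascii_lowercase,
                                   k=random.randint(1, max_word_len)))
            for _ in range(random.randint(1, max_sent_len))
        ]
        for _ in range(batch_size)
    ]

elmo = FastElmo('/path/to/options.json', '/path/to/weights.hdf5').cuda()

with torch.no_grad():
    # Warm up.
    for _ in range(20):
        elmo(batch_to_char_ids(random_batch()).cuda())

    durations = []
    for _ in range(1000):
        char_ids = batch_to_char_ids(random_batch()).cuda()
        torch.cuda.synchronize()
        start = time.perf_counter()
        elmo(char_ids)
        # The "synchronize" rows wait for all CUDA kernels to finish;
        # the "no synchronize" rows skip this call.
        torch.cuda.synchronize()
        durations.append(time.perf_counter() - start)

print('Mean duration (ms): %.1f' % (1000 * sum(durations) / len(durations)))
```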
Usage
Install torch==1.0.0 first, then install the package:

```
pip install pytorch-fast-elmo
```
FastElmo should have the same behavior as AllenNLP’s ELMo.
```python
from pytorch_fast_elmo import FastElmo, batch_to_char_ids

options_file = '/path/to/elmo_2x4096_512_2048cnn_2xhighway_options.json'
weight_file = '/path/to/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5'

elmo = FastElmo(options_file, weight_file)

sentences = [['First', 'sentence', '.'], ['Another', '.']]
character_ids = batch_to_char_ids(sentences)

embeddings = elmo(character_ids)
```
Use FastElmoWordEmbedding if you have disabled char_cnn in bilm-tf, or have exported the Char CNN representation to a weight file.
```python
from pytorch_fast_elmo import (
    FastElmoWordEmbedding,
    load_and_build_vocab2id,
    batch_to_word_ids,
)

options_file = '/path/to/elmo_2x4096_512_2048cnn_2xhighway_options.json'
weight_file = '/path/to/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5'
vocab_file = '/path/to/vocab.txt'
embedding_file = '/path/to/cached_elmo_embedding.hdf5'

elmo = FastElmoWordEmbedding(
    options_file,
    weight_file,
    # Could be omitted if the embedding weight is in `weight_file`.
    word_embedding_weight_file=embedding_file,
)

vocab2id = load_and_build_vocab2id(vocab_file)

sentences = [['First', 'sentence', '.'], ['Another', '.']]
word_ids = batch_to_word_ids(sentences, vocab2id)

embeddings = elmo(word_ids)
```
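If you have cached the Char CNN representation (see the CLI commands below), one way to sanity-check the cache is to compare both modules on the same sentences. This is only a sketch under two assumptions: both modules return AllenNLP-style dicts with an `elmo_representations` key, and every word in the test sentences appears in the cached vocabulary.

```python
import torch
from pytorch_fast_elmo import (
    FastElmo,
    FastElmoWordEmbedding,
    load_and_build_vocab2id,
    batch_to_char_ids,
    batch_to_word_ids,
)

options_file = '/path/to/elmo_2x4096_512_2048cnn_2xhighway_options.json'
weight_file = '/path/to/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5'
vocab_file = '/path/to/vocab.txt'
embedding_file = '/path/to/cached_elmo_embedding.hdf5'

char_elmo = FastElmo(options_file, weight_file)
word_elmo = FastElmoWordEmbedding(
    options_file,
    weight_file,
    word_embedding_weight_file=embedding_file,
)
vocab2id = load_and_build_vocab2id(vocab_file)

sentences = [['First', 'sentence', '.'], ['Another', '.']]
out_char = char_elmo(batch_to_char_ids(sentences))
out_word = word_elmo(batch_to_word_ids(sentences, vocab2id))

# Assumption: AllenNLP-style output dicts. For in-vocabulary words the
# cached word embedding should reproduce the Char CNN representation.
print(torch.allclose(
    out_char['elmo_representations'][0],
    out_word['elmo_representations'][0],
    atol=1e-5,
))
```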
CLI commands:
```
# Cache the Char CNN representation.
fast-elmo cache-char-cnn ./vocab.txt ./options.json ./lm_weights.hdf5 ./lm_embd.hdf5

# Export word embedding.
fast-elmo export-word-embd ./vocab.txt ./no-char-cnn.hdf5 ./embd.txt
```
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2019-01-02)
- First release on PyPI.