Byte Pair Encoding for Natural Language Processing.
Installation
Install with pip:
pip install bpelib
Uninstalling
Uninstall with pip:
pip uninstall bpelib
Usage
Import the BPE class.
from bpelib import BPE
Learn the encoding at construction or at a later time:
bpe = BPE(['start', 'learning', 'now'])
# or ...
bpe.learn_encoding(['start', 'learning', 'now'])
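This README does not show how bpelib learns its encoding internally. As a rough, library-independent sketch of the usual byte pair encoding procedure (repeatedly merge the most frequent adjacent symbol pair until the vocabulary is large enough), something like the following may help. The function name `learn_merges`, its return value, and the stopping rule are assumptions made for illustration and are not part of bpelib's API; the `<w/>` and `</w>` word markers are taken from the example output shown in this README.

```python
from collections import Counter

START, END = '<w/>', '</w>'  # word markers, as seen in this README's example output


def learn_merges(words, max_vocab_size=1024):
    """Illustrative BPE learner (hypothetical helper, not bpelib's implementation)."""
    # Represent every word as a tuple of symbols wrapped in the word markers.
    corpus = Counter((START, *word, END) for word in words)
    vocab = {symbol for word in corpus for symbol in word}
    merges = []
    while len(vocab) < max_vocab_size:
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for word, freq in corpus.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # assumed stopping rule: no pair occurs more than once
        merges.append(best)
        vocab.add(best[0] + best[1])
        # Rewrite the corpus with the chosen pair fused into a single symbol.
        new_corpus = Counter()
        for word, freq in corpus.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


merges = learn_merges(['start', 'learning', 'now'])
```

In this sketch, `max_vocab_size` only caps how many merged symbols are added; whether bpelib's parameter of the same name behaves identically is not confirmed by this README.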
To encode or decode a word, simply call the BPE object.
encoded = bpe('encode') # '<w/> e n c o d e </w>'
decoded = bpe(encoded) # 'encode'
assert 'encode' == decoded
You can call encode or decode explicitly, too.
encoded = bpe.encode('encode') # '<w/> e n c o d e </w>'
decoded = bpe.decode(encoded) # 'encode'
assert 'encode' == decoded
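To make the encoded format in the comments above concrete, here is a minimal stand-alone sketch of a round trip. It assumes the encoded form is the word's symbols separated by spaces and wrapped in `<w/>` and `</w>`, as the comments suggest. The functions `encode` and `decode` below are hypothetical illustrations, not bpelib's own methods, and the `merges` argument is an assumed list of symbol pairs to fuse.

```python
START, END = '<w/>', '</w>'  # word markers, as seen in the comments above


def encode(word, merges=()):
    """Split a word into symbols and apply each merge in order (illustrative only)."""
    symbols = [START, *word, END]
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return ' '.join(symbols)


def decode(encoded):
    """Join the symbols back together and strip the word markers."""
    joined = ''.join(encoded.split(' '))
    if joined.startswith(START):
        joined = joined[len(START):]
    if joined.endswith(END):
        joined = joined[:-len(END)]
    return joined


encoded = encode('encode')                      # no merges learned yet
with_merge = encode('start', [('a', 'r')])      # one assumed merge: a + r
```

With no merges, every character stays a separate symbol; each learned merge shortens the symbol sequence without changing what `decode` returns.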
You can also specify the maximum vocabulary size and the character encoding to use.
bpe = BPE(max_vocab_size=1024, encoding='ascii')
Source Distribution
bpelib-v0.1.3.tar.gz (1.6 MB)