
Byte Pair Encoding for Natural Language Processing.

Installation

Install with pip:

pip install bpelib

Uninstalling

Uninstall with pip:

pip uninstall bpelib

Usage

Import the BPE class.

from bpelib import BPE

Learn the encoding when constructing the object, or at a later time:

bpe = BPE(['start', 'learning', 'now'])
# or construct first and learn the encoding later:
bpe = BPE()
bpe.learn_encoding(['start', 'learning', 'now'])
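For readers new to BPE, here is a minimal, from-scratch sketch of the classic merge-learning loop: count adjacent symbol pairs, merge the most frequent pair, and repeat. It is not bpelib's implementation. The `learn_merges` function and its `num_merges` parameter are hypothetical, and the `<w/>`/`</w>` boundary markers are borrowed from the example output below rather than from bpelib's source.

```python
from collections import Counter

# Boundary markers mirror the README's example output; assumed, not taken
# from bpelib's source.
START, END = "<w/>", "</w>"


def learn_merges(words, num_merges):
    """Hypothetical helper: learn up to `num_merges` BPE merge rules."""
    # Each distinct word becomes a tuple of symbols, weighted by its frequency.
    corpus = Counter((START, *word, END) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            fused, i = [], 0
            while i < len(symbols):
                if symbols[i:i + 2] == best:
                    fused.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    fused.append(symbols[i])
                    i += 1
            new_corpus[tuple(fused)] += freq
        corpus = new_corpus
    return merges


print(learn_merges(["start", "learning", "now"], 3))
```

With the README's three training words, the first rule learned is `('a', 'r')`, because "ar" is the only adjacent pair that occurs in two of the words. Every later pair occurs once, so the tie-breaking rule decides the order.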

To encode or decode a word, simply call the BPE object.

encoded = bpe('encode')  # '<w/> e n c o d e </w>'
decoded = bpe(encoded)  # 'encode'
assert 'encode' == decoded
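The sketch below shows what encoding and decoding typically involve in BPE. Encoding splits a word into characters and replays the learned merge rules in order. Decoding concatenates the symbols and strips the boundary markers. The `encode`/`decode` functions and the merge list are hypothetical illustrations. The `<w/>` and `</w>` markers are copied from the example output above, and bpelib's internals may differ.

```python
# Boundary markers mirror the README's example output; assumed for illustration.
START, END = "<w/>", "</w>"


def encode(word, merges):
    """Split `word` into characters, then replay merge rules in learned order."""
    symbols = [START, *word, END]
    for left, right in merges:
        fused, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                fused.append(left + right)
                i += 2
            else:
                fused.append(symbols[i])
                i += 1
        symbols = fused
    return " ".join(symbols)


def decode(encoded):
    """Concatenate the space-separated symbols and strip the boundary markers."""
    text = "".join(encoded.split())
    if text.startswith(START):
        text = text[len(START):]
    if text.endswith(END):
        text = text[:-len(END)]
    return text


merges = [("e", "n"), ("en", "c"), ("o", "d")]  # hypothetical learned rules
print(encode("encode", []))      # <w/> e n c o d e </w>
print(encode("encode", merges))  # <w/> enc od e </w>
assert decode(encode("encode", merges)) == "encode"
```

With no merges learned, every character stays a separate symbol, which matches the `'<w/> e n c o d e </w>'` output shown above.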

You can also call encode and decode explicitly.

encoded = bpe.encode('encode')  # '<w/> e n c o d e </w>'
decoded = bpe.decode(encoded)  # 'encode'
assert 'encode' == decoded

You can also specify the maximum vocabulary size and the character encoding to use.

bpe = BPE(max_vocab_size=1024, encoding='ascii')
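In BPE, a vocabulary cap usually bounds how many merges are learned. The base vocabulary is the set of distinct characters plus any boundary markers, and each merge typically adds one new symbol. The hypothetical `merge_budget` helper below sketches that arithmetic. It assumes the `<w/>`/`</w>` markers from the example output and does not model the `encoding` argument. How bpelib actually counts its vocabulary is not documented here.

```python
def merge_budget(words, max_vocab_size, markers=("<w/>", "</w>")):
    """Hypothetical helper: how many merges fit under a vocabulary cap."""
    # Base vocabulary: every distinct character plus the boundary markers.
    base = {ch for word in words for ch in word} | set(markers)
    # Each merge is assumed to add exactly one new symbol to the vocabulary.
    return max(0, max_vocab_size - len(base))


words = ["start", "learning", "now"]
print(merge_budget(words, 1024))  # 1011: 13 base symbols leave room for 1011 merges
print(merge_budget(words, 10))    # 0: the base alphabet already exceeds the cap
```

In practice, learning also stops early once no adjacent pairs remain. For a tiny corpus like these three words, that happens long before the budget is used up.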



Download files


Source Distribution

bpelib-v0.1.3.tar.gz (1.6 MB)

File details

Details for the file bpelib-v0.1.3.tar.gz.

File metadata

  • Download URL: bpelib-v0.1.3.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.16

File hashes

Hashes for bpelib-v0.1.3.tar.gz
Algorithm     Hash digest
SHA256        751d1ff98fa76b3beb7524425d33065c1eda1102cb127669c2b81b600df95aac
MD5           5d1866742985f59c1d5c0c6fa6603e8e
BLAKE2b-256   1d7d2304ee53cb4005420f5a328fc111d3749b034b130b70e3abcea9029ab164
