Subword tokenization
Project description
Genz Tokenize
Install via pip (from PyPI):
pip install genz-tokenize
Usage
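from genz_tokenize import Tokenize
# Build the tokenizer from a vocabulary file and a BPE codes file.
tokenize = Tokenize('vocab.txt', 'bpe.codes')
# Tokenize a batch of two strings with maxlen set to 10.
print(tokenize(['sinh_viên công_nghệ', 'hello'], maxlen=10))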
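To illustrate what a subword tokenizer with a `maxlen` argument typically does, here is a minimal, self-contained sketch. It is not the genz-tokenize implementation: the functions `apply_bpe` and `encode`, the merge-rule list, the toy vocabulary, and the `pad_id`/`unk_id` values are all assumptions made for illustration, and real BPE code files usually also carry end-of-word or continuation markers that are omitted here.

```python
from typing import Dict, List, Tuple


def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split a word into subwords by applying merge rules in priority order."""
    # Earlier rules in the list have higher priority (lower rank).
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; unknown pairs get an infinite rank.
        pairs = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(pairs)
        if best_rank == float("inf"):
            break  # no applicable merge rule is left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def encode(
    text: str,
    merges: List[Tuple[str, str]],
    vocab: Dict[str, int],
    maxlen: int,
    pad_id: int = 0,  # hypothetical padding id
    unk_id: int = 1,  # hypothetical unknown-token id
) -> List[int]:
    """Map text to a fixed-length list of ids: truncate or pad to maxlen."""
    tokens = [t for word in text.split() for t in apply_bpe(word, merges)]
    ids = [vocab.get(t, unk_id) for t in tokens][:maxlen]
    return ids + [pad_id] * (maxlen - len(ids))


# Toy merge rules and vocabulary, invented for this example.
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
vocab = {"hello": 2, "he": 3, "l": 4}

print(apply_bpe("hello", merges))                    # ['hello']
print(apply_bpe("help", merges))                     # ['he', 'l', 'p']
print(encode("hello help", merges, vocab, maxlen=6))  # [2, 3, 4, 1, 0, 0]
```

In this sketch, sequences shorter than `maxlen` are right-padded and longer ones are truncated, so every input in a batch ends up with the same length. Whether genz-tokenize pads, truncates, or returns additional fields such as attention masks is not stated in the source.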
Download files
Source Distribution
genz-tokenize-1.0.4.tar.gz (411.6 kB)
Built Distribution
genz_tokenize-1.0.4-py3-none-any.whl (413.7 kB)
File details
Details for the file genz-tokenize-1.0.4.tar.gz.
File metadata
- Download URL: genz-tokenize-1.0.4.tar.gz
- Upload date:
- Size: 411.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3db381cc219a07519f31f0a78a71b9d32e32bc15c230d74b0f7ab9d00793891a
MD5 | 88b51c5191dade5e8f177b6c1d0d6754
BLAKE2b-256 | 7ce3de1033ac8f9f89e90ab72f9ad31fa751258cfde8518957ce7f8c8ce8eaa0
File details
Details for the file genz_tokenize-1.0.4-py3-none-any.whl.
File metadata
- Download URL: genz_tokenize-1.0.4-py3-none-any.whl
- Upload date:
- Size: 413.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6aea1679ded77a9b3ae8b827862d87f5c7b33332d27158ebbf0197dbc7427e90
MD5 | e0845b8f0e1af10906d09ec3e40329c6
BLAKE2b-256 | e302587640985870bee9807f18f1f01ff1dbab7fa8f9aa4349c531f62bb25f03
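The SHA256 digests listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are hypothetical, and it assumes the files were saved locally under their original filenames.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "genz-tokenize-1.0.4.tar.gz":
        "3db381cc219a07519f31f0a78a71b9d32e32bc15c230d74b0f7ab9d00793891a",
    "genz_tokenize-1.0.4-py3-none-any.whl":
        "6aea1679ded77a9b3ae8b827862d87f5c7b33332d27158ebbf0197dbc7427e90",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the expected one for its name."""
    expected = EXPECTED_SHA256[Path(path).name]
    return sha256_of(path) == expected
```

For example, `verify("genz-tokenize-1.0.4.tar.gz")` returns `True` only if the local file's digest matches the one listed above. A filename that is not in `EXPECTED_SHA256` raises `KeyError`.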