Subword tokenization
Project description
Genz Tokenize
Install via pip (from PyPI):
pip install genz-tokenize
Usage
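from genz_tokenize import Tokenize

# Build the tokenizer from a vocabulary file and a BPE codes file
tokenize = Tokenize('vocab.txt', 'bpe.codes')
# Tokenize a batch of two sentences, passing maxlen=10
print(tokenize(['sinh_viên công_nghệ', 'hello'], maxlen=10))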
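The package's internals are not shown on this page, so the following is only a minimal, self-contained sketch of what subword tokenization with a BPE merge table generally involves. It does not use `genz_tokenize`. The helper names (`apply_bpe`, `encode`), the toy merge list and vocabulary, the `pad_id`/`unk_id` values, and the reading of `maxlen` as "truncate or pad to this length" are all assumptions made for illustration.

```python
def apply_bpe(word, merges):
    """Split a word into subwords by repeatedly applying the best-ranked merge.

    `merges` is an ordered list of symbol pairs (hypothetical toy data here);
    earlier pairs have higher priority.
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def encode(sentence, merges, vocab, maxlen, pad_id=0, unk_id=1):
    """Map a sentence to a fixed-length list of ids (assumed maxlen semantics)."""
    pieces = [p for word in sentence.split() for p in apply_bpe(word, merges)]
    ids = [vocab.get(p, unk_id) for p in pieces][:maxlen]  # truncate
    return ids + [pad_id] * (maxlen - len(ids))            # pad


# Toy data, not taken from any real vocab.txt or bpe.codes file
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
vocab = {"<pad>": 0, "<unk>": 1, "hello": 2, "he": 3, "l": 4}

print(apply_bpe("hello", merges))
print(encode("hello help", merges, vocab, maxlen=6))
```

With the toy merge list, `hello` collapses into a single piece while `help` stays split into smaller subwords, which is the behaviour that lets a fixed vocabulary cover unseen words.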
Download files
Source Distribution
genz-tokenize-1.0.3b0.tar.gz (411.6 kB)
Built Distribution
genz_tokenize-1.0.3b0-py3-none-any.whl (413.8 kB)
File details
Details for the file genz-tokenize-1.0.3b0.tar.gz.
File metadata
- Download URL: genz-tokenize-1.0.3b0.tar.gz
- Upload date:
- Size: 411.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1849ef34425b31410f2d9680f6e43730aee739c2a7875e2f474ba1a134308bd4
MD5 | b984ca217cb59046d14608dc98728f87
BLAKE2b-256 | 18ba689152765ae38e1fc29d1c3bd160f76497b50e4d92a3a0394205fe0d00cb
File details
Details for the file genz_tokenize-1.0.3b0-py3-none-any.whl.
File metadata
- Download URL: genz_tokenize-1.0.3b0-py3-none-any.whl
- Upload date:
- Size: 413.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | de706c4b66d202afbcff2110d6d44db8290c80453204b3eabd0368c567b3f94c
MD5 | 4472f01e3642307f1017915d383342b4
BLAKE2b-256 | cf30bd78b1fd6afa6ac4d25c6aeb325692e188d24b62d966711bd7405e8ec07b