The tokenizer vocabulary is created with subword-nmt.
Project description
Genz Tokenize
Install via pip (from PyPI):
pip install genz-tokenize
Usage
from genz_tokenize import Tokenize
# Load the tokenizer from the BPE vocabulary and merges files
tokenize = Tokenize('vocab.txt', 'bpe.codes')
# Encode two sentences with a maximum sequence length of 10
print(tokenize(['sinh_viên công_nghệ', 'hello'], maxlen=10))
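The vocab.txt and bpe.codes files passed to Tokenize come from subword-nmt, as noted in the description above. The sketch below shows one hypothetical way to build them with subword-nmt's Python API; the corpus file name (corpus.txt), the number of merge operations, and the "token frequency" vocabulary format are assumptions and may differ from what this package actually expects.

import codecs
from collections import Counter

from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn BPE merge operations from a plain-text corpus (file name assumed).
with codecs.open('corpus.txt', encoding='utf-8') as infile, \
        codecs.open('bpe.codes', 'w', encoding='utf-8') as outfile:
    learn_bpe(infile, outfile, num_symbols=10000)

# Re-segment the corpus with the learned merges and count the subword tokens.
with codecs.open('bpe.codes', encoding='utf-8') as codes:
    bpe = BPE(codes)

counts = Counter()
with codecs.open('corpus.txt', encoding='utf-8') as infile:
    for line in infile:
        counts.update(bpe.process_line(line).split())

# Write a "token frequency" vocabulary, the format subword-nmt's get-vocab produces.
with codecs.open('vocab.txt', 'w', encoding='utf-8') as vocab:
    for token, freq in counts.most_common():
        vocab.write('{} {}\n'.format(token, freq))

The underscore-joined Vietnamese words in the usage example ('sinh_viên công_nghệ') suggest the corpus is word-segmented before BPE is learned.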
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
genz-tokenize-1.0.2.tar.gz (3.9 kB)
Built Distribution
genz_tokenize-1.0.2-py3-none-any.whl (4.3 kB)
File details
Details for the file genz-tokenize-1.0.2.tar.gz.
File metadata
- Download URL: genz-tokenize-1.0.2.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | eae0d81fb6a6a37e6a118a9eacb0eb169210e53b68ae85e66a746355a4a33778
MD5 | 838a46a8e5e135e14602688d42c0ef36
BLAKE2b-256 | 28bc065020f454f9f909e6f65fb1bc57c8c3e1ab06f61749dc7e51672d544f4d
File details
Details for the file genz_tokenize-1.0.2-py3-none-any.whl.
File metadata
- Download URL: genz_tokenize-1.0.2-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 84a2f509da2d0b36961d6d28b67de722451b840a83da51faa1323841628c81e5
MD5 | 9a0b7e02c81ee6afcd41be9113934f62
BLAKE2b-256 | 5963e156cce384db746a3df1a532bf5564a87479b66e7584c9b12c07305d15b6