Subword tokenization

Project description

Genz Tokenize

GitHub

Install via pip (from PyPI):

pip install genz-tokenize

Usage

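from genz_tokenize import Tokenize

# Load the vocabulary and BPE merge codes from local files
tokenize = Tokenize('vocab.txt', 'bpe.codes')

# Tokenize a batch of two strings with maxlen set to 10
print(tokenize(['sinh_viên công_nghệ', 'hello'], maxlen=10))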
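The 'bpe.codes' argument suggests byte-pair encoding (BPE). As a rough illustration of how subword splitting with BPE merge rules generally works, here is a minimal sketch. It is not genz-tokenize's implementation: the function name apply_bpe and the toy merge list are hypothetical, and the real format of bpe.codes is not described here.

```python
def apply_bpe(word, merges):
    """Split `word` into subwords by repeatedly applying ranked merge rules.

    `merges` is a list of (left, right) pairs, earliest = highest priority.
    Generic BPE sketch; an assumption, not the genz-tokenize algorithm.
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair with the best (lowest) rank
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Toy merge rules, invented for this example
toy_merges = [("l", "l"), ("h", "e"), ("he", "ll")]
print(apply_bpe("hello", toy_merges))  # ['hell', 'o']
```

With a real tokenizer, the resulting subwords would then be mapped to ids through the vocabulary file; how genz-tokenize does this, and how it uses maxlen, should be checked against the package itself.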

Download files

Source Distribution

genz-tokenize-1.0.3b0.tar.gz (411.6 kB)

Uploaded Source

Built Distribution

genz_tokenize-1.0.3b0-py3-none-any.whl (413.8 kB)

Uploaded Python 3

File details

Details for the file genz-tokenize-1.0.3b0.tar.gz.

File metadata

  • Download URL: genz-tokenize-1.0.3b0.tar.gz
  • Upload date:
  • Size: 411.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9

File hashes

Hashes for genz-tokenize-1.0.3b0.tar.gz:

Algorithm    Hash digest
SHA256       1849ef34425b31410f2d9680f6e43730aee739c2a7875e2f474ba1a134308bd4
MD5          b984ca217cb59046d14608dc98728f87
BLAKE2b-256  18ba689152765ae38e1fc29d1c3bd160f76497b50e4d92a3a0394205fe0d00cb

File details

Details for the file genz_tokenize-1.0.3b0-py3-none-any.whl.

File metadata

  • Download URL: genz_tokenize-1.0.3b0-py3-none-any.whl
  • Upload date:
  • Size: 413.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9

File hashes

Hashes for genz_tokenize-1.0.3b0-py3-none-any.whl:

Algorithm    Hash digest
SHA256       de706c4b66d202afbcff2110d6d44db8290c80453204b3eabd0368c567b3f94c
MD5          4472f01e3642307f1017915d383342b4
BLAKE2b-256  cf30bd78b1fd6afa6ac4d25c6aeb325692e188d24b62d966711bd7405e8ec07b
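The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library. The helper names sha256_of and matches are made up for this example, and it assumes the files were saved in the current directory under the names listed above.

```python
import hashlib
import os

# SHA256 digests as listed on this page
EXPECTED_SHA256 = {
    "genz-tokenize-1.0.3b0.tar.gz":
        "1849ef34425b31410f2d9680f6e43730aee739c2a7875e2f474ba1a134308bd4",
    "genz_tokenize-1.0.3b0-py3-none-any.whl":
        "de706c4b66d202afbcff2110d6d44db8290c80453204b3eabd0368c567b3f94c",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    # Assumes the files sit in the current directory (hypothetical location)
    for name, expected in EXPECTED_SHA256.items():
        if os.path.exists(name):
            print(name, "OK" if matches(name, expected) else "MISMATCH")
        else:
            print(name, "not found, skipped")
```

A mismatch means the file differs from the one that was uploaded and should not be installed.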
