A Python tokenizer trained on a modern web corpus
BTok
A multilingual tokenizer for Python, trained on a modern web corpus with SentencePiece.
Install
pip install btok --upgrade
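To confirm that the install step worked, the installed version can be read back from the package metadata. This is a minimal sketch that uses only the standard library `importlib.metadata` module; the helper name `installed_version` is made up for illustration and is not part of btok.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("btok")
    if found is None:
        print("btok is not installed in this environment")
    else:
        print(f"btok {found} is installed")
```

The check reads distribution metadata only, so it does not import btok or depend on any particular btok API.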
Usage
Run tests:
python tests.py
See tests.py for example usage.
Download files
Source Distribution
btok-0.3.tar.gz (9.2 kB)
Built Distribution
btok-0.3-py3-none-any.whl (10.1 kB)
File details
Details for the file btok-0.3.tar.gz.
File metadata
- Download URL: btok-0.3.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9d7731abc2936e18c145942c430dd9691f291f2042e3479da9cb2077dace5b2 |
| MD5 | 69b04f93693604627530e58cc40cf7e0 |
| BLAKE2b-256 | 97211b04ae01e7ba5a6356632c6e373cdc753fee308d40c67ba278488e7f40b6 |
File details
Details for the file btok-0.3-py3-none-any.whl.
File metadata
- Download URL: btok-0.3-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47f7e0b277f3cef31d7487eb792972dba5d88f37716190ce3a97007d80b61c2e |
| MD5 | 99a64736c94091df1096a22fb23b5135 |
| BLAKE2b-256 | 1607b064fa528872bde74b1306385e077d9fb81c82bff4342acdfc25f1223475 |
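
The published digests can be used to check a downloaded file before installing it. The sketch below computes a SHA256 digest with the standard library `hashlib` module and compares it with the value listed above for btok-0.3.tar.gz. The function names and the command-line handling are assumptions made for this example, not part of btok.

```python
import hashlib
import sys

# SHA256 digest listed above for btok-0.3.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "d9d7731abc2936e18c145942c430dd9691f291f2042e3479da9cb2077dace5b2"
)


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_hex(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical usage: python verify_hash.py path/to/btok-0.3.tar.gz
    if len(sys.argv) != 2:
        sys.exit("usage: python verify_hash.py <downloaded-file>")
    ok = matches_published_digest(sys.argv[1], EXPECTED_SDIST_SHA256)
    print("digest matches" if ok else "digest does NOT match")
```

To check the wheel instead, pass its SHA256 digest from the second table as `expected_hex`.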