🪙 toktkn
toktkn is a BPE tokenizer implemented in Rust and exposed to Python through PyO3 bindings.
from toktkn import BPETokenizer, TokenizerConfig
# create new tokenizer
config = TokenizerConfig(vocab_size=10)
bpe = BPETokenizer(config)
# build encoding rules on some corpus
bpe.train("some really interesting training data here...")
text = "rust is pretty fun 🦀"
assert bpe.decode(bpe.encode(text)) == text
# serialize to disk
bpe.save_pretrained("tokenizer.json")
del bpe
bpe = BPETokenizer.from_pretrained("tokenizer.json")
# the reloaded tokenizer keeps the configured vocabulary size
assert len(bpe) == 10
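To make the train/encode/decode round trip above more concrete, here is a minimal pure-Python sketch of byte-level BPE. It is not toktkn's Rust implementation; the function names (`train_bpe`, `merge`, `encode`, `decode`) are hypothetical, and the sketch assumes that ids 0 to 255 are reserved for raw bytes, which may differ from how toktkn counts `vocab_size`.

```python
from collections import Counter


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, vocab_size: int) -> dict[tuple[int, int], int]:
    """Learn merge rules over UTF-8 bytes until `vocab_size` ids exist.

    Assumption of this sketch: ids 0..255 are the raw byte values.
    """
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    next_id = 256
    while next_id < vocab_size and len(ids) >= 2:
        pair, count = Counter(zip(ids, ids[1:])).most_common(1)[0]
        if count < 2:  # nothing repeats any more, so stop early
            break
        ids = merge(ids, pair, next_id)
        merges[pair] = next_id
        next_id += 1
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Apply the learned merges in training order (dicts keep insertion order)."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Expand every id back to its byte string and decode as UTF-8."""
    table = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        table[new_id] = table[a] + table[b]
    return b"".join(table[i] for i in ids).decode("utf-8")


merges = train_bpe("aaabdaaabac", vocab_size=259)
text = "rust is pretty fun 🦀"
assert decode(encode(text, merges), merges) == text
```

Because every merge is reversible, the round trip holds for any UTF-8 text, including text that shares no pairs with the training corpus.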
Install
Install toktkn from PyPI with the following command:
pip install toktkn
Note: if you want to build from source, make sure cargo is installed!
Performance
Slightly faster than OpenAI's tiktoken and a lot quicker than 🤗 tokenizers!
Performance was measured on 2.5 MB of the wikitext test split, against OpenAI's tiktoken GPT-2 tokenizer (tiktoken==0.6.0) and the 🤗 tokenizers implementation (tokenizers==0.19.1).
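A comparison like this can be approximated with a small timing harness. The sketch below is not the script used for the numbers above; `throughput_mb_per_s` is a hypothetical helper that accepts any encode callable, and the stand-in encoder is only there to keep the example self-contained.

```python
import time
from typing import Callable, Sequence


def throughput_mb_per_s(
    encode: Callable[[str], Sequence[int]], text: str, repeats: int = 5
) -> float:
    """Return the best-of-`repeats` encoding throughput in MB/s."""
    n_bytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(text)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse clocks.
    return n_bytes / max(best, 1e-12) / 1e6


# Stand-in encoder (raw UTF-8 bytes); swap in a real tokenizer's encode method.
sample = "some really interesting training data here... " * 1000
print(f"{throughput_mb_per_s(lambda s: list(s.encode('utf-8')), sample):.1f} MB/s")
```

To compare tokenizers, pass each library's encode callable (for example `bpe.encode` from the snippet above) with the same input text, and report the best of several runs to reduce noise.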
Download files
File details
Details for the file toktkn-0.1.0.tar.gz.
File metadata
- Download URL: toktkn-0.1.0.tar.gz
- Upload date:
- Size: 44.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d26fe77a98f803343663c3e07735296c5f7163c1d346b89a11fade0cae5c691d |
| MD5 | 009c0de44f74787e3942203a9229b2a4 |
| BLAKE2b-256 | 61f1385d3ad17ab828d8c1313092628ac024e5f13bd9e7aa4a043c096e82ec34 |
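The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the local file path are assumptions for illustration.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 against a digest copied from the table above."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("toktkn-0.1.0.tar.gz", "d26fe77a98f803343663c3e07735296c5f7163c1d346b89a11fade0cae5c691d")` should return `True` for an intact copy of the source distribution, assuming the file was saved under that name in the current directory.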
File details
Details for the file toktkn-0.1.0-cp310-abi3-win_amd64.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-win_amd64.whl
- Upload date:
- Size: 302.8 kB
- Tags: CPython 3.10+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 770795f50c66c0fdfc29e9ca3a572e90c1c57f68e5f33fb1564636f153eaeaf5 |
| MD5 | 7806fbac5c9df128849c27dfc501f22f |
| BLAKE2b-256 | cf40b0d5a31c58c9f5a8cf064a52d8416ef841e6390c4d55523e5f971dcf9e04 |
File details
Details for the file toktkn-0.1.0-cp310-abi3-win32.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-win32.whl
- Upload date:
- Size: 285.9 kB
- Tags: CPython 3.10+, Windows x86
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e239bf1158eda52408c3b4168cd52dd50bdad234e57d9ea154536562c7e8e45 |
| MD5 | 65e18cdf2d151cb4628a86d4ef1b34a0 |
| BLAKE2b-256 | a582175d95595c8e327b0886cb63b0a54fe4c12952834823c764a47e3db53795 |
File details
Details for the file toktkn-0.1.0-cp310-abi3-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 670.0 kB
- Tags: CPython 3.10+, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49f09ef7b00531e328ed60d750540118868a501f2c1fada26096a52a2e1f5239 |
| MD5 | dbf61f3e7d7339f6e00ec60b46e4ff47 |
| BLAKE2b-256 | 5ebc164de1657832e1a3e4f42a4593a064b8cdc25cb3f2bdf0715d542d0d3fd9 |
File details
Details for the file toktkn-0.1.0-cp310-abi3-musllinux_1_2_i686.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-musllinux_1_2_i686.whl
- Upload date:
- Size: 689.3 kB
- Tags: CPython 3.10+, musllinux: musl 1.2+ i686
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b2bd28f740f1076e1ed579f069b0be190cf4b2800b5fa6189a4c8e24d2b6566 |
| MD5 | 066b3e0d3823f72845802e7d96e10a56 |
| BLAKE2b-256 | c5e2abb17bdad62efefe724e8296b57fbe08b816e3fa298e2cd3048c075247cf |
File details
Details for the file toktkn-0.1.0-cp310-abi3-musllinux_1_2_armv7l.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-musllinux_1_2_armv7l.whl
- Upload date:
- Size: 748.6 kB
- Tags: CPython 3.10+, musllinux: musl 1.2+ ARMv7l
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21bc9f47eaf47ed98c3b28294ec49eaf2ce29001d91ff4327a7a2f9250c1d590 |
| MD5 | 9f0fa800ee28d016da2b050fb327e51b |
| BLAKE2b-256 | ab308d6512ed9ed04262888cfcd13f33b524b1229dd6047a6beb10140837874a |
File details
Details for the file toktkn-0.1.0-cp310-abi3-musllinux_1_2_aarch64.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-musllinux_1_2_aarch64.whl
- Upload date:
- Size: 663.9 kB
- Tags: CPython 3.10+, musllinux: musl 1.2+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f102440d2d89609c480392ab57c917f1e6f695717fab16c128987cd6ea1d5cf |
| MD5 | 8f7485836ff0d5c0f950cf0e65b73451 |
| BLAKE2b-256 | 2141528b42ae08a76063233e934babce60e1fe24d6b59231389859a367ba9f8f |
File details
Details for the file toktkn-0.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 499.1 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 965a9abd4486f21c3c744223264f5722e28d97951f9797e54e7c5d98b7158cdf |
| MD5 | b1139ed894745f2ddd7cfc263173dd6d |
| BLAKE2b-256 | fd711d1a35d301542971c9b4df94fae0375c788d6a4d79bc53c4e6898d06097a |
File details
Details for the file toktkn-0.1.0-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl
- Upload date:
- Size: 574.6 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ s390x
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2307a3d5fbab343f8664113e6ffbe946d58540af5bbfc8fc5a685e919e6b1a2b |
| MD5 | d8340500b503a9e77aee42f469d05898 |
| BLAKE2b-256 | 706627b7933d63fbd4f22f0615012c3786c1b9cb3e848e2ea59b7a166634ee67 |
File details
Details for the file toktkn-0.1.0-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
- Upload date:
- Size: 549.8 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ppc64le
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 953188079011e33cc1136a309ca85005dc6037f02a6c505b526b5f67598f80b6 |
| MD5 | 7e85781c233b3eb43e6fb39d548ad332 |
| BLAKE2b-256 | 90da731ac2892ae16185d682d91e29872895c579558aad2eae90590cf573a8fb |
File details
Details for the file toktkn-0.1.0-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl
- Upload date:
- Size: 518.0 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ i686
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70270b1abe6d5f7755f7fd5837ccf2fce3e9d97dccd9dc038f0c9fdc731d8537 |
| MD5 | d0dbd22acc960f9cb5d46d50785c98ac |
| BLAKE2b-256 | c0ceaa454cf15ad6689da62c743dced17a6e355181c78808d2f28f20a23cbdcc |
File details
Details for the file toktkn-0.1.0-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
- Upload date:
- Size: 487.2 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARMv7l
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 562fb5ad5f014d8fae3db4fa8edc1bf189ed6b12bbb064e1f11a9e695fb67111 |
| MD5 | 056da58d698757e94f022daee2184a07 |
| BLAKE2b-256 | c1f4a7d9c228c62709aa29bbf523937588ea079d8b8940bebee019031bc49f6d |
File details
Details for the file toktkn-0.1.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 488.5 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8c3f9248d931a735b1d4fe2212a4a28f7f92e4e9ed498d06fbcce210f8cf2fb9 |
| MD5 | 8e1f514538ffb33e9fce01db831ead4b |
| BLAKE2b-256 | 40f782c0e74bc2673d7decc780165bed5cb481dc0fb831f2e2b03a7521aa355c |
File details
Details for the file toktkn-0.1.0-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: toktkn-0.1.0-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 432.8 kB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e119b49fff458da07d7cce628c8d8081ca6ec515404d0ab7cbb41e69e225e292 |
| MD5 | 01a734f53f7d0c4476f86f2e716a091d |
| BLAKE2b-256 | a306e4dcaede970381ced374bdb74cdcc0553888070f50f5bd1f20e22d064e64 |