
Fast tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

Project description

kitoken


Tokenizer for language models.

Tokenize text for Llama, Gemini, GPT-5, DeepSeek, Mistral and many other models, on the web, on the client and on any platform.

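from kitoken import Kitoken

# Load the tokenizer definition for Qwen3.5-9B ("hf:" points at the Hugging Face Hub)
encoder = Kitoken.from_web("hf:Qwen/Qwen3.5-9B")

# Encode text to token IDs; decode() returns bytes, so convert them back to a str
tokens = encoder.encode("hello world!", True)
string = encoder.decode(tokens).decode("utf-8")

# Round-tripping should reproduce the original text
assert string == "hello world!"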

Overview

Kitoken is a fast and versatile tokenizer for language models compatible with SentencePiece, HuggingFace Tokenizers, OpenAI Tiktoken and Mistral Tekken, supporting BPE, Unigram and WordPiece tokenization.
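WordPiece, one of the three schemes named above, is usually described as greedy longest-match-first: each word is split by repeatedly taking the longest vocabulary entry that matches at the current position, and non-initial pieces carry a continuation prefix. The sketch below is a generic textbook version for illustration only, not Kitoken's implementation; the function name `wordpiece`, the toy vocabulary, the `##` prefix and the `[UNK]` token are assumptions borrowed from BERT-style vocabularies.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]", prefix: str = "##") -> list[str]:
    """Split one pre-tokenized word with greedy longest-match-first WordPiece (hypothetical helper)."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # non-initial pieces carry the prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # Nothing matches at this position: the whole word maps to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    toy_vocab = {"un", "##aff", "##able"}  # hypothetical toy vocabulary
    print(wordpiece("unaffable", toy_vocab))  # ['un', '##aff', '##able']
```

Falling back to a single unknown token for the whole word, rather than per piece, mirrors the common BERT behaviour; other implementations may choose differently.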

  • Fast and efficient tokenization
    Faster than most other tokenizers in both common and uncommon scenarios; see the benchmarks for comparisons with different datasets.
  • Runs in all environments
    Written natively in Rust, with bindings for Web, Node and Python; see kitoken.dev for a web demo.
  • Supports input and output processing
    Includes Unicode-aware normalization, pre-tokenization and post-processing options.
  • Compact data encoding
    Definitions are stored in an efficient binary format, without a merge list.
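One known way for a BPE tokenizer to work without a separate merge list is the rank-based scheme used by OpenAI's tiktoken: the vocabulary ranks double as merge priorities, and at each step the adjacent pair whose concatenation has the lowest rank is merged. The sketch below illustrates that general technique on a toy vocabulary; it is an assumption for illustration, not a description of Kitoken's binary format, and the name `bpe_encode` is hypothetical.

```python
def bpe_encode(piece: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Byte-level BPE driven only by token ranks (lower rank = merged earlier).

    Assumes every single byte has a rank, as in byte-level vocabularies.
    """
    parts = [bytes([b]) for b in piece]
    while len(parts) > 1:
        best_rank = None
        best_i = -1
        # Find the adjacent pair whose concatenation is the lowest-ranked token.
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank = rank
                best_i = i
        if best_rank is None:
            break  # no adjacent pair forms a known token, so merging stops
        # Replace the winning pair with its merged token.
        parts[best_i : best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return [ranks[p] for p in parts]


if __name__ == "__main__":
    toy_ranks = {bytes([i]): i for i in range(256)}  # hypothetical byte-level base vocabulary
    toy_ranks[b"ab"] = 256
    toy_ranks[b"abc"] = 257
    print(bpe_encode(b"abcab", toy_ranks))  # [257, 256]
```

This naive loop is quadratic in the length of the input piece, which is why tokenizers generally run BPE on short pre-tokenized chunks rather than on whole documents.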

See the main README for more information.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kitoken-0.11.0.tar.gz (70.3 kB)

Tags: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

kitoken-0.11.0-cp310-abi3-win_amd64.whl (2.4 MB)

Tags: CPython 3.10+, Windows x86-64

kitoken-0.11.0-cp310-abi3-win32.whl (2.1 MB)

Tags: CPython 3.10+, Windows x86

kitoken-0.11.0-cp310-abi3-musllinux_1_2_x86_64.whl (2.8 MB)

Tags: CPython 3.10+, musllinux: musl 1.2+ x86-64

kitoken-0.11.0-cp310-abi3-musllinux_1_2_i686.whl (2.4 MB)

Tags: CPython 3.10+, musllinux: musl 1.2+ i686

kitoken-0.11.0-cp310-abi3-musllinux_1_2_armv7l.whl (2.3 MB)

Tags: CPython 3.10+, musllinux: musl 1.2+ ARMv7l

kitoken-0.11.0-cp310-abi3-musllinux_1_2_aarch64.whl (2.7 MB)

Tags: CPython 3.10+, musllinux: musl 1.2+ ARM64

kitoken-0.11.0-cp310-abi3-manylinux_2_28_x86_64.whl (2.8 MB)

Tags: CPython 3.10+, manylinux: glibc 2.28+ x86-64

kitoken-0.11.0-cp310-abi3-manylinux_2_28_ppc64le.whl (2.5 MB)

Tags: CPython 3.10+, manylinux: glibc 2.28+ ppc64le

kitoken-0.11.0-cp310-abi3-manylinux_2_28_i686.whl (2.5 MB)

Tags: CPython 3.10+, manylinux: glibc 2.28+ i686

kitoken-0.11.0-cp310-abi3-manylinux_2_28_armv7l.whl (2.3 MB)

Tags: CPython 3.10+, manylinux: glibc 2.28+ ARMv7l

kitoken-0.11.0-cp310-abi3-manylinux_2_28_aarch64.whl (2.7 MB)

Tags: CPython 3.10+, manylinux: glibc 2.28+ ARM64

kitoken-0.11.0-cp310-abi3-macosx_13_0_x86_64.whl (2.7 MB)

Tags: CPython 3.10+, macOS 13.0+ x86-64

kitoken-0.11.0-cp310-abi3-macosx_13_0_arm64.whl (2.5 MB)

Tags: CPython 3.10+, macOS 13.0+ ARM64
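The built distributions above follow the wheel naming convention `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. The `cp310-abi3` pair means each wheel is built against CPython's stable ABI with 3.10 as the minimum version, which is why a single file per platform serves every CPython release from 3.10 onward. The stdlib-only parser below is a hypothetical helper written for illustration (`WheelName`, `split_wheel_name` and `min_cpython_version` are not part of any library); the `packaging` project ships a more complete `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:  # no optional build tag
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # build tag present after the version
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelName(name, version, build, py, abi, plat)


def min_cpython_version(python_tag: str) -> tuple[int, int]:
    """Turn a tag like 'cp310' into (3, 10); assumes a single-digit major version."""
    digits = python_tag[2:]
    if not python_tag.startswith("cp") or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"not a CPython tag: {python_tag!r}")
    return int(digits[0]), int(digits[1:])


if __name__ == "__main__":
    wheel = split_wheel_name("kitoken-0.11.0-cp310-abi3-manylinux_2_28_x86_64.whl")
    print(wheel.abi_tag, wheel.platform_tag, min_cpython_version(wheel.python_tag))
```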

File details

Details for the file kitoken-0.11.0.tar.gz.

File metadata

  • Download URL: kitoken-0.11.0.tar.gz
  • Upload date:
  • Size: 70.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.2

File hashes

Hashes for kitoken-0.11.0.tar.gz
Algorithm Hash digest
SHA256 e97b1b91fc92871ea361007eec0bee079ae857d49da0689bdac9d1cc596e9af0
MD5 c4568181f498c05f71a9c114482a1ec2
BLAKE2b-256 6c72b8bc85b00eb0739007b4c3e41556e31f104ff126581eeb838824d7ee6a6b

See more details on using hashes here.
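A minimal sketch of checking a downloaded file against one of the SHA256 digests listed on this page, using only the standard library. The helper names `sha256_of` and `matches_digest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest(Path("kitoken-0.11.0.tar.gz"), "e97b1b91fc92871ea361007eec0bee079ae857d49da0689bdac9d1cc596e9af0")` should return `True` for an intact copy of the source distribution. pip can enforce the same check at install time in hash-checking mode (`--require-hashes`), where each requirement carries one or more `--hash=sha256:...` options.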

File details

Details for the file kitoken-0.11.0-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: kitoken-0.11.0-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 2.4 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.2

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 5ddc90bf254980822895f1f6437fd60648d4abbf4361a3e532e1c6d0a4a62611
MD5 a30c3b10d5f1a53535fe20efac677075
BLAKE2b-256 6166cf7aef64526ddfaf3f44e531cfd3492617149e23aa13fe991f7b39b695ff

File details

Details for the file kitoken-0.11.0-cp310-abi3-win32.whl.

File metadata

  • Download URL: kitoken-0.11.0-cp310-abi3-win32.whl
  • Upload date:
  • Size: 2.1 MB
  • Tags: CPython 3.10+, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.2

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-win32.whl
Algorithm Hash digest
SHA256 a03fb61dc26961b6377f8f57091b30ca191922fe4d9148845a2ad016b71ffe0f
MD5 d67783f50b6090e85adf9cbad81920f4
BLAKE2b-256 174fe21873bcb7a1b37cd113eec0304ac01c7f83f40408dfa5733aed7375d0cf

File details

Details for the file kitoken-0.11.0-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 52e65b489a5a75b86db02a14eb06783667525ba086be9020c296d8ea9ec84138
MD5 d61b5f8ef4143a645d25f22922ad0c72
BLAKE2b-256 0d090dd25708e252af51c027e89b6adb51858d5b20ca700039c808291cce051a

File details

Details for the file kitoken-0.11.0-cp310-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 cb8be1fc5600e2c55157ac82d2b8405a72a743f4a5b0ab6d29f7ced848d98756
MD5 c5ccb2e50a63d21cbad79b332b3c84b2
BLAKE2b-256 a5a55a63ff6203a47f932111bfb01bf6555527fdfdfd733e967e0280e2963908

File details

Details for the file kitoken-0.11.0-cp310-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 2800c229596b9a3cad8cea000db827d9ec1ce6052f30b7e656957f50740076df
MD5 058c7491c4ccbf24f5b536c5cdbc7124
BLAKE2b-256 2a8d1cae0a1eb40a2744dcfe0e00d7b184e069d55d21cba0bf636c371dd0dede

File details

Details for the file kitoken-0.11.0-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 606a54e37cf8c8f139ff4c19a2968f387718c15c1f2a41b54eac66db45264911
MD5 a732e85e29e1526e0628d4ba612b5957
BLAKE2b-256 9feedd511284df08f26ccb97ff38e80c289f3ba27c6a9145877e344a5d6ea414

File details

Details for the file kitoken-0.11.0-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 8b6a0ab17163e569abab9ef5e4deae843a213130b0e05db5d793d8248969b4f3
MD5 67aff173b355f8aeaa91159592a18516
BLAKE2b-256 b2ba96549d7cbf2ac8befdb595149679d2b6f75122efbf045fcc49a22a8477d0

File details

Details for the file kitoken-0.11.0-cp310-abi3-manylinux_2_28_ppc64le.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-manylinux_2_28_ppc64le.whl
Algorithm Hash digest
SHA256 2e108b8616cd756be536340d50843dec8c294a82c0db7d0ea4cfffd258f5cb9d
MD5 bbb0260e0d5eb3d89084b0040e0fce8c
BLAKE2b-256 398e0c97951ae60059ecc2ad394c82b54901299f944ea29366816be19e683c7f

File details

Details for the file kitoken-0.11.0-cp310-abi3-manylinux_2_28_i686.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-manylinux_2_28_i686.whl
Algorithm Hash digest
SHA256 4a0eea6b9c4a58422ce06659d8e07c1d67e08f7fb5d31de5d84e89bcc28ca8f3
MD5 dbb771da06ead9436d63a1f7f7412943
BLAKE2b-256 3576cd776711bb0522735987e642dbe19a9148df22c758ccbb8580d961501dbc

File details

Details for the file kitoken-0.11.0-cp310-abi3-manylinux_2_28_armv7l.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-manylinux_2_28_armv7l.whl
Algorithm Hash digest
SHA256 04129b1236145209f65acee7efefec659a225791bc0412d28582285ee780ec7c
MD5 fd6f315571ddb3b3920677ba2cd889ae
BLAKE2b-256 75f204a3195aeb2316b1a2eab96e9c45c47a857b67ea87914931c2598feacd1c

File details

Details for the file kitoken-0.11.0-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 a19dc3f943ba5d3f16a75219a28bc1d916578cf7262099950102b12ef726d195
MD5 8e0dd45ef55f9f2f0aecc5fbc04b13ff
BLAKE2b-256 baf16d5fa5a95e8752c6e7463d79d78b3093bfca02ed5129f122081b9b019c93

File details

Details for the file kitoken-0.11.0-cp310-abi3-macosx_13_0_x86_64.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 fc8ac6642f22464fdb83524a052054706670256537ea71c2675902f94cdb533b
MD5 2ec94fbdbc09c1d700d34d309114dd77
BLAKE2b-256 c213ed509af8c8d61241cbb6a0c171431295d0bc7204e93e878b72cb6bf37d5f

File details

Details for the file kitoken-0.11.0-cp310-abi3-macosx_13_0_arm64.whl.

File hashes

Hashes for kitoken-0.11.0-cp310-abi3-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 25a3894f63eb4d8c0229a58310d40e19aa3d615afb45717b015f5cb938ea4572
MD5 781c0ef2b2f84edf14a91c7334a30ac2
BLAKE2b-256 0bf2e1790393c6b1fa5b1a7a9470f498aa03b8a2a3ef74492c96e37dcec9cd21
