IncliToken is a from-scratch implementation of a Byte Pair Encoding (BPE) tokenizer.
IncliToken
A simple Byte Pair Encoding (BPE) tokenizer implementation from scratch in Python.
Installation
uv add inclitoken
Or install it with pip:
pip install inclitoken
Usage
from inclitoken.tokenizer import BPETokenizer
# Initialize tokenizer
tokenizer = BPETokenizer()
# Train on your text
text = "Hello world! This is a simple example."
tokenizer.train(text, turns=100, verbose=False)
# Encode text to token IDs
ids = tokenizer.encode("Hello world!")
print(ids)
# Decode token IDs back to text
decoded = tokenizer.decode(ids)
print(decoded)
Features
- Train custom BPE tokenizers on your text
- Encode text into token IDs
- Decode token IDs back into text
- Track merge operations and vocabulary
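The features above can be illustrated with a minimal, self-contained sketch of byte-level BPE. This is not IncliToken's source code: the helper names (`count_pairs`, `merge_pair`, `train`, `encode`, `decode`) and the choice to start from the 256 raw byte values are assumptions made for illustration. Only the `turns` parameter name is borrowed from the usage example, and the package's internals may differ.

```python
from collections import Counter


def count_pairs(ids):
    """Count how often each adjacent pair of token IDs occurs."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, turns):
    """Learn up to `turns` merges and return (merges, vocab)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in learned order
    vocab = {i: bytes([i]) for i in range(256)}  # assumed base vocabulary: raw bytes
    for step in range(turns):
        pairs = count_pairs(ids)
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        pair = max(pairs, key=pairs.get)  # most frequent pair, first seen wins ties
        new_id = 256 + step
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
        vocab[new_id] = vocab[pair[0]] + vocab[pair[1]]
    return merges, vocab


def encode(text, merges):
    """Turn text into token IDs by replaying the merges in learned order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, vocab):
    """Turn token IDs back into text by concatenating their byte strings."""
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    merges, vocab = train("Hello world! This is a simple example.", turns=100)
    ids = encode("Hello world!", merges)
    print(ids)
    print(decode(ids, vocab))
```

In this sketch the `merges` dictionary is the record of merge operations and `vocab` maps every token ID to the bytes it stands for, so decoding is a lookup followed by concatenation. Training stops early when the text has collapsed to a single token, which is why asking for 100 turns on a short string is harmless here.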
Requirements
- Python >= 3.14
- tqdm
Author
Built by Adarsh Dubey
Source Distribution
inclitoken-0.1.0.tar.gz (3.2 kB)
Built Distribution
inclitoken-0.1.0-py3-none-any.whl (4.7 kB)
File details
Details for the file inclitoken-0.1.0.tar.gz.
File metadata
- Download URL: inclitoken-0.1.0.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 044b61399c8c727726dc4e01c4b8a6cfe3a3dc9c95ec9bb582a9892c5842c070 |
| MD5 | 96d37070845f8965826c12bfde5d8639 |
| BLAKE2b-256 | c659591c182d271f177727ef38993abbf6cfbe9bb0c12eac6d61dfae02c68582 |
File details
Details for the file inclitoken-0.1.0-py3-none-any.whl.
File metadata
- Download URL: inclitoken-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 930b64db0ebe9087fd8d6713524de46b6e243c5f79cc1e6b51862a648ed45144 |
| MD5 | d890dca4085294fa73a0eb2ed8768709 |
| BLAKE2b-256 | c2fec9242cfe53715ea14a8dac9e6e53723143f0086633bb7803ff2fa9d52e7f |
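A downloaded file can be checked against the SHA256 digests listed for each distribution. The following is a minimal sketch using only Python's standard `hashlib` module. The function names `sha256_of_file` and `matches_published_digest` are hypothetical helpers, and the local path is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("inclitoken-0.1.0.tar.gz", "<SHA256 from the table>")` should return `True` for an unmodified copy of the source distribution saved in the current directory.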