Simple Tokenizer
Fast text tokenizer built with Rust and PyO3.
Features
- word_tokenizer: Extract words from text using a regex
- sentence_tokenizer: Split text into sentences
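The two features can be approximated in pure Python to show what regex-based word and sentence splitting does. This is a rough sketch, not the package's implementation. The `\w+` word pattern, the sentence-boundary rule (`.`, `!` or `?` followed by whitespace), and the names `sketch_word_tokenize` and `sketch_sentence_tokenize` are all assumptions. The regex used by the Rust code is not given in this description.

```python
import re
from typing import List

# Assumption: a "word" is a maximal run of Unicode word characters.
WORD_RE = re.compile(r"\w+")

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# The lookbehind keeps the punctuation attached to the sentence it ends.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def sketch_word_tokenize(text: str) -> List[str]:
    """Return every word-character run, dropping punctuation and spaces."""
    return WORD_RE.findall(text)


def sketch_sentence_tokenize(text: str) -> List[str]:
    """Split after terminal punctuation and drop empty pieces."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


text = "Hello world! This is a test."
print(sketch_word_tokenize(text))      # ['Hello', 'world', 'This', 'is', 'a', 'test']
print(sketch_sentence_tokenize(text))  # ['Hello world!', 'This is a test.']
```

A real tokenizer may treat abbreviations such as "e.g." or decimal numbers differently; this sketch does not.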
Installation
```shell
pip install simple_tokenizer
```
Usage
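```python
import simple_tokenizer

text = "Hello world! This is a test."
# Per this example, word_tokenizer returns three values: the tokens, a count, and an elapsed time
tokens, count, elapsed = simple_tokenizer.word_tokenizer(text)
print(tokens)
```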
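The usage example unpacks three values from `word_tokenizer`. To compare the Rust extension against a pure-Python baseline with the same return shape, a small decorator can attach the count and the timing. This is a minimal sketch: `with_count_and_time` and `baseline_word_tokenizer` are hypothetical names, the whitespace split is a deliberately naive stand-in rather than the package's regex, and "elapsed in seconds" holds only for this sketch (it uses `time.perf_counter`), not necessarily for the extension.

```python
import time
from functools import wraps
from typing import Callable, List, Tuple

# Hypothetical return shape, mirroring the usage example: (tokens, count, elapsed).
Result = Tuple[List[str], int, float]


def with_count_and_time(fn: Callable[[str], List[str]]) -> Callable[[str], Result]:
    """Wrap a text -> tokens function so it also returns the count and seconds taken."""

    @wraps(fn)
    def wrapper(text: str) -> Result:
        start = time.perf_counter()
        tokens = fn(text)
        elapsed = time.perf_counter() - start  # seconds, from perf_counter
        return tokens, len(tokens), elapsed

    return wrapper


@with_count_and_time
def baseline_word_tokenizer(text: str) -> List[str]:
    # Naive baseline: split on whitespace, so punctuation stays attached.
    return text.split()


tokens, count, elapsed = baseline_word_tokenizer("Hello world! This is a test.")
print(tokens, count)  # ['Hello', 'world!', 'This', 'is', 'a', 'test.'] 6
```

With the package installed, one could call both functions on the same large text and compare the third element, keeping in mind that the unit of the extension's `elapsed` value is not stated in this description.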
Download files
Source Distribution
simple_tokenizer-0.1.0.tar.gz (10.9 kB)
Built Distribution
simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl (818.1 kB)
File details
Details for the file simple_tokenizer-0.1.0.tar.gz.
File metadata
- Download URL: simple_tokenizer-0.1.0.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71c03b3af5fde00c1d904fcb9487778fde773c41aa6bd1337d79d2bef7283fb2 |
| MD5 | 8b06264660aef2f7a48fb59f452af892 |
| BLAKE2b-256 | 4b3183a97a637fc23cfd4257fc8f06782a21c78522e26ff4ec2d5400a7c95a8d |
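To check a downloaded file against the digests in the table, Python's standard `hashlib` module is enough. A minimal sketch follows. The helper names `file_digests` and `verify` are hypothetical, and "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib
from typing import Callable, Dict

# Map the algorithm names used in the hash tables to hashlib constructors.
# Assumption: "BLAKE2b-256" means BLAKE2b with a 32-byte digest.
HASHERS: Dict[str, Callable] = {
    "SHA256": hashlib.sha256,
    "MD5": hashlib.md5,
    "BLAKE2b-256": lambda: hashlib.blake2b(digest_size=32),
}

# Digests copied from the source distribution's hash table.
SDIST_DIGESTS = {
    "SHA256": "71c03b3af5fde00c1d904fcb9487778fde773c41aa6bd1337d79d2bef7283fb2",
    "MD5": "8b06264660aef2f7a48fb59f452af892",
    "BLAKE2b-256": "4b3183a97a637fc23cfd4257fc8f06782a21c78522e26ff4ec2d5400a7c95a8d",
}


def file_digests(path: str, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Hash the file once, feeding each chunk to every algorithm."""
    hashers = {name: make() for name, make in HASHERS.items()}
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path: str, expected: Dict[str, str]) -> bool:
    """True only if every expected digest matches (hex compared in lowercase)."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())


# Example (assumes the sdist was downloaded to the current directory):
# print(verify("simple_tokenizer-0.1.0.tar.gz", SDIST_DIGESTS))
```

MD5 is included only because the table lists it; SHA256 or BLAKE2b-256 is the meaningful integrity check.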
File details
Details for the file simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 818.1 kB
- Tags: CPython 3.14, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
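The Tags line is derived from the wheel file name, which follows the `{distribution}-{version}(-{build tag})-{python tag}-{abi tag}-{platform tag}.whl` pattern from the wheel specification. A minimal sketch that splits the name and reads the minimum glibc version from a `manylinux_X_Y_arch` platform tag follows. `parse_wheel_filename` and `manylinux_glibc` are hypothetical helpers, and the sketch ignores compressed tag sets (several tags joined with `.`).

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def manylinux_glibc(platform_tag: str) -> Optional[Tuple[int, int]]:
    """Return the minimum glibc (major, minor) for a manylinux_X_Y_arch tag, else None."""
    prefix = "manylinux_"
    if not platform_tag.startswith(prefix):
        return None
    major, minor, _arch = platform_tag[len(prefix):].split("_", 2)
    return int(major), int(minor)


wheel = parse_wheel_filename(
    "simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl"
)
print(wheel.python_tag, wheel.abi_tag)      # cp314 cp314
print(manylinux_glibc(wheel.platform_tag))  # (2, 34)
```

Read this way, `cp314` corresponds to CPython 3.14 and `manylinux_2_34_x86_64` to glibc 2.34 or newer on x86-64, matching the Tags line.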
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ba8dcdcf8e8bcf81997f1608bc567a7538e223836e869a2328f16745fa17e80 |
| MD5 | 827dc2c985bbd9c60e11a985c419c067 |
| BLAKE2b-256 | 08174d8559dd1e760ee3ac2f4dd66491e89b1ea56450b9a9e3b466b675ff8884 |