Fast Python whitespace tokenizer written in Cython.
Project description
whitespacetokenizer
Fast Python whitespace tokenizer written in Cython that also gives the start and end character positions of each token.
Installation
pip install whitespacetokenizer
Usage
from whitespacetokenizer import whitespace_tokenizer
text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)
print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
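For reference, the output shown above can be reproduced in pure Python with `re.finditer`. The function below is a hypothetical stand-in, not part of this package; it is only a sketch of the (token, start, end) contract the Cython tokenizer follows, handy for sanity-checking results.

```python
import re

def py_whitespace_tokenizer(text):
    # Pure-Python reference: one (token, start, end) tuple per run of
    # non-whitespace characters, mirroring the output format shown above.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

text = "Hello, world! How are you?"
print(py_whitespace_tokenizer(text))
# [('Hello,', 0, 6), ('world!', 7, 13), ('How', 14, 17), ('are', 18, 21), ('you?', 22, 26)]
```

Note that each `(start, end)` span slices back to its token, i.e. `text[start:end] == token`.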
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
whitespacetokenizer-1.0.2.tar.gz (45.8 kB)
File details
Details for the file whitespacetokenizer-1.0.2.tar.gz.
File metadata
- Download URL: whitespacetokenizer-1.0.2.tar.gz
- Upload date:
- Size: 45.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b5ec9663569ba3d09eeb12b109f5b2c696a893423d75d43ce9059629d275fc7 |
| MD5 | bb1d8a317d6dd7e5282df4516b8fa8db |
| BLAKE2b-256 | 7eadfa31b6aca5e053050e0e27582a9e0548ffbe3c4540b19b1ae6e99b9c5249 |