Sequence Tokenizer (seqtok)
A Python implementation mimicking C's `strtok()` behavior for generic sequences, but without global state.
Features
- Splits any sequence type (lists, tuples, strings) using separator elements
- Memory efficient (yields tokens one at a time)
- Follows C's `strtok()` conventions:
  - Skips leading/trailing separators
  - Treats consecutive separators as a single delimiter
  - Never returns empty tokens
- But with crucial differences:
  - State is encapsulated in the generator instance (no global state)
  - No thread safety concerns from global state
  - Each iterator maintains independent state (safe for separate instances)
  - Multiple tokenizers can operate simultaneously
  - Original sequence is never modified
  - Immutable tokens via COWList (Copy-On-Write List)
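The conventions listed above can be illustrated with a minimal generator sketch. This is not seqtok's actual source: `tokenize_sketch` is a hypothetical name, and it yields plain tuples rather than `COWList` objects. It only shows how keeping the scan state in local variables of a generator avoids the global state that C's `strtok()` relies on.

```python
from typing import Container, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def tokenize_sketch(
    seq: Iterable[T], separators: Container[T]
) -> Iterator[Tuple[T, ...]]:
    """Hypothetical stand-in for seqtok(); yields tuples, not COWList."""
    current = []  # scan state lives in this generator instance only
    for item in seq:
        if item in separators:
            # A separator closes the current token. Leading separators and
            # runs of separators find `current` empty, so nothing is yielded.
            if current:
                yield tuple(current)
                current = []
        else:
            current.append(item)
    if current:  # flush the last token when there is no trailing separator
        yield tuple(current)


print(list(tokenize_sketch([1, 2, 0, 3, 4, 0, 0, 5], {0})))
# [(1, 2), (3, 4), (5,)]
```

Because the input is only read and each token is copied into a new tuple, the original sequence is left untouched, which mirrors the "never modified" guarantee in the list above.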
Installation
pip install seqtok
Examples
from seqtok import seqtok

# Tokenize a list of numbers
numbers = [1, 2, 0, 3, 4, 0, 0, 5]
for token in seqtok(numbers, {0}):
    print(token)
# Output: COWList([1, 2])
#         COWList([3, 4])
#         COWList([5])

# Tokenize a string
text = "..Hello...world.!"
for token in seqtok(text, {'.', '!'}):
    print(''.join(token))
# Output: Hello
#         world
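The claim that multiple tokenizers can operate simultaneously is easiest to see by interleaving two iterators. The sketch below does not import seqtok; `tokens` is a hypothetical stand-in built on `itertools.groupby`, assumed here only to make the example self-contained. With C's `strtok()`, starting a second tokenization would overwrite the saved position of the first.

```python
from itertools import groupby


def tokens(seq, separators):
    """Hypothetical stand-in for seqtok(); joins each non-separator run into a string."""
    for is_sep, run in groupby(seq, key=lambda item: item in separators):
        if not is_sep:
            yield "".join(run)


first = tokens("a,b,,c", {","})
second = tokens("..x.y", {"."})

# Advance the two iterators alternately; each keeps its own position.
interleaved = [next(first), next(second), next(first), next(second), next(first)]
print(interleaved)  # ['a', 'x', 'b', 'y', 'c']
```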
Contributing
Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.
License
This project is licensed under the MIT License.
File details
Details for the file seqtok-0.1.0a1.tar.gz.
File metadata
- Download URL: seqtok-0.1.0a1.tar.gz
- Upload date:
- Size: 3.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c51bb4f9683096c66f3f765c9ab97f828ede5d875c48df4ae2fdda1555b2c024 |
| MD5 | deebc58a8d8ff09f4ef7d9735a47654c |
| BLAKE2b-256 | b1056564770f57a857e8bbeef1481db46f2cef2011cec80a12d3faf5e72f75cd |
File details
Details for the file seqtok-0.1.0a1-py2.py3-none-any.whl.
File metadata
- Download URL: seqtok-0.1.0a1-py2.py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbe5dc5a332a96f88a6944f826bf466a528f835f701a7f6e92e1e2311cd895a2 |
| MD5 | cb6e7d745d79865e5d30bc6c8984877b |
| BLAKE2b-256 | a1dbc57efa4cc2d29f7225acc793ac3fea2254c22012398830bb060a4f038a79 |