
A Python implementation mimicking C's `strtok()` behavior for generic sequences, but without global state.

Project description

Sequence Tokenizer (seqtok)


Features

  • Splits any sequence type (lists, tuples, strings) using separator elements
  • Memory efficient (yields tokens one at a time)
  • Follows C's strtok() conventions:
    • Skips leading and trailing separators
    • Treats consecutive separators as a single delimiter
    • Never returns empty tokens
  • Differs from C's strtok() in key ways:
    • State is encapsulated in the generator instance, not in a hidden static variable
      • No thread-safety problems caused by shared global state
    • Each iterator maintains its own independent position (safe for separate instances)
      • Multiple tokenizers can be active at the same time without interfering
    • The original sequence is never modified (C's strtok() overwrites separators with '\0')
    • Tokens are returned as immutable COWList (Copy-On-Write List) objects
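The tokenizing rules listed above can be approximated in a few lines of plain Python. This is a minimal sketch, not seqtok's actual source: the function name `strtok_sketch` is hypothetical, it yields ordinary slices (`str`, `list`, or `tuple`, matching the input) instead of COWList objects, it assumes separators are passed as a set or another container that supports `in`, and it targets Python 3 only.

```python
from typing import Collection, Iterator, Sequence, TypeVar

T = TypeVar("T")


def strtok_sketch(seq: Sequence[T], separators: Collection[T]) -> Iterator[Sequence[T]]:
    """Lazily yield maximal runs of non-separator elements of `seq`.

    Hypothetical sketch of strtok()-style splitting; not seqtok's source.
    All state (the index `i`) lives in this generator's own frame.
    """
    n = len(seq)
    i = 0
    while i < n:
        # Skip separators: covers leading, trailing, and consecutive runs.
        while i < n and seq[i] in separators:
            i += 1
        start = i
        # Advance to the end of the current token.
        while i < n and seq[i] not in separators:
            i += 1
        if i > start:  # never yield an empty token
            # Slicing builds a new shallow copy; `seq` itself is untouched.
            yield seq[start:i]


# Two tokenizers advanced alternately keep independent positions.
words = strtok_sketch("  a  b ", {" "})
nums = strtok_sketch([1, 0, 0, 2], {0})
print(next(words), next(nums), next(words), next(nums))  # a [1] b [2]
```

Because each call creates a new generator with its own index, two tokenizers can be advanced alternately. C's strtok() cannot do this, since a second parse resets its single hidden pointer.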

Installation

pip install seqtok

Examples

from seqtok import seqtok

# Tokenize a list of numbers
numbers = [1, 2, 0, 3, 4, 0, 0, 5]
for token in seqtok(numbers, {0}):
    print(token)
# Output: COWList([1, 2])
#         COWList([3, 4])
#         COWList([5])

# Tokenize a string
text = "..Hello...world.!"
for token in seqtok(text, {'.', '!'}):
    print(''.join(token))
# Output: Hello
#         world
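The first example prints its tokens as COWList objects. This page does not document how COWList is built, so the class below is only a hypothetical illustration of the copy-on-write idea. `CowView` is a name invented here: it reads straight from the shared source sequence and copies its slice into a private list on the first write, so the source is never changed. Whether the real COWList permits writes at all is not stated here, since the feature list describes its tokens as immutable.

```python
from typing import Any, Iterator, List, Optional, Sequence


class CowView:
    """Hypothetical copy-on-write view over source[start:stop].

    Reads go to the shared source; the first write copies the slice
    into a private list, so the source sequence is never mutated.
    Only integer indexing is supported in this sketch.
    """

    def __init__(self, source: Sequence[Any], start: int, stop: int) -> None:
        self._source = source
        self._start = start
        self._stop = stop
        self._own: Optional[List[Any]] = None  # private copy, created lazily

    def __len__(self) -> int:
        if self._own is not None:
            return len(self._own)
        return self._stop - self._start

    def __getitem__(self, index: int) -> Any:
        if self._own is not None:
            return self._own[index]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("CowView index out of range")
        return self._source[self._start + index]

    def __iter__(self) -> Iterator[Any]:
        return (self[i] for i in range(len(self)))

    def __setitem__(self, index: int, value: Any) -> None:
        if self._own is None:
            # First write: copy the shared slice, then mutate only the copy.
            self._own = list(self._source[self._start:self._stop])
        self._own[index] = value

    def __repr__(self) -> str:
        return f"CowView({list(self)!r})"


data = [1, 2, 0, 3, 4]
tok = CowView(data, 0, 2)
tok[0] = 99
print(tok, data)  # CowView([99, 2]) [1, 2, 0, 3, 4]
```

Until a write happens, a view costs only a reference and two indices. This is the usual motivation for copy-on-write when many tokens point into one large sequence.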

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.

License

This project is licensed under the MIT License.



Download files

Download the file for your platform.

Source Distribution

seqtok-0.1.0a1.tar.gz (3.3 kB)

Tag: Source

Built Distribution


seqtok-0.1.0a1-py2.py3-none-any.whl (3.7 kB)

Tags: Python 2, Python 3

File details

Details for the file seqtok-0.1.0a1.tar.gz.

File metadata

  • Download URL: seqtok-0.1.0a1.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for seqtok-0.1.0a1.tar.gz

Algorithm     Hash digest
SHA256        c51bb4f9683096c66f3f765c9ab97f828ede5d875c48df4ae2fdda1555b2c024
MD5           deebc58a8d8ff09f4ef7d9735a47654c
BLAKE2b-256   b1056564770f57a857e8bbeef1481db46f2cef2011cec80a12d3faf5e72f75cd


File details

Details for the file seqtok-0.1.0a1-py2.py3-none-any.whl.

File metadata

  • Download URL: seqtok-0.1.0a1-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for seqtok-0.1.0a1-py2.py3-none-any.whl

Algorithm     Hash digest
SHA256        fbe5dc5a332a96f88a6944f826bf466a528f835f701a7f6e92e1e2311cd895a2
MD5           cb6e7d745d79865e5d30bc6c8984877b
BLAKE2b-256   a1dbc57efa4cc2d29f7225acc793ac3fea2254c22012398830bb060a4f038a79

