
Fast Python whitespace tokenizer written in Cython.

Project description

whitespacetokenizer

Fast Python whitespace tokenizer written in Cython that also reports the start and end character positions of each token.

Installation

pip install whitespacetokenizer

Usage

from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)

print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
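The offsets behave like string-slice indices: `text[start:end]` recovers each token. The behavior can be sketched with a pure-Python equivalent (an illustrative reference using `re.finditer`, not the library's actual Cython implementation):

```python
import re

def whitespace_tokenize_py(text):
    """Pure-Python reference: returns (token, start, end) tuples,
    matching the documented output of whitespace_tokenizer."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

tokens = whitespace_tokenize_py("Hello, world! How are you?")
# Each (start, end) pair slices the token back out of the original string.
text = "Hello, world! How are you?"
for token, start, end in tokens:
    assert text[start:end] == token
```

The Cython version produces the same tuples but avoids the regex-engine overhead, which is where the speedup comes from.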

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whitespacetokenizer-1.0.0.tar.gz (45.8 kB view details)

Uploaded Source

File details

Details for the file whitespacetokenizer-1.0.0.tar.gz.

File metadata

  • Download URL: whitespacetokenizer-1.0.0.tar.gz
  • Upload date:
  • Size: 45.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for whitespacetokenizer-1.0.0.tar.gz
Algorithm Hash digest
SHA256 502da1522180302a950e92d6d30bf7c33a7c0a5c37093531f93f42809193fd08
MD5 3979505668ed6a87527cf9b1452730ed
BLAKE2b-256 b06c14629752c7a956714d610717a7f68ff17c2fd219e31baaa99cb23611dd20

