
Fast Python whitespace tokenizer written in Cython.

Project description

whitespacetokenizer

A fast Python whitespace tokenizer written in Cython that also reports the start and end character positions of each token.

Installation

pip install whitespacetokenizer

Usage

from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)

print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
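The returned spans line up with Python string slicing, so `text[start:end]` always recovers the token. A pure-Python sketch of the same `(token, start, end)` interface (an illustration using the standard library, not the package's Cython implementation):

```python
import re

def py_whitespace_tokenizer(text):
    # Split on whitespace and keep each token's character offsets,
    # mirroring the (token, start, end) tuples shown above.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

text = "Hello, world! How are you?"
for token, start, end in py_whitespace_tokenizer(text):
    # Each span slices back to the token itself.
    assert text[start:end] == token
```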

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whitespacetokenizer-1.0.2.tar.gz (45.8 kB view details)

Uploaded Source

File details

Details for the file whitespacetokenizer-1.0.2.tar.gz.

File metadata

  • Download URL: whitespacetokenizer-1.0.2.tar.gz
  • Upload date:
  • Size: 45.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for whitespacetokenizer-1.0.2.tar.gz
Algorithm Hash digest
SHA256 6b5ec9663569ba3d09eeb12b109f5b2c696a893423d75d43ce9059629d275fc7
MD5 bb1d8a317d6dd7e5282df4516b8fa8db
BLAKE2b-256 7eadfa31b6aca5e053050e0e27582a9e0548ffbe3c4540b19b1ae6e99b9c5249

See more details on using hashes here.
