
Fast Python whitespace tokenizer written in Cython.

Project description

whitespacetokenizer

Fast Python whitespace tokenizer, written in Cython, that also reports the start and end character positions of each token.

Installation

pip install whitespacetokenizer

Usage

from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)

print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
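For reference, the same (token, start, end) output format can be reproduced in pure Python with a regular expression; this is only an illustrative sketch of what the package computes, not the package's Cython implementation:

```python
import re

def py_whitespace_tokenizer(text):
    # Split on runs of non-whitespace and keep each match's span.
    # Illustrative pure-Python equivalent; the Cython version is faster.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

print(py_whitespace_tokenizer("Hello, world! How are you?"))
# [('Hello,', 0, 6), ('world!', 7, 13), ('How', 14, 17), ('are', 18, 21), ('you?', 22, 26)]
```

Note that the end offset is exclusive, so `text[start:end]` recovers each token.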

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whitespacetokenizer-1.0.4.tar.gz (75.5 kB view details)

Uploaded Source

File details

Details for the file whitespacetokenizer-1.0.4.tar.gz.

File metadata

  • Download URL: whitespacetokenizer-1.0.4.tar.gz
  • Upload date:
  • Size: 75.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for whitespacetokenizer-1.0.4.tar.gz
Algorithm Hash digest
SHA256 f08d3cfffa9f64b87f0d47b2a4aa702c5f634c1fa1836e6dea576aa6190f37c8
MD5 bab4b84057984f27dc123cd3062576c7
BLAKE2b-256 ffeac6137871a6b3dcbe921b1c5fec68545e0fbd6f687400280e4906a7081d8e
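One way to check a downloaded sdist against the SHA256 digest above is with the standard-library `hashlib` module (a generic sketch, not part of this package):

```python
import hashlib

def sha256_of(path, chunk_size=8192):
    # Stream the file in chunks so large archives don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "f08d3cfffa9f64b87f0d47b2a4aa702c5f634c1fa1836e6dea576aa6190f37c8"
# After downloading the file:
# assert sha256_of("whitespacetokenizer-1.0.4.tar.gz") == expected
```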

