Fast Python whitespace tokenizer written in Cython.

Project description

whitespacetokenizer

Fast Python whitespace tokenizer, written in Cython, that also gives the start and end character positions of each token.

Installation

pip install whitespacetokenizer

Usage

from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)

print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
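Each tuple is `(token, start, end)`, where `end` is exclusive, so `text[start:end]` recovers the token. For illustration only, the same behaviour can be sketched in pure Python with `re.finditer` (the package itself implements this in Cython for speed; `whitespace_tokenize_py` is a hypothetical reference function, not part of the package):

```python
import re

def whitespace_tokenize_py(text):
    # Pure-Python reference: each run of non-whitespace becomes a
    # (token, start, end) tuple, with end exclusive.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

text = "Hello, world! How are you?"
tokens = whitespace_tokenize_py(text)
print(tokens)
# [('Hello,', 0, 6), ('world!', 7, 13), ('How', 14, 17), ('are', 18, 21), ('you?', 22, 26)]

# The offsets slice back to the original tokens:
assert all(text[start:end] == token for token, start, end in tokens)
```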

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whitespacetokenizer-1.0.1.tar.gz (45.8 kB view details)

Uploaded Source

File details

Details for the file whitespacetokenizer-1.0.1.tar.gz.

File metadata

  • Download URL: whitespacetokenizer-1.0.1.tar.gz
  • Upload date:
  • Size: 45.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for whitespacetokenizer-1.0.1.tar.gz
  • SHA256: 19c4c65d558d59202f77ccf76b1ab8871bd48940f0eafc9a6653ea54907384a4
  • MD5: 245ffad4a2da4d8987cdd7d7fcd551e4
  • BLAKE2b-256: c231412197401d876253e5113c59aff698fbc6ace57abc2a6c7f5b8a295f6a5c

See more details on using hashes here.
