Fast Python whitespace tokenizer written in Cython.

Project description

whitespacetokenizer

Fast Python whitespace tokenizer, written in Cython, that also reports the start and end character positions of each token.

Installation

pip install whitespacetokenizer

Usage

from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)

print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
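The behavior shown above can be sketched in pure Python (the actual package is a compiled Cython extension, so this is only an illustrative, slower equivalent): split on runs of whitespace and record each match's offsets. Each `(token, start, end)` tuple satisfies `text[start:end] == token`, which is useful for mapping tokens back to the original string.

```python
import re

def whitespace_tokenize(text):
    # Match maximal runs of non-whitespace characters and record
    # each token together with its start/end character offsets.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

text = "Hello, world! How are you?"
tokens = whitespace_tokenize(text)
print(tokens)
# [('Hello,', 0, 6), ('world!', 7, 13), ('How', 14, 17), ('are', 18, 21), ('you?', 22, 26)]

# The offsets let you slice the original text to recover each token:
assert all(text[start:end] == token for token, start, end in tokens)
```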

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whitespacetokenizer-1.0.5.tar.gz (83.9 kB view details)

Uploaded Source

File details

Details for the file whitespacetokenizer-1.0.5.tar.gz.

File metadata

  • Download URL: whitespacetokenizer-1.0.5.tar.gz
  • Upload date:
  • Size: 83.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for whitespacetokenizer-1.0.5.tar.gz
Algorithm Hash digest
SHA256 ff7ad545630fa237ca8f9a1d53733c6d01ef65cf0faee7da7cb5ca487af15377
MD5 056caf0f3b756effe4e4fc259a95cbf8
BLAKE2b-256 dd9a80efbc4830472d90f8474369606c16516abd7c7bca7cfd1daeccda1f75a7
