Simple Tokenizer

Fast text tokenizer built with Rust and PyO3.

Features

  • word_tokenizer: Extract words from text using regex
  • sentence_tokenizer: Split text into sentences
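
To make the two features concrete, here is a minimal pure-Python sketch of regex-based word extraction and sentence splitting. It is an approximation for illustration only: the function names sketch_word_tokenizer and sketch_sentence_tokenizer and both regex patterns are assumptions, not the package's Rust implementation, and the real functions may handle punctuation and edge cases differently.

```python
import re

# Illustrative patterns only; the package's actual regexes are not documented here.
WORD_RE = re.compile(r"\w+")
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def sketch_word_tokenizer(text: str) -> list[str]:
    """Return runs of word characters, dropping punctuation and whitespace."""
    return WORD_RE.findall(text)


def sketch_sentence_tokenizer(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


text = "Hello world! This is a test."
print(sketch_word_tokenizer(text))      # ['Hello', 'world', 'This', 'is', 'a', 'test']
print(sketch_sentence_tokenizer(text))  # ['Hello world!', 'This is a test.']
```

A naive splitter like this breaks on abbreviations such as "e.g. this", which is one reason to check the library's own output before relying on it.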

Installation

pip install simple_tokenizer

Usage

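import simple_tokenizer

text = "Hello world! This is a test."
# word_tokenizer returns three values; judging by the names used here, these are
# the extracted tokens, the number of tokens, and the time the call took.
tokens, count, elapsed = simple_tokenizer.word_tokenizer(text)
print(tokens)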

Download files

Source Distribution

simple_tokenizer-0.1.0.tar.gz (10.9 kB)

Uploaded Source

Built Distribution

simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl (818.1 kB)

Uploaded CPython 3.14, manylinux (glibc 2.34+), x86-64
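
The wheel file name encodes the compatibility tags listed above. As a rough sketch, the name can be split into its fields with plain string handling. The helper split_wheel_filename and the WheelTags type are hypothetical names introduced for illustration, and the sketch ignores the optional build tag that some wheel names carry.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelTags:
    """Split a five-field wheel file name into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # A wheel name may also carry an optional build tag (six fields);
    # this sketch deliberately handles only the five-field form.
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    return WheelTags(*parts)


tags = split_wheel_filename(
    "simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl"
)
print(tags.python_tag)    # cp314
print(tags.platform_tag)  # manylinux_2_34_x86_64
```

Here cp314 appears twice because it is both the Python tag and the ABI tag, and the platform tag names glibc 2.34 or newer on x86-64.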

File details

Details for the file simple_tokenizer-0.1.0.tar.gz.

File metadata

  • Download URL: simple_tokenizer-0.1.0.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for simple_tokenizer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 71c03b3af5fde00c1d904fcb9487778fde773c41aa6bd1337d79d2bef7283fb2
MD5 8b06264660aef2f7a48fb59f452af892
BLAKE2b-256 4b3183a97a637fc23cfd4257fc8f06782a21c78522e26ff4ec2d5400a7c95a8d
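
The digests above can be used to check a downloaded file. The following is a minimal sketch using Python's standard hashlib module. The helper name file_digests and the local path are assumptions, and it assumes that "BLAKE2b-256" means BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path) -> dict[str, str]:
    """Compute SHA256 and BLAKE2b-256 hex digests, reading the file in chunks."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {"sha256": sha256.hexdigest(), "blake2b-256": blake2b_256.hexdigest()}


# SHA256 digest published above for the source distribution.
EXPECTED_SHA256 = "71c03b3af5fde00c1d904fcb9487778fde773c41aa6bd1337d79d2bef7283fb2"

sdist = Path("simple_tokenizer-0.1.0.tar.gz")  # assumed local download location
if sdist.exists():
    ok = file_digests(sdist)["sha256"] == EXPECTED_SHA256
    print("SHA256 matches" if ok else "SHA256 MISMATCH")
```

SHA256 is the digest to rely on for verification; MD5 is listed for legacy compatibility and should not be treated as a security check.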

File details

Details for the file simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl.

File hashes

Hashes for simple_tokenizer-0.1.0-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 5ba8dcdcf8e8bcf81997f1608bc567a7538e223836e869a2328f16745fa17e80
MD5 827dc2c985bbd9c60e11a985c419c067
BLAKE2b-256 08174d8559dd1e760ee3ac2f4dd66491e89b1ea56450b9a9e3b466b675ff8884
