flash-tokenizer

Flash BERT tokenizer implementation with C++ backend.

Installation

pip install flash-tokenizer

Or install from source:

git clone https://github.com/springkim/flash-tokenizer.git
cd flash-tokenizer
pip install .

Usage

from flash_tokenizer import FlashBertTokenizer

# Initialize the tokenizer with a vocabulary file
tokenizer = FlashBertTokenizer("path/to/vocab.txt", do_lower_case=True)

# Tokenize text
tokens = tokenizer.tokenize("Hello, world!")
print(tokens)

# Convert tokens to IDs
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

# Or call the tokenizer directly to get token IDs in one step
ids = tokenizer("Hello, world!")
print(ids)
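
For readers unfamiliar with what tokenize and convert_tokens_to_ids produce, the following is a minimal pure-Python sketch of the greedy longest-match WordPiece scheme that BERT tokenizers conventionally use. It is not flash-tokenizer's C++ implementation, and its output may differ from the library's in details such as Unicode normalization or special tokens. The vocabulary TOY_VOCAB and the helpers basic_split and wordpiece are hypothetical names made up for illustration; a real vocab.txt lists one token per line, with the line index serving as the token ID.

```python
import re

# Toy vocabulary (made up for illustration); the list index plays the role
# of the line number in a BERT-style vocab.txt, i.e. the token ID.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]",
             "hello", "world", ",", "!", "token", "##izer", "##s"]
vocab = {token: idx for idx, token in enumerate(TOY_VOCAB)}


def basic_split(text, do_lower_case=True):
    """Optionally lowercase, then split into words and single punctuation marks."""
    if do_lower_case:
        text = text.lower()
    return re.findall(r"\w+|[^\w\s]", text)


def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab, do_lower_case=True):
    tokens = []
    for word in basic_split(text, do_lower_case):
        tokens.extend(wordpiece(word, vocab))
    return tokens


def convert_tokens_to_ids(tokens, vocab):
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


tokens = tokenize("Hello, world!", vocab)
print(tokens)                                # ['hello', ',', 'world', '!']
print(convert_tokens_to_ids(tokens, vocab))  # [4, 6, 5, 7]
print(tokenize("Tokenizers", vocab))         # ['token', '##izer', '##s']
```

A word with no matching pieces, such as "xyz" under this toy vocabulary, collapses to a single [UNK] token.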
