
FlashBertTokenizer implementation with C++ backend

Project description

flash-tokenizer

Flash BERT tokenizer implementation with C++ backend.

Installation

Install the released package from PyPI:

pip install flash-tokenizer

Or install from source:

git clone https://github.com/springkim/flash-tokenizer.git
cd flash-tokenizer
pip install .

Usage

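from flash_tokenizer import FlashBertTokenizer

# Load a BERT vocabulary file. do_lower_case=True follows the BERT
# convention for uncased vocabularies (input is lowercased first).
tokenizer = FlashBertTokenizer("path/to/vocab.txt", do_lower_case=True)

# Tokenize text into token ids
ids = tokenizer("Hello, world!")
print(ids)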
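
To make the role of vocab.txt concrete, here is a minimal pure-Python sketch of the WordPiece scheme that BERT tokenizers generally use (greedy longest-match-first, with "##" marking word-internal pieces). It is an illustration only, not flash-tokenizer's C++ implementation: the helper names load_vocab, wordpiece and encode and the toy vocabulary are hypothetical, punctuation splitting is omitted, and no [CLS]/[SEP] tokens are added.

```python
import os
import tempfile


def load_vocab(path):
    """Map each token to its line index (one token per line in vocab.txt)."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab, do_lower_case=True):
    """Whitespace-split the text, WordPiece each word, and map pieces to ids."""
    if do_lower_case:
        text = text.lower()
    ids = []
    for word in text.split():
        ids.extend(vocab[p] for p in wordpiece(word, vocab))
    return ids


# Toy vocabulary (hypothetical), written to a temporary vocab.txt for the demo.
toy_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world", "##s"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(toy_tokens) + "\n")
    vocab = load_vocab(vocab_path)

print(encode("Hello worlds", vocab))  # [4, 5, 6]
```

Here "worlds" is not in the toy vocabulary, so it is split into "world" and "##s". A full BERT tokenizer also splits punctuation and normalizes text before this step.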

Download files

Source Distribution

flash_tokenizer-0.4.0.tar.gz (8.7 kB)

Uploaded Source

Built Distribution

flash_tokenizer-0.4.0-cp312-cp312-macosx_15_0_arm64.whl (73.9 kB)

Uploaded CPython 3.12, macOS 15.0+ ARM64
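
The wheel file name above encodes which interpreters and platforms it targets. As a rough sketch, assuming the standard wheel naming convention ({distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl), the tags can be pulled apart like this. The function name parse_wheel_filename is hypothetical and not part of flash-tokenizer.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components.

    Assumes the standard layout:
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename(
    "flash_tokenizer-0.4.0-cp312-cp312-macosx_15_0_arm64.whl"
)
print(info["python"], info["abi"], info["platform"])
```

For this release the only prebuilt wheel targets CPython 3.12 on macOS 15.0+ ARM64. On other platforms pip would normally fall back to the source distribution.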

File details

Details for the file flash_tokenizer-0.4.0.tar.gz.

File metadata

  • Download URL: flash_tokenizer-0.4.0.tar.gz
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for flash_tokenizer-0.4.0.tar.gz

Algorithm    Hash digest
SHA256       0e18c4cda6fae4a4ee5057896c966bf6a5de1c2bf30f936891a7655649f5e581
MD5          3b161140c9c5aca8832689f745c101e8
BLAKE2b-256  0294a9a3a70370e52c447fa03db5d03cf3406e78fe7d3d40821e39707cddc0fa
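
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard hashlib module. The helper names sha256_of and verify are hypothetical, and the local path assumes the archive was saved to the current directory.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the listing for flash_tokenizer-0.4.0.tar.gz.
EXPECTED = "0e18c4cda6fae4a4ee5057896c966bf6a5de1c2bf30f936891a7655649f5e581"
archive = "flash_tokenizer-0.4.0.tar.gz"  # assumed local download path

if os.path.exists(archive):
    print("OK" if verify(archive, EXPECTED) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

pip can enforce the same check at install time through its hash-checking mode (--require-hashes with a requirements file).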

File details

Details for the file flash_tokenizer-0.4.0-cp312-cp312-macosx_15_0_arm64.whl.

File hashes

Hashes for flash_tokenizer-0.4.0-cp312-cp312-macosx_15_0_arm64.whl

Algorithm    Hash digest
SHA256       d95716993c7bf4885459ab416c226e6d854a47cd6c26cb9b386624957c48a35d
MD5          8866aed0132cec3246797b8e2be6bc1d
BLAKE2b-256  a8adb2ad4a658ef645a88bfc7a0765b1e5cf3f474294afa5e4bea01cf1f2f2d3
