FlashBertTokenizer implementation with C++ backend

flash-tokenizer

Flash BERT tokenizer implementation with C++ backend.

Installation

Install the released package from PyPI:

pip install flash-tokenizer

Or build and install from source:

git clone https://github.com/springkim/flash-tokenizer.git
cd flash-tokenizer
pip install .

Usage

from flash_tokenizer import FlashBertTokenizer

# Load a BERT vocabulary file; do_lower_case=True selects lowercased (uncased) matching
tokenizer = FlashBertTokenizer("path/to/vocab.txt", do_lower_case=True)
# Tokenize text
ids = tokenizer("Hello, world!")
print(ids)
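
For context, BERT-style tokenizers split each word into vocabulary pieces with a greedy longest-match-first search (WordPiece). The sketch below illustrates that general technique in pure Python. It is not FlashBertTokenizer's actual C++ implementation, and the function name `wordpiece` and the tiny `vocab` set are made up for illustration.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not taken from any real vocab.txt.
vocab = {"hello", "world", "un", "##afford", "##able", ",", "!"}

print(wordpiece("unaffordable", vocab))  # ['un', '##afford', '##able']
print(wordpiece("xyz", vocab))           # ['[UNK]']
```

A full BERT tokenizer also normalizes text, splits on whitespace and punctuation, and maps each piece to its line index in vocab.txt; those steps are omitted here.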

Download files

Download the file for your platform.

Source Distribution

flash_tokenizer-0.3.0.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

flash_tokenizer-0.3.0-cp312-cp312-macosx_15_0_arm64.whl (2.3 kB)

Uploaded CPython 3.12, macOS 15.0+ ARM64

File details

Details for the file flash_tokenizer-0.3.0.tar.gz.

File metadata

  • Download URL: flash_tokenizer-0.3.0.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for flash_tokenizer-0.3.0.tar.gz
Algorithm Hash digest
SHA256 d27fae7846df589e900c07eceaf895b71f834e11f8bc18cc2603e1000e878905
MD5 eb29c2ff9a7fdd80d090150a41771c52
BLAKE2b-256 fe9c7452703fb6075816e4026dc2e397af32acfb58a419447a4d1ffa968602b8
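
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of_file` and `verify`, and the assumption that the archive sits in the current directory, are illustrative.

```python
import hashlib

# SHA256 digest listed above for flash_tokenizer-0.3.0.tar.gz
EXPECTED_SHA256 = "d27fae7846df589e900c07eceaf895b71f834e11f8bc18cc2603e1000e878905"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the archive was downloaded into the current directory.
    import os

    name = "flash_tokenizer-0.3.0.tar.gz"
    if os.path.exists(name):
        print("OK" if verify(name) else "MISMATCH")
```

pip can perform the same check itself when a requirements file pins the digest with `--hash=sha256:...` and is installed with `--require-hashes`.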

File details

Details for the file flash_tokenizer-0.3.0-cp312-cp312-macosx_15_0_arm64.whl.

File hashes

Hashes for flash_tokenizer-0.3.0-cp312-cp312-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 4a2a1541236a4b85a2d5173227cb192161d383f63e905f2a58cc7495b3a82628
MD5 85a5839b798409b3d634e7a8bae090d1
BLAKE2b-256 6e81811e5e0b61d263fc5ef5c2e0e1921e304f8e2e4c3ce745e17ebc4a8d7490
