
Lightweight tokenizer for Qwen.

Project description

Qwen Tokenizer

Introduction

Qwen Tokenizer is an efficient, lightweight tokenization library that does not require heavy dependencies such as the transformers library. It relies only on the tokenizers and regex libraries, which makes it a more streamlined choice for tokenization tasks.

Installation

To install Qwen Tokenizer, use the following command:

pip install qwen_tokenizer
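To confirm which release pip actually installed (this page describes 0.2.0), the installed distribution can be queried with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of qwen_tokenizer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "qwen_tokenizer") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent (hypothetical helper)."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "qwen_tokenizer is not installed")
```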

Basic Usage

Below is a simple example demonstrating how to use Qwen Tokenizer to encode text:

from qwen_tokenizer import qwen_tokenizer

# Sample text
text = "Hello! 毕老师!1 + 1 = 2 ĠÑĤвÑĬÑĢ"

# Encode text
result = qwen_tokenizer.encode(text)

# Print result
print(result)

Output

[9707, 0, 6567, 107, 243, 101049, 6313, 16, 488, 220, 16, 284, 220, 17, 9843, 254, 71354, 147667, 95199, 29456, 71354, 149472, 71354, 144806]
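The sample input contains characters such as Ġ and Ĥ. They look like mojibake, but they are the printable stand-ins that GPT-2-style byte-level BPE tokenizers use for raw bytes; for example, Ġ stands for the space byte. Assuming Qwen's vocabulary follows this convention (the Qwen2 tokenizer in transformers uses the same `bytes_to_unicode` table), the mapping can be sketched with the standard library alone. This is the well-known GPT-2 table, not code taken from qwen_tokenizer. The helpers `to_byte_level` and `from_byte_level` are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bytes_to_unicode() -> dict:
    """GPT-2 style table mapping every byte value 0-255 to a printable character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def to_byte_level(text: str) -> str:
    """Render UTF-8 text in the byte-level alphabet (hypothetical helper)."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


def from_byte_level(symbols: str) -> str:
    """Invert the mapping (hypothetical helper).

    Assumes every character comes from the table; invalid UTF-8 becomes U+FFFD.
    """
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    return bytes(inverse[ch] for ch in symbols).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(to_byte_level(" т"))     # ĠÑĤ
    print(from_byte_level("ĠÑĤ"))  # space followed by Cyrillic "т"
```

Printable ASCII such as "Hello" maps to itself, which is why ordinary English text looks unchanged in the vocabulary while spaces and multi-byte characters do not.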

License

This project is licensed under the MIT License.



Download files


Source Distribution

qwen_tokenizer-0.2.0.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

qwen_tokenizer-0.2.0-py3-none-any.whl (1.1 MB)

Uploaded Python 3

File details

Details for the file qwen_tokenizer-0.2.0.tar.gz.

File metadata

  • Download URL: qwen_tokenizer-0.2.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.19.2 CPython/3.11.6 Windows/10

File hashes

Hashes for qwen_tokenizer-0.2.0.tar.gz
Algorithm Hash digest
SHA256 979e36481306095e026e3e69f3686a782b00702f2ceff6aee32e8f2e3804191c
MD5 7b9cf6d65b9968aa91375c1d68fc559e
BLAKE2b-256 4e4decc7a05269d6aab52be5dcea93bc2f7a738fe76660d1b610fa13cf64640f
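To check a downloaded archive against the SHA256 digest listed above, hash the file locally and compare the results. This is a minimal standard-library sketch. The helper names `sha256_of` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for qwen_tokenizer-0.2.0.tar.gz.
EXPECTED_SHA256 = "979e36481306095e026e3e69f3686a782b00702f2ceff6aee32e8f2e3804191c"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since hex digests may be printed in either case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust as needed.
    archive = Path("qwen_tokenizer-0.2.0.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check in hash-checking mode through a requirements line of the form `qwen_tokenizer==0.2.0 --hash=sha256:<digest>`. List both the sdist and wheel digests so that either download is accepted.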


File details

Details for the file qwen_tokenizer-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: qwen_tokenizer-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.19.2 CPython/3.11.6 Windows/10

File hashes

Hashes for qwen_tokenizer-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 30518b3a5e6c2427d71d5b5372b44b9daeaffffb94dfc6ea5a7d202f48641eae
MD5 58bd0e3f77610b59531d50ab7afa01d9
BLAKE2b-256 685cb8444b962933004316030702ffaa3cd7b551b640b3c3c8f00bc372649451

