
Lightweight tokenizer for Qwen.

Project description

Qwen Tokenizer

Introduction

Qwen Tokenizer is an efficient, lightweight tokenization library that does not require heavy dependencies such as the transformers library. It relies only on the tokenizers and regex libraries, which makes it a more streamlined choice for tokenization tasks.

Installation

To install Qwen Tokenizer, use the following command:

pip install qwen_tokenizer

Basic Usage

Below is a simple example demonstrating how to use Qwen Tokenizer to encode text:

from qwen_tokenizer import qwen_tokenizer

# Sample text
text = "Hello! 毕老师!1 + 1 = 2 ĠÑĤвÑĬÑĢ"

# Encode text
result = qwen_tokenizer.encode(text)

# Print result
print(result)

Output

[9707, 0, 6567, 107, 243, 101049, 6313, 16, 488, 220, 16, 284, 220, 17, 9843, 254, 71354, 147667, 95199, 29456, 71354, 149472, 71354, 144806]
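The sample string mixes ordinary text with characters such as `Ġ`, `Ñ` and `Ĥ`. In GPT-2-style byte-level BPE, every UTF-8 byte is first mapped to a printable character, and the space byte becomes `Ġ`. The sketch below reproduces that GPT-2 table in plain Python to illustrate the convention. It assumes, without confirmation from this package, that the Qwen vocabulary uses the same byte-to-character mapping, and the helper names (`bytes_to_unicode`, `to_byte_level`, `from_byte_level`) are hypothetical, not part of qwen_tokenizer.

```python
def bytes_to_unicode():
    """Map each byte value 0-255 to a printable character (GPT-2 convention).

    Assumption: Qwen's vocabulary follows this same table.
    """
    # Printable ASCII ("!".."~") and Latin-1 ranges map to themselves.
    keep = (
        list(range(0x21, 0x7F))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    kept = set(keep)
    byte_values = keep[:]
    code_points = keep[:]
    shift = 0
    for b in range(256):
        if b not in kept:
            # Space and control bytes are moved to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text):
    """Render text as the byte-level characters a BPE vocabulary would see."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_level(symbols):
    """Invert to_byte_level; every character must come from the byte table."""
    return bytes(CHAR_TO_BYTE[c] for c in symbols).decode("utf-8")
```

For example, `to_byte_level(" т")` returns `"ĠÑĤ"`: the space byte becomes `Ġ`, and the two UTF-8 bytes of `т` become `Ñ` and `Ĥ`, matching the first three characters of the last word in the sample text.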

License

This project is licensed under the MIT License.



Download files


Source Distribution

qwen_tokenizer-0.1.0.tar.gz (3.7 MB)

Built Distribution

qwen_tokenizer-0.1.0-py3-none-any.whl (3.8 MB)

File details

Details for the file qwen_tokenizer-0.1.0.tar.gz.

File metadata

  • Download URL: qwen_tokenizer-0.1.0.tar.gz
  • Upload date:
  • Size: 3.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.17.3 CPython/3.11.6 Windows/10

File hashes

Hashes for qwen_tokenizer-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       9dd2fed7068ae30aee12d6b5a4d97ec309ce3a975a466cac8a613a86b59f9406
MD5          1ce49e763fb9954b0516d6afbb38c8e4
BLAKE2b-256  54e05ddb37e2c38a78a1df5065eda4c90cd3b7f9e211010acd2db0c17ecf042c


File details

Details for the file qwen_tokenizer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: qwen_tokenizer-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.17.3 CPython/3.11.6 Windows/10

File hashes

Hashes for qwen_tokenizer-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2974b749461e3c841713baa20753977bffeb9e2bd14c2e8532e3c38ec8e91f39
MD5          ae7855a70b5278dbb1c4261e4faf0272
BLAKE2b-256  a633252d48366b66d790d3473063c806c741d0583d9e8d848cb15864ccec5e51
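The published digests let you confirm that a downloaded file is the one that was uploaded. Below is a minimal sketch using only the standard library `hashlib`; the helper names `sha256_of_file` and `verify_sha256` are hypothetical and not part of qwen_tokenizer.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream the file so large archives need not fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 matches the published hex digest."""
    # hexdigest() is lowercase, so normalise the expected value the same way.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For an unmodified download of the source distribution, `verify_sha256("qwen_tokenizer-0.1.0.tar.gz", "9dd2fed7068ae30aee12d6b5a4d97ec309ce3a975a466cac8a613a86b59f9406")` should return `True`. pip can also enforce this itself: in `--require-hashes` mode, a requirements line such as `qwen_tokenizer==0.1.0 --hash=sha256:<digest>` makes installation fail on a mismatch.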

