Lightweight tokenizer for DeepSeek

DeepSeek Tokenizer

Introduction

DeepSeek Tokenizer is a lightweight tokenization library with no third-party runtime dependencies, so installing it pulls in no additional packages.

Installation

To install DeepSeek Tokenizer, use the following command:

pip install deepseek_tokenizer

Basic Usage

The following example uses DeepSeek Tokenizer to encode a string into token IDs:

from deepseek_tokenizer import ds_token

# Sample text
text = "Hello! 毕老师!1 + 1 = 2 ĠÑĤвÑĬÑĢ"

# Encode text
result = ds_token.encode(text)

# Print result
print(result)

Output

[19923, 3, 223, 5464, 5008, 1175, 19, 940, 223, 19, 438, 223, 20, 6113, 257, 76589, 131, 100, 76032, 1628, 76589, 131, 108, 76589, 131, 98]
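
Because encode returns a list of token IDs, a common follow-up is counting tokens, for example to check whether a prompt fits a length budget. The sketch below assumes only the ds_token.encode call shown above; the helper names count_tokens and fits_budget, the sample prompts, and the 8-token budget are hypothetical and not part of the library. The helpers take the encoder as a parameter so they can be tried without the package installed.

```python
from typing import Callable, Iterable, List

# An encoder is any callable mapping a string to a list of token IDs,
# such as ds_token.encode from the example above.
Encoder = Callable[[str], List[int]]


def count_tokens(texts: Iterable[str], encode: Encoder) -> List[int]:
    """Return the number of token IDs the encoder produces for each text."""
    return [len(encode(text)) for text in texts]


def fits_budget(text: str, encode: Encoder, max_tokens: int) -> bool:
    """Return True if the encoded text has at most max_tokens token IDs."""
    return len(encode(text)) <= max_tokens


try:
    from deepseek_tokenizer import ds_token
except ImportError:
    ds_token = None  # package not installed; the demo below is skipped

if ds_token is not None:
    prompts = ["Hello!", "1 + 1 = 2"]  # hypothetical sample inputs
    for prompt, n in zip(prompts, count_tokens(prompts, ds_token.encode)):
        print(f"{n:>4} tokens  {prompt!r}")
    # The 8-token budget is an arbitrary illustrative value.
    print(fits_budget("Hello!", ds_token.encode, max_tokens=8))
```

Passing the encoder in as an argument keeps the helpers independent of any particular tokenizer, which also makes them easy to exercise with a stand-in such as str.split.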

License

This project is licensed under the MIT License.

Download files

Source Distribution

deepseek_tokenizer-0.2.0.tar.gz (1.9 MB)

Uploaded Source

Built Distribution

deepseek_tokenizer-0.2.0-py3-none-any.whl (2.0 MB)

Uploaded Python 3

File details

Details for the file deepseek_tokenizer-0.2.0.tar.gz.

File metadata

  • Download URL: deepseek_tokenizer-0.2.0.tar.gz
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepseek_tokenizer-0.2.0.tar.gz
Algorithm Hash digest
SHA256 81d4afa1394fcd15bae6f22c7698e31c5f265a5198b14b13d1a9e25e6561ba63
MD5 e8970bf4b238798295af7a196328b253
BLAKE2b-256 9b81889836a2c3487e176e92cf1794ca8a15119451aa648558718f46b7a3997b
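
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_of and verify, and the assumption that the archive sits in the current directory under its published file name, are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for deepseek_tokenizer-0.2.0.tar.gz
EXPECTED_SHA256 = "81d4afa1394fcd15bae6f22c7698e31c5f265a5198b14b13d1a9e25e6561ba63"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches expected_hex."""
    return sha256_of(path) == expected_hex.lower()


# Assumed location: the archive was downloaded into the current directory.
archive = Path("deepseek_tokenizer-0.2.0.tar.gz")
if archive.exists():
    print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```

Reading in chunks keeps memory use flat regardless of archive size. The same helper works for the wheel if its SHA256 digest is passed instead.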

Provenance

The following attestation bundles were made for deepseek_tokenizer-0.2.0.tar.gz:

Publisher: release.yml on AndersonBY/deepseek-tokenizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepseek_tokenizer-0.2.0-py3-none-any.whl.

File hashes

Hashes for deepseek_tokenizer-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6a914a11a8ae47d2c4d4ccb9c7cd270e90f8e7811365cbab4e575a4e027a21f6
MD5 af8499b50c59b72d13b9d54af858d571
BLAKE2b-256 beae667a44db11574401824c9b17b5af015f0e136be3c91369fe0b059524d6ee

Provenance

The following attestation bundles were made for deepseek_tokenizer-0.2.0-py3-none-any.whl:

Publisher: release.yml on AndersonBY/deepseek-tokenizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
