Unit Tokenizer

Tokenizers that operate on integer sequences.

Requirements

  • Python >= 3.9 (required for the type hinting syntax used in the code)

Features

  • BPETokenizer

    • Byte-pair encoding algorithm
  • RLETokenizer

    • Run-length encoding algorithm
  • PackBitsTokenizer

    • Modified run-length encoding algorithm
  • NaivePackBitsTokenizer

    • PackBitsTokenizer that allows negative units
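
To make two of the algorithms named above concrete, here is a minimal standalone sketch of run-length encoding and byte-pair encoding on integer sequences. It does not use or reproduce this package's classes: every function name below (rle_encode, rle_decode, merge_pair, bpe_train, bpe_decode) is hypothetical, and the choice to stop merging once no pair occurs at least twice is an assumption of the sketch. The actual BPETokenizer and RLETokenizer interfaces may differ.

```python
from collections import Counter
from itertools import groupby


def rle_encode(units: list[int]) -> list[tuple[int, int]]:
    """Collapse each run of equal units into a (unit, run_length) pair."""
    return [(unit, len(list(run))) for unit, run in groupby(units)]


def rle_decode(pairs: list[tuple[int, int]]) -> list[int]:
    """Expand (unit, run_length) pairs back into the original sequence."""
    return [unit for unit, count in pairs for _ in range(count)]


def merge_pair(units: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged: list[int] = []
    i = 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def bpe_train(
    units: list[int], num_merges: int
) -> tuple[list[int], dict[tuple[int, int], int]]:
    """Repeatedly merge the most frequent adjacent pair into a fresh unit id."""
    merges: dict[tuple[int, int], int] = {}
    next_id = max(units, default=-1) + 1  # first id not present in the input
    for _ in range(num_merges):
        counts = Counter(zip(units, units[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # assumption: a pair seen only once is not worth merging
            break
        merges[pair] = next_id
        units = merge_pair(units, pair, next_id)
        next_id += 1
    return units, merges


def bpe_decode(units: list[int], merges: dict[tuple[int, int], int]) -> list[int]:
    """Expand merged ids back into the pairs they replaced until none remain."""
    expand = {new_id: pair for pair, new_id in merges.items()}
    stack = list(reversed(units))
    decoded: list[int] = []
    while stack:
        unit = stack.pop()
        if unit in expand:
            left, right = expand[unit]
            stack.append(right)  # push right first so left is popped first
            stack.append(left)
        else:
            decoded.append(unit)
    return decoded


if __name__ == "__main__":
    runs = rle_encode([5, 5, 5, 2, 2, 7])
    print(runs)              # [(5, 3), (2, 2), (7, 1)]
    print(rle_decode(runs))  # [5, 5, 5, 2, 2, 7]

    encoded, merges = bpe_train([1, 2, 1, 2, 3, 1, 2], num_merges=2)
    print(encoded, merges)              # [4, 4, 3, 4] {(1, 2): 4}
    print(bpe_decode(encoded, merges))  # [1, 2, 1, 2, 3, 1, 2]
```

Both encoders are lossless here: decoding the encoded sequence returns the original input. PackBits-style encoding is not sketched, since the exact header scheme used by PackBitsTokenizer and NaivePackBitsTokenizer is not described in this README.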

Installation

pip install unit-tokenizer

Installation for development

poetry install
pre-commit install

Test

poetry run pytest

Usage

See tests/*.py for usage examples.
