Tokenizers that operate on integer sequences.

Unit Tokenizer

Requirements

  • Python >= 3.9 (required for the type hinting syntax the code uses)

Features

  • BPETokenizer

    • Byte-pair encoding algorithm
  • RLETokenizer

    • Run-length encoding algorithm
  • PackBitsTokenizer

    • Modified run-length encoding algorithm
  • NaivePackBitsTokenizer

    • PackBitsTokenizer that allows negative units
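
To make the algorithms named above concrete, here is a minimal sketch of run-length encoding and a single byte-pair merge step on integer sequences. It is an illustration only and does not use this package's API: the function names (rle_encode, rle_decode, most_frequent_pair, merge_pair) and the (value, run_length) output format are assumptions made for this example, and the actual tokenizer classes may represent their output differently.

```python
from collections import Counter
from typing import Optional


def rle_encode(units: list[int]) -> list[tuple[int, int]]:
    """Collapse each run of equal integers into a (value, run_length) pair."""
    runs: list[tuple[int, int]] = []
    for unit in units:
        if runs and runs[-1][0] == unit:
            # Same value as the previous run: extend it by one.
            runs[-1] = (unit, runs[-1][1] + 1)
        else:
            # New value: start a run of length 1.
            runs.append((unit, 1))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


def most_frequent_pair(units: list[int]) -> Optional[tuple[int, int]]:
    """Return the most common adjacent pair, or None if there are no pairs."""
    pairs = Counter(zip(units, units[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(units: list[int], pair: tuple[int, int], new_unit: int) -> list[int]:
    """Replace each non-overlapping occurrence of `pair` with `new_unit`."""
    merged: list[int] = []
    i = 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(new_unit)
            i += 2  # skip both members of the merged pair
        else:
            merged.append(units[i])
            i += 1
    return merged


if __name__ == "__main__":
    print(rle_encode([5, 5, 5, 2, 2, 7]))   # [(5, 3), (2, 2), (7, 1)]
    pair = most_frequent_pair([1, 2, 1, 2, 3])
    print(pair)                              # (1, 2)
    print(merge_pair([1, 2, 1, 2, 3], pair, new_unit=100))  # [100, 100, 3]
```

A full byte-pair encoder repeats the last two steps, assigning a fresh integer to each merged pair, until a target vocabulary size is reached. PackBits-style encoders refine plain run-length encoding by also grouping non-repeating stretches, which this sketch does not cover.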

Installation

pip install unit-tokenizer

Installation for development

poetry install
pre-commit install

Test

poetry run pytest

Usage

See the test files in tests/*.py for usage examples.
