A simple Tokenizer

Project description

Usage Sample
''''''''''''

.. code:: python

    from xtokenizer import Tokenizer

    tokenizer = Tokenizer.from_texts(texts, min_freq=5)
    sent = 'I love you'
    tokens = tokenizer.encode(sent, max_length=6)
    # [101, 66, 88, 99, 102, 0]
    sent = tokenizer.decode(tokens)
    # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']
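The round trip above can be sketched with a minimal pure-Python tokenizer. This is an illustrative re-implementation of the behavior shown, not xtokenizer's actual source; the class name ``SimpleTokenizer``, the special-token IDs, and the whitespace splitting are all assumptions.

.. code:: python

    from collections import Counter

    class SimpleTokenizer:
        """Minimal whitespace tokenizer with <PAD>/<UNK>/<BOS>/<EOS> specials."""
        PAD, UNK, BOS, EOS = "<PAD>", "<UNK>", "<BOS>", "<EOS>"

        def __init__(self, vocab):
            self.token_to_id = vocab
            self.id_to_token = {i: t for t, i in vocab.items()}

        @classmethod
        def from_texts(cls, texts, min_freq=1):
            # Count whitespace-split tokens and keep those at or above min_freq.
            counts = Counter(tok for text in texts for tok in text.split())
            vocab = {cls.PAD: 0, cls.UNK: 1, cls.BOS: 2, cls.EOS: 3}
            for tok, freq in counts.most_common():
                if freq >= min_freq:
                    vocab[tok] = len(vocab)
            return cls(vocab)

        def encode(self, sent, max_length=None):
            # Wrap the sentence in <BOS>/<EOS>, then truncate or pad to max_length.
            ids = [self.token_to_id[self.BOS]]
            ids += [self.token_to_id.get(t, self.token_to_id[self.UNK])
                    for t in sent.split()]
            ids.append(self.token_to_id[self.EOS])
            if max_length is not None:
                ids = ids[:max_length]
                ids += [self.token_to_id[self.PAD]] * (max_length - len(ids))
            return ids

        def decode(self, ids):
            return [self.id_to_token[i] for i in ids]

    tokenizer = SimpleTokenizer.from_texts(["I love you", "I love python"])
    tokens = tokenizer.encode("I love you", max_length=6)
    print(tokenizer.decode(tokens))
    # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']

The exact integer IDs depend on vocabulary order, which is why the decoded token list, not the raw IDs, is the stable thing to check.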

Project details


Download files

Download the file for your platform.

Source Distribution

xtokenizer-0.0.1.tar.gz (19.3 kB)

File details

Details for the file xtokenizer-0.0.1.tar.gz.

File metadata

  • Download URL: xtokenizer-0.0.1.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for xtokenizer-0.0.1.tar.gz:

  • SHA256: ad494eafa20d597c6170bc0bd5a3a01f8b99f44bfa4efce3061e140a68694030
  • MD5: 1570afdb6e8c9bf4823465181768989e
  • BLAKE2b-256: 244ac0824628503ddbbe0d6b6885ac07c105537fbc1b55f7eb4a496151e3c553

