A simple Tokenizer
==================

Project description
-------------------

Usage Sample
''''''''''''

.. code:: python

    from xtokenizer import Tokenizer

    # texts: an iterable of raw training strings; words seen fewer
    # than min_freq times are excluded from the vocabulary
    tokenizer = Tokenizer.from_texts(texts, min_freq=5)

    sent = 'I love you'
    token_ids = tokenizer.encode(sent, max_length=6)
    # [101, 66, 88, 99, 102, 0]
    tokens = tokenizer.decode(token_ids)
    # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']
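
The sample implies a scheme where ``encode`` wraps the sentence in ``<BOS>``/``<EOS>`` markers, right-pads with ``<PAD>`` (id 0) up to ``max_length``, and ``decode`` maps ids back to token strings. A minimal sketch of that behavior, purely for illustration (this is *not* the xtokenizer implementation; all names and the ``<UNK>`` fallback are assumptions):

.. code:: python

    from collections import Counter

    PAD, BOS, EOS, UNK = "<PAD>", "<BOS>", "<EOS>", "<UNK>"

    class ToyTokenizer:
        """Illustrative word-level tokenizer with BOS/EOS/PAD handling."""

        def __init__(self, vocab):
            # Reserve low ids for special tokens; <PAD> must be 0
            self.itos = [PAD, BOS, EOS, UNK] + sorted(vocab)
            self.stoi = {t: i for i, t in enumerate(self.itos)}

        @classmethod
        def from_texts(cls, texts, min_freq=1):
            # Keep only words that occur at least min_freq times
            counts = Counter(w for t in texts for w in t.split())
            return cls([w for w, c in counts.items() if c >= min_freq])

        def encode(self, sent, max_length):
            ids = [self.stoi[BOS]]
            ids += [self.stoi.get(w, self.stoi[UNK]) for w in sent.split()]
            ids.append(self.stoi[EOS])
            ids += [self.stoi[PAD]] * (max_length - len(ids))  # right-pad
            return ids[:max_length]  # truncate if the sentence is too long

        def decode(self, ids):
            return [self.itos[i] for i in ids]

    tok = ToyTokenizer.from_texts(["I love you", "I love it"], min_freq=1)
    ids = tok.encode("I love you", max_length=6)
    print(tok.decode(ids))  # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']

Note that because ``<PAD>`` is id 0, padded positions can be masked out downstream with a simple ``id != 0`` check.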

Download files
--------------


Source Distribution
'''''''''''''''''''

xtokenizer-0.0.4.tar.gz (18.5 kB)

File details
------------

Details for the file xtokenizer-0.0.4.tar.gz.

File metadata
'''''''''''''

  • Download URL: xtokenizer-0.0.4.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes
'''''''''''

Hashes for ``xtokenizer-0.0.4.tar.gz``:

=========== ================================================================
Algorithm   Hash digest
=========== ================================================================
SHA256      b4367ceb3000397b0e89d61fbeb18c80c477f2bebf2ad08ace0aab0740cf60f5
MD5         f2be081cf269b62ab9f4a08ff7cedaf6
BLAKE2b-256 29b71913195c9b3fb3de793639395698e63a58b1ccb35526e54fe030a47c54f9
=========== ================================================================

