
A simple Tokenizer

Project description

Usage Sample
''''''''''''

.. code:: python

    from xtokenizer import Tokenizer

    # `texts` is an iterable of training strings; tokens occurring
    # fewer than `min_freq` times are excluded from the vocabulary.
    tokenizer = Tokenizer.from_texts(texts, min_freq=5)

    sent = 'I love you'
    # encode() adds <BOS>/<EOS> markers and pads the result to max_length
    tokens = tokenizer.encode(sent, max_length=6)
    # [101, 66, 88, 99, 102, 0]
    sent = tokenizer.decode(tokens)
    # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']
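To make the behavior above concrete, here is a minimal, self-contained sketch of how a frequency-threshold tokenizer of this kind could be implemented. This is an illustrative reimplementation, not xtokenizer's actual code; the class name ``MiniTokenizer`` and the specific special-token ids are assumptions chosen to mirror the sample output above.

.. code:: python

    # Hypothetical sketch of a min_freq word tokenizer; illustrative only,
    # not xtokenizer's real implementation.
    from collections import Counter


    class MiniTokenizer:
        # Special ids chosen to match the sample output (assumption):
        # <PAD>=0, <UNK>=100, <BOS>=101, <EOS>=102
        PAD, UNK, BOS, EOS = 0, 100, 101, 102

        def __init__(self, vocab):
            self.token_to_id = dict(vocab)
            for tok, idx in [('<PAD>', self.PAD), ('<UNK>', self.UNK),
                             ('<BOS>', self.BOS), ('<EOS>', self.EOS)]:
                self.token_to_id[tok] = idx
            self.id_to_token = {i: t for t, i in self.token_to_id.items()}

        @classmethod
        def from_texts(cls, texts, min_freq=1):
            # Count whitespace tokens; keep those seen at least min_freq times.
            counts = Counter(tok for text in texts for tok in text.split())
            vocab, next_id = {}, 1  # start after <PAD>=0
            for tok, freq in counts.items():
                if freq >= min_freq:
                    while next_id in (cls.UNK, cls.BOS, cls.EOS):
                        next_id += 1  # skip reserved special ids
                    vocab[tok] = next_id
                    next_id += 1
            return cls(vocab)

        def encode(self, sent, max_length):
            # Wrap in <BOS>/<EOS>, map unknown words to <UNK>, pad to length.
            ids = [self.BOS] + [self.token_to_id.get(t, self.UNK)
                                for t in sent.split()] + [self.EOS]
            ids = ids[:max_length]
            return ids + [self.PAD] * (max_length - len(ids))

        def decode(self, ids):
            return [self.id_to_token.get(i, '<UNK>') for i in ids]

With ``min_freq=2`` and a two-sentence corpus, words seen only once encode to ``<UNK>`` while frequent words round-trip through ``encode``/``decode`` as in the sample above.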

Download files

Download the file for your platform.

Source Distribution

xtokenizer-0.0.3.tar.gz (18.6 kB)


File details

Details for the file xtokenizer-0.0.3.tar.gz.

File metadata

  • Download URL: xtokenizer-0.0.3.tar.gz
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for xtokenizer-0.0.3.tar.gz
  • SHA256: 7dcae69db1671b08d1f347f8ce79c9ff4b5506cde3f3d1a0153a70224f28663a
  • MD5: c7ccde94242ecd863e9fa9dee4588e15
  • BLAKE2b-256: daf4a228fb95f35e56eefd4bde92c1b02f03bf115d87dc6328c5b201f598b5ae

See the PyPI documentation for more details on using hashes.
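To check a downloaded archive against the SHA256 digest listed above, the Python standard library's ``hashlib`` is sufficient. The helper below is a small sketch; the file path is an assumption about where the archive was saved.

.. code:: python

    # Verify a downloaded file against a published SHA256 digest
    # using only the standard library.
    import hashlib


    def sha256_of(path, chunk_size=8192):
        # Hash the file in chunks so large archives don't load into memory.
        digest = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                digest.update(chunk)
        return digest.hexdigest()

    expected = '7dcae69db1671b08d1f347f8ce79c9ff4b5506cde3f3d1a0153a70224f28663a'
    # Uncomment once the archive has been downloaded to the current directory:
    # assert sha256_of('xtokenizer-0.0.3.tar.gz') == expected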
