
A simple Tokenizer
==================

Project description
-------------------

Usage Sample
''''''''''''

.. code:: python

    from xtokenizer import Tokenizer

    tokenizer = Tokenizer.from_texts(texts, min_freq=5)
    sent = 'I love you'
    tokens = tokenizer.encode(sent, max_length=6)
    # [101, 66, 88, 99, 102, 0]
    sent = tokenizer.decode(tokens)
    # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']
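For readers curious what sits behind an interface like the one above, here is a minimal word-level sketch that mirrors the ``from_texts`` / ``encode`` / ``decode`` API. The class name, the choice of special tokens, their ids, and the whitespace splitting are illustrative assumptions, not xtokenizer's actual internals.

.. code:: python

    from collections import Counter

    class MiniTokenizer:
        """Minimal word-level tokenizer mirroring the API shown above."""

        PAD, UNK, BOS, EOS = '<PAD>', '<UNK>', '<BOS>', '<EOS>'

        def __init__(self, vocab):
            self.id_to_token = vocab
            self.token_to_id = {t: i for i, t in enumerate(vocab)}

        @classmethod
        def from_texts(cls, texts, min_freq=1):
            # Keep only words seen at least min_freq times across all texts.
            counts = Counter(w for text in texts for w in text.split())
            words = sorted(w for w, c in counts.items() if c >= min_freq)
            return cls([cls.PAD, cls.UNK, cls.BOS, cls.EOS] + words)

        def encode(self, sent, max_length):
            # Wrap the sentence in <BOS>/<EOS>, map unknown words to <UNK>,
            # then truncate or right-pad with <PAD> to exactly max_length ids.
            unk = self.token_to_id[self.UNK]
            ids = [self.token_to_id[self.BOS]]
            ids += [self.token_to_id.get(w, unk) for w in sent.split()]
            ids.append(self.token_to_id[self.EOS])
            ids = ids[:max_length]
            return ids + [self.token_to_id[self.PAD]] * (max_length - len(ids))

        def decode(self, ids):
            # Inverse lookup: id sequence back to token strings.
            return [self.id_to_token[i] for i in ids]

Built this way, ``decode(encode(sent, max_length=6))`` round-trips the sentence with the same ``<BOS>`` / ``<EOS>`` / ``<PAD>`` framing shown in the sample output above, although the numeric ids will differ from xtokenizer's.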

Download files
--------------

Source distribution: ``xtokenizer-0.0.2.tar.gz`` (19.3 kB)

File details
------------

Details for the file ``xtokenizer-0.0.2.tar.gz``:

- Download URL: ``xtokenizer-0.0.2.tar.gz``
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes
-----------

Hashes for ``xtokenizer-0.0.2.tar.gz``:

- SHA256: ``5cbf42b8d75808fedc552dde3885170c7bfe88742169620f398e8a795e7023bc``
- MD5: ``b08f8019f9ba9df18b5303f757310481``
- BLAKE2b-256: ``7b2cbf0f6a47a28b27d8df72e18f92cf7a4fa5c02f15a86b4e9ca895b20dfdc8``
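To verify a downloaded archive against the digests above, Python's standard ``hashlib`` can compute the SHA256 in chunks so large files never need to fit in memory. The ``sha256_of`` helper below is illustrative, not part of any tool mentioned here.

.. code:: python

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Compute the SHA256 hex digest of a file, reading 1 MiB at a time."""
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                h.update(chunk)
        return h.hexdigest()

    # Expected digest taken from the table above.
    EXPECTED = '5cbf42b8d75808fedc552dde3885170c7bfe88742169620f398e8a795e7023bc'
    # sha256_of('xtokenizer-0.0.2.tar.gz') == EXPECTED  -> True for a good download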

