
Tokeniser

Tokeniser is a lightweight Python package for simple and efficient token counting in text. It uses regular expressions to identify tokens, providing a straightforward approach to tokenisation without relying on complex NLP models.

Installation

To install Tokeniser, you can use pip:

pip install tokeniser

Usage

Tokeniser is easy to use in your Python scripts. Here's a basic example:

import tokeniser

text = "Hello, World!"
token_count = tokeniser.estimate_tokens(text)
print(f"Number of tokens: {token_count}")

This package is ideal for scenarios where a simple token count is needed, without the overhead of more complex NLP tools.

Features

  • Simple and efficient token counting using regular expressions.
  • Lightweight with no dependencies on large NLP models or frameworks.
  • Versatile for use in various text processing tasks.
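The regex-based counting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the technique, not Tokeniser's actual internal pattern; the `TOKEN_PATTERN` regex here is an assumption chosen for demonstration:

```python
import re

# Hypothetical pattern: a token is either a run of word characters
# or a single non-space, non-word character (e.g. punctuation).
# Tokeniser's real pattern may differ.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def estimate_tokens(text: str) -> int:
    """Count tokens by matching word runs and punctuation marks."""
    return len(TOKEN_PATTERN.findall(text))

print(estimate_tokens("Hello, World!"))  # -> 4 ("Hello", ",", "World", "!")
```

Because the pattern is a single compiled regex, counting stays fast and dependency-free, which is the trade-off this package makes compared with model-based tokenizers.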

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

License

This project is licensed under the MIT License.

Project details

Source distribution: tokeniser-0.0.2.tar.gz (2.8 kB)

Built distribution: tokeniser-0.0.2-py3-none-any.whl (3.2 kB)
