
Tokeniser

Tokeniser is a lightweight Python package for simple and efficient token counting in text. It uses regular expressions to identify tokens, providing a straightforward approach to tokenisation without relying on complex NLP models.

Installation

To install Tokeniser, you can use pip:

pip install tokeniser

Usage

Tokeniser is easy to use in your Python scripts. Here's a basic example:

import tokeniser

text = "Hello, World!"
token_count = tokeniser.estimate_tokens(text)
print(f"Number of tokens: {token_count}")

This package is ideal for scenarios where a simple token count is needed, without the overhead of more complex NLP tools.

Features

  • Simple and efficient token counting using regular expressions.
  • Lightweight with no dependencies on large NLP models or frameworks.
  • Versatile for use in various text processing tasks.
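The regex-based counting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the technique, not Tokeniser's actual internal pattern; the `TOKEN_PATTERN` regex here is an assumption chosen for demonstration:

```python
import re

# Hypothetical pattern: a token is either a run of word characters
# or a single non-space, non-word character (e.g. punctuation).
# Tokeniser's real pattern may differ.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def estimate_tokens(text: str) -> int:
    """Count tokens by matching word runs and punctuation marks."""
    return len(TOKEN_PATTERN.findall(text))

print(estimate_tokens("Hello, World!"))  # -> 4 ("Hello", ",", "World", "!")
```

Because the pattern is a single compiled regex, counting stays fast and dependency-free, which is the trade-off this package makes compared with model-based tokenizers.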

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

License

This project is licensed under the MIT License.

Project details

Source distribution: tokeniser-0.0.2.tar.gz (2.8 kB)

Built distribution: tokeniser-0.0.2-py3-none-any.whl (3.2 kB)
