
A Python equivalent of Java's StringTokenizer with some added functionality

Project description

StrTokenizer

A Python module that mimics the functionality of the Java StringTokenizer class. This class splits a given string into tokens based on a specified delimiter and offers methods to iterate over the tokens, count them, and manipulate the tokenizer's state.

Installation

To install the StrTokenizer package globally, use pip:

  1. Ensure you have pip installed on your system.
  2. Open your command line interface (CLI) and run:
pip install StrTokenizer

If you want to use it locally without installing, simply download or copy the tokenizer.py file and import it into your project.

Usage

Import the Module

If the module is installed via pip, import the class from your module:

from StrTokenizer import StrTokenizer

If the module (tokenizer.py) is downloaded from GitHub, import it like this:

from tokenizer import StrTokenizer

Creating a StrTokenizer Object

To create an instance of StrTokenizer, provide the input string, the delimiter (optional, defaults to a space " "), and whether to return the delimiters as tokens (optional, defaults to False).

# Example with default delimiter (space)
tokenizer = StrTokenizer("This is a test string")

# Example with custom delimiter
tokenizer = StrTokenizer("This,is,a,test,string", ",")

# Example with custom delimiter and returning the delimiter as tokens
tokenizer = StrTokenizer("This,is,a,test,string", ",", return_delims=True)
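The constructor examples above do not show what return_delims=True actually yields. As a point of comparison, the standard library's re.split with a capturing group produces the same kind of interleaved token/delimiter stream that return_delims=True presumably mimics:

```python
import re

# Standard-library comparison: a capturing group in re.split keeps the
# delimiters in the result, interleaved with the word tokens.
parts = re.split(r"(,)", "This,is,a,test,string")
print(parts)
# ['This', ',', 'is', ',', 'a', ',', 'test', ',', 'string']
```

With return_delims=True, the tokenizer would presumably return the "," entries between the word tokens in a similar fashion.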

Methods

countTokens() -> int

Returns the total number of tokens in the string.

token_count = tokenizer.countTokens()
print("Number of tokens:", token_count)

countTokensLeft() -> int

Returns the number of tokens left to be iterated.

tokens_left = tokenizer.countTokensLeft()
print("Tokens left:", tokens_left)

hasMoreTokens() -> bool

Checks if there are more tokens to iterate over.

if tokenizer.hasMoreTokens():
    print("There are more tokens available.")

nextToken() -> str

Returns the next token. Raises an IndexError if no more tokens are available.

while tokenizer.hasMoreTokens():
    print(tokenizer.nextToken())

rewind(steps: int = None) -> None

Resets the tokenizer's index either completely or by a specified number of steps:

  • Without arguments: Resets the tokenizer back to the first token.
  • With steps: Moves the tokenizer back by the given number of steps.

# Rewind completely
tokenizer.rewind()

# Rewind by 2 tokens
tokenizer.rewind(2)

Example

from tokenizer import StrTokenizer

# Create a tokenizer with a custom delimiter
tokenizer = StrTokenizer("apple,orange,banana,grape", ",")

# Get the number of tokens
print("Number of tokens:", tokenizer.countTokens())

# Iterate over the tokens
while tokenizer.hasMoreTokens():
    print("Token:", tokenizer.nextToken())

# Rewind the tokenizer and iterate again
tokenizer.rewind()
print("After rewinding:")
while tokenizer.hasMoreTokens():
    print("Token:", tokenizer.nextToken())

Output:

Number of tokens: 4
Token: apple
Token: orange
Token: banana
Token: grape
After rewinding:
Token: apple
Token: orange
Token: banana
Token: grape

Methods Overview

  • __init__(self, inputstring: str, delimiter: str = " ", return_delims: bool = False):

    • Initializes the StrTokenizer with the given string, delimiter, and whether to return delimiters as tokens.
  • __create_token(self) -> None:

    • Splits the input string into tokens based on the delimiter.
  • countTokens(self) -> int:

    • Returns the total number of tokens.
  • countTokensLeft(self) -> int:

    • Returns the number of tokens left for iteration.
  • hasMoreTokens(self) -> bool:

    • Checks if there are more tokens to be retrieved.
  • nextToken(self) -> str:

    • Returns the next available token or raises an IndexError if no tokens are left.
  • rewind(self, steps: int = None) -> None:

    • Resets the tokenizer's index either completely or by a given number of steps.
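Taken together, these methods suggest a simple cursor-over-list design. The following is a minimal sketch of such a design; SketchTokenizer is a hypothetical stand-in for illustration, not the package's actual source, and the clamping behavior of rewind() is an assumption:

```python
import re

class SketchTokenizer:
    """Hypothetical stand-in illustrating the documented interface."""

    def __init__(self, inputstring, delimiter=" ", return_delims=False):
        if return_delims:
            # keep delimiters interleaved with the tokens
            parts = re.split(f"({re.escape(delimiter)})", inputstring)
        else:
            parts = inputstring.split(delimiter)
        self._tokens = [p for p in parts if p]
        self._index = 0  # cursor into the token list

    def countTokens(self):
        return len(self._tokens)

    def countTokensLeft(self):
        return len(self._tokens) - self._index

    def hasMoreTokens(self):
        return self._index < len(self._tokens)

    def nextToken(self):
        if not self.hasMoreTokens():
            raise IndexError("no more tokens")
        token = self._tokens[self._index]
        self._index += 1
        return token

    def rewind(self, steps=None):
        if steps is None:
            self._index = 0  # back to the first token
        else:
            # move back by `steps`, clamping at the start (assumed behavior)
            self._index = max(0, self._index - steps)

t = SketchTokenizer("apple,orange,banana", ",")
print(t.countTokens())      # 3
print(t.nextToken())        # apple
print(t.countTokensLeft())  # 2
```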

You can install the StrTokenizer package from PyPI.

Source Code:

GitHub Link

License

This project is open-source and available for modification or distribution.

