A Python equivalent of Java's StringTokenizer with some added functionality

Project description

StrTokenizer

A Python module that mimics the functionality of the Java StringTokenizer class. This class splits a given string into tokens based on a specified delimiter and offers methods to iterate over the tokens, count them, and manipulate the tokenizer's state.

Installation

To install the StrTokenizer package globally, use pip:

  1. Ensure you have pip installed on your system.
  2. Open your command line interface (CLI) and run:
pip install StrTokenizer

If you want to use it locally without installing, simply download or copy the tokenizer.py file and import it into your project.

Usage

Import the Module

If the package is installed via pip, import the class from it:

from StrTokenizer import StrTokenizer

If the module (tokenizer.py) is downloaded from GitHub, import it like this:

from tokenizer import StrTokenizer

Creating a StrTokenizer Object

To create an instance of StrTokenizer, provide the input string, the delimiter (optional, defaults to a space " "), and whether to return the delimiters as tokens (optional, defaults to False).

# Example with default delimiter (space)
tokenizer = StrTokenizer("This is a test string")

# Example with custom delimiter
tokenizer = StrTokenizer("This,is,a,test,string", ",")

# Example with custom delimiter and returning the delimiter as tokens
tokenizer = StrTokenizer("This,is,a,test,string", ",", return_delims=True)
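To see what `return_delims=True` means in practice, the same token stream can be emulated with the standard library's `re.split` and a capturing group (a sketch of the concept only, not the package's internals):

```python
import re

def split_keeping_delims(text: str, delim: str) -> list:
    """Split `text` on `delim`, keeping each delimiter as its own token."""
    # re.split includes capturing-group matches in the result list.
    parts = re.split("({})".format(re.escape(delim)), text)
    # Drop empty strings produced by leading/trailing/adjacent delimiters.
    return [p for p in parts if p]

print(split_keeping_delims("This,is,a,test,string", ","))
# ['This', ',', 'is', ',', 'a', ',', 'test', ',', 'string']
```

With `return_delims=True`, each `,` appears in the output alongside the words, which is useful when the delimiters themselves carry meaning (for example when re-joining tokens later).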

Methods

countTokens() -> int

Returns the total number of tokens in the string.

token_count = tokenizer.countTokens()
print("Number of tokens:", token_count)

countTokensLeft() -> int

Returns the number of tokens left to be iterated.

tokens_left = tokenizer.countTokensLeft()
print("Tokens left:", tokens_left)

hasMoreTokens() -> bool

Checks if there are more tokens to iterate over.

if tokenizer.hasMoreTokens():
    print("There are more tokens available.")

nextToken() -> str

Returns the next token. Raises an IndexError if no more tokens are available.

while tokenizer.hasMoreTokens():
    print(tokenizer.nextToken())

rewind(steps: int = None) -> None

Resets the tokenizer's index either completely or by a specified number of steps:

  • Without arguments: Resets the tokenizer back to the first token.
  • With steps: Moves the tokenizer back by the given number of steps.

# Rewind completely
tokenizer.rewind()

# Rewind by 2 tokens
tokenizer.rewind(2)
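The rewind semantics above amount to simple index arithmetic over the token list. The following self-contained sketch illustrates that (the token list, globals, and the clamping at zero are illustrative assumptions, not the package's actual internals):

```python
# Sketch of rewind() semantics using a plain index over a token list.
tokens = ["apple", "orange", "banana", "grape"]
index = 0

def next_token():
    global index
    tok = tokens[index]
    index += 1
    return tok

def rewind(steps=None):
    global index
    # No argument: back to the first token; otherwise step back,
    # clamping at the start (an assumption for this sketch).
    index = 0 if steps is None else max(0, index - steps)

next_token(); next_token(); next_token()   # consume three tokens
rewind(2)                                  # step back two
print(next_token())                        # prints "orange"
```

After consuming "apple", "orange", and "banana", `rewind(2)` moves the position back so that "orange" is returned again.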

Example

from tokenizer import StrTokenizer

# Create a tokenizer with a custom delimiter
tokenizer = StrTokenizer("apple,orange,banana,grape", ",")

# Get the number of tokens
print("Number of tokens:", tokenizer.countTokens())

# Iterate over the tokens
while tokenizer.hasMoreTokens():
    print("Token:", tokenizer.nextToken())

# Rewind the tokenizer and iterate again
tokenizer.rewind()
print("After rewinding:")
while tokenizer.hasMoreTokens():
    print("Token:", tokenizer.nextToken())

Output:

Number of tokens: 4
Token: apple
Token: orange
Token: banana
Token: grape
After rewinding:
Token: apple
Token: orange
Token: banana
Token: grape

Methods Overview

  • __init__(self, inputstring: str, delimiter: str = " ", return_delims: bool = False):

    • Initializes the StrTokenizer with the given string, delimiter, and whether to return delimiters as tokens.
  • __create_token(self) -> None:

    • Splits the input string into tokens based on the delimiter.
  • countTokens(self) -> int:

    • Returns the total number of tokens.
  • countTokensLeft(self) -> int:

    • Returns the number of tokens left for iteration.
  • hasMoreTokens(self) -> bool:

    • Checks if there are more tokens to be retrieved.
  • nextToken(self) -> str:

    • Returns the next available token or raises an IndexError if no tokens are left.
  • rewind(self, steps: int = None) -> None:

    • Resets the tokenizer's index either completely or by a given number of steps.
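The interface listed above can be sketched as a minimal stand-in class. This is not the package's actual source; the splitting loop, the clamping in rewind, and treating each character of the delimiter string as a separate delimiter (as Java's StringTokenizer does) are assumptions for illustration:

```python
class MiniTokenizer:
    """Minimal sketch of the StrTokenizer interface described above."""

    def __init__(self, inputstring: str, delimiter: str = " ",
                 return_delims: bool = False):
        self._tokens = []
        self._index = 0
        # Split manually so delimiters can optionally be kept as tokens.
        current = ""
        for ch in inputstring:
            if ch in delimiter:
                if current:
                    self._tokens.append(current)
                    current = ""
                if return_delims:
                    self._tokens.append(ch)
            else:
                current += ch
        if current:
            self._tokens.append(current)

    def countTokens(self) -> int:
        return len(self._tokens)

    def countTokensLeft(self) -> int:
        return len(self._tokens) - self._index

    def hasMoreTokens(self) -> bool:
        return self._index < len(self._tokens)

    def nextToken(self) -> str:
        if not self.hasMoreTokens():
            raise IndexError("No more tokens available")
        token = self._tokens[self._index]
        self._index += 1
        return token

    def rewind(self, steps: int = None) -> None:
        self._index = 0 if steps is None else max(0, self._index - steps)

t = MiniTokenizer("apple,orange,banana", ",")
print(t.countTokens())   # 3
print(t.nextToken())     # apple
t.rewind()
print(t.nextToken())     # apple
```

The sketch mirrors the method names and signatures from the overview so the two can be read side by side.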

You can install the StrTokenizer package from PyPI.

Source Code:

GitHub

License

This project is open-source and available for modification or distribution.
