
A basic lexer for tokenizing a string of text

Project description

LexiPy

LexiPy is a Python package that provides a simple lexer implementation. It lets you tokenize a string by breaking it down into individual tokens. LexiPy also supports special terms, which are treated as single tokens during tokenization; this can be useful in domain-specific cases such as programming languages.

Installation

You can install LexiPy using pip:

pip install lexipy
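
To pin the exact release documented in the file details below (0.1.0), specify the version explicitly:

pip install lexipy==0.1.0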

Usage

Lexer Class (iterator)

Instantiating the Lexer class gives you an iterator over the tokens.

from lexipy import Lexer

content = "Hello, World!"
lexer = Lexer(content, special_terms=None) # Iterator
tokens = list(lexer) # list[str] of tokens

tokens
>>> ['hello', ',', 'world', '!']
  • content (str): The string to be tokenized.
  • special_terms (set[str] | None, optional): A set of special terms to be treated as one token. Default is None.

Returns an iterator over the tokens.
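
Because the lexer is an iterator, you can also consume tokens lazily instead of materializing the whole list first; a minimal sketch, assuming only the iterator behaviour shown above:

from lexipy import Lexer

# Tokens are produced one at a time, so large inputs don't
# have to be held in memory as a full token list.
for token in Lexer("Hello, World!"):
    print(token)  # hello , world ! (one token per line)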

lexify Function

The lexify function provides a more convenient way to tokenize a string.

from lexipy import lexify

content = "Hello, World!"
tokens = lexify(content, special_terms=None)

tokens
>>> ['hello', ',', 'world', '!']
  • content (str): The string to be tokenized.
  • special_terms (set[str] | None, optional): A set of special terms to be treated as one token. Default is None.

Returns a list of tokens.
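
Since lexify returns a plain list[str], it composes directly with the standard library. A small sketch counting token frequencies (the sample text and the exact tokenization of it are illustrative):

from collections import Counter

from lexipy import lexify

text = "Hello, World! Hello again, world."
counts = Counter(lexify(text))
print(counts.most_common(2))  # e.g. [('hello', 2), (',', 2)]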

Special terms

You can specify a set of special terms that should be treated as one token during the tokenization process. This can be useful when tokenizing programming languages or in other cases where certain terms should not be split.

from lexipy import Lexer

content = "Hello, World!"
special_terms = {"world!"}
lexer = Lexer(content, special_terms=special_terms)
tokens = list(lexer)

tokens
>>> ['hello', ',', 'world!']

Note: Special terms are matched case-insensitively.
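
To illustrate the note above, a match should succeed regardless of the casing used in the special term; a minimal sketch, assuming only the behaviour described in the note:

from lexipy import Lexer

# The special term is given in upper case, but it still matches
# "World!" in the input; the token is emitted lowercased as above.
tokens = list(Lexer("Hello, World!", special_terms={"WORLD!"}))

tokens
>>> ['hello', ',', 'world!']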

Planned features

  • An optional word stemmer (default: nltk), with support for custom stemmers through an interface

Contributing

Contributions to LexiPy are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

LexiPy is licensed under the MIT License.

Download files

Download the file for your platform.

Source Distribution

lexipy-0.1.0.tar.gz (2.8 kB)


Built Distribution

lexipy-0.1.0-py3-none-any.whl (3.3 kB)


File details

Details for the file lexipy-0.1.0.tar.gz.

File metadata

  • Download URL: lexipy-0.1.0.tar.gz
  • Size: 2.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10

File hashes

Hashes for lexipy-0.1.0.tar.gz

  • SHA256: 353f729e1f300a60d4924b1a1395531244d17ece2eb303259943f741a426c828
  • MD5: a6aef81ed04c2dd136c2bbef523b6826
  • BLAKE2b-256: a0416f205af7c38c99a6682e9769b8ee0301a8f7e234b667c1ea79587fcd5356

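To check a downloaded archive against the SHA256 digest above, the standard library's hashlib is enough; a minimal sketch, assuming the archive sits in the current directory:

import hashlib

# SHA256 digest published above for the source distribution.
EXPECTED = "353f729e1f300a60d4924b1a1395531244d17ece2eb303259943f741a426c828"

with open("lexipy-0.1.0.tar.gz", "rb") as f:  # assumed local path
    actual = hashlib.sha256(f.read()).hexdigest()

print("OK" if actual == EXPECTED else "hash mismatch")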

File details

Details for the file lexipy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lexipy-0.1.0-py3-none-any.whl
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10

File hashes

Hashes for lexipy-0.1.0-py3-none-any.whl

  • SHA256: 8cb33dc3348d165159cd18f5481b27bbce35dc4ae800424c11d2ad6c43eb0e5a
  • MD5: 247a2218d5e0bb5e499c6a28c2f93e2c
  • BLAKE2b-256: 195a701b567c025d8fcd99120bd93dac52035958b3c6e0230f404120e7e7ce3f

