A basic lexer for tokenizing a string of text
LexiPy
LexiPy is a Python package that provides a simple lexer implementation. It allows you to tokenize a string by breaking it into individual tokens. LexiPy also supports special terms, which are treated as a single token during tokenization; this can be useful in domain-specific cases such as programming languages.
Installation
You can install LexiPy using pip:
```
pip install lexipy
```
Usage
Lexer Class (iterator)
The Lexer class returns an iterator over the tokens:

```python
from lexipy import Lexer

content = "Hello, World!"
lexer = Lexer(content, special_terms=None)  # iterator
tokens = list(lexer)  # list of tokens: list[str]
tokens
>>> ['hello', ',', 'world', '!']
```

- content (str): The string to be tokenized.
- special_terms (set[str] | None, optional): A set of special terms to be treated as one token. Default is None.

Returns an iterator over the tokens.
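The iterator behavior can be sketched in plain Python. The function below is only an illustrative stand-in for the documented behavior (lowercased word tokens, with punctuation emitted as separate tokens), not LexiPy's actual implementation:

```python
import re
from typing import Iterator

def tokenize(content: str) -> Iterator[str]:
    """Yield lowercased word tokens and individual punctuation marks."""
    # \w+ captures a run of word characters; any other non-space
    # character becomes its own single-character token.
    for match in re.finditer(r"\w+|[^\w\s]", content):
        yield match.group().lower()

tokens = list(tokenize("Hello, World!"))
# tokens == ['hello', ',', 'world', '!']
```

Because this is a generator, tokens are produced lazily, matching the iterator-style API shown above.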
lexify Function
The lexify function provides a more convenient way to tokenize a string:

```python
from lexipy import lexify

content = "Hello, World!"
tokens = lexify(content, special_terms=None)
tokens
>>> ['hello', ',', 'world', '!']
```

- content (str): The string to be tokenized.
- special_terms (set[str] | None, optional): A set of special terms to be treated as one token. Default is None.

Returns a list of tokens.
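Judging by the examples, lexify is equivalent to materializing the Lexer iterator into a list. A self-contained sketch of that relationship, using a stand-in tokenizer rather than LexiPy's own:

```python
import re

def _tokens(content):
    # Stand-in for the Lexer iterator: lowercased words, with
    # punctuation emitted as separate tokens.
    return (m.group().lower() for m in re.finditer(r"\w+|[^\w\s]", content))

def lexify(content, special_terms=None):
    # Convenience wrapper: eagerly consume the iterator into a list.
    # (special_terms is accepted for signature parity; its handling
    # is omitted from this sketch.)
    return list(_tokens(content))

print(lexify("Hello, World!"))  # ['hello', ',', 'world', '!']
```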
Special terms
You can specify a set of special terms that should be treated as one token during the tokenization process. This can be useful when tokenizing programming languages or other cases where certain terms should not be split.
```python
from lexipy import Lexer

content = "Hello, World!"
special_terms = {"world!"}
lexer = Lexer(content, special_terms=special_terms)
tokens = list(lexer)
tokens
>>> ['hello', ',', 'world!']
```

Note: The special terms are case-insensitive.
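One way case-insensitive special-term matching could work is to try the escaped terms before the ordinary word/punctuation patterns. This is an illustration of the technique, not LexiPy's source:

```python
import re

def lexify_sketch(content, special_terms=None):
    """Illustrative tokenizer: special terms are matched
    case-insensitively and kept whole; everything else splits into
    lowercased words and single punctuation marks."""
    parts = []
    if special_terms:
        # Try longer terms first so e.g. 'world!' beats 'world'.
        parts = [re.escape(t) for t in sorted(special_terms, key=len, reverse=True)]
    parts += [r"\w+", r"[^\w\s]"]
    pattern = re.compile("|".join(parts), re.IGNORECASE)
    return [m.group().lower() for m in pattern.finditer(content)]

print(lexify_sketch("Hello, World!", {"world!"}))  # ['hello', ',', 'world!']
```

With no special terms, the same function falls back to plain word/punctuation splitting.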
Planned features
- An optional word stemmer (default: nltk), plus support for custom stemmers through an interface
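Since the custom-stemmer interface is only planned, its final shape is unknown. One hypothetical design uses a structural Protocol; all names below are assumptions, not LexiPy API:

```python
from typing import Protocol

class Stemmer(Protocol):
    """Hypothetical interface a custom stemmer could implement."""
    def stem(self, token: str) -> str: ...

class SuffixStemmer:
    """Toy stemmer that strips a trailing plural 's' (illustration only)."""
    def stem(self, token: str) -> str:
        return token[:-1] if token.endswith("s") and len(token) > 3 else token

def apply_stemmer(tokens: list[str], stemmer: Stemmer) -> list[str]:
    # A lexer could run each emitted token through the stemmer.
    return [stemmer.stem(t) for t in tokens]

print(apply_stemmer(["tokens", "cat"], SuffixStemmer()))  # ['token', 'cat']
```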
Contributing
Contributions to LexiPy are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
License
LexiPy is licensed under the MIT License.
File details
Details for the file lexipy-0.1.0.tar.gz.
File metadata
- Download URL: lexipy-0.1.0.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 353f729e1f300a60d4924b1a1395531244d17ece2eb303259943f741a426c828
MD5 | a6aef81ed04c2dd136c2bbef523b6826
BLAKE2b-256 | a0416f205af7c38c99a6682e9769b8ee0301a8f7e234b667c1ea79587fcd5356
File details
Details for the file lexipy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lexipy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8cb33dc3348d165159cd18f5481b27bbce35dc4ae800424c11d2ad6c43eb0e5a
MD5 | 247a2218d5e0bb5e499c6a28c2f93e2c
BLAKE2b-256 | 195a701b567c025d8fcd99120bd93dac52035958b3c6e0230f404120e7e7ce3f