A basic lexer for tokenizing a string of text
LexiPy
LexiPy is a Python package that provides a simple lexer implementation. It allows you to tokenize a string by breaking it into individual tokens. LexiPy also supports special terms, which are treated as a single token during tokenization; this can be useful in domain-specific cases such as programming languages.
Installation
You can install LexiPy using pip:
```
pip install lexipy
```
Usage
Lexer Class (iterator)
The Lexer class returns an iterator over the tokens:

```python
from lexipy import Lexer

content = "Hello, World!"
lexer = Lexer(content, special_terms=None)  # iterator
tokens = list(lexer)  # list of tokens: list[str]
tokens
>>> ['hello', ',', 'world', '!']
```

- content (str): The string to be tokenized.
- special_terms (set[str] | None, optional): A set of special terms to be treated as one token. Default is None.

Returns an iterator over the tokens.
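The iterator behavior can be sketched in plain Python. The function below is only an illustrative stand-in for the documented behavior (lowercased word tokens, with punctuation emitted as separate tokens), not LexiPy's actual implementation:

```python
import re
from typing import Iterator

def tokenize(content: str) -> Iterator[str]:
    """Yield lowercased word tokens and individual punctuation marks."""
    # \w+ captures a run of word characters; any other non-space
    # character becomes its own single-character token.
    for match in re.finditer(r"\w+|[^\w\s]", content):
        yield match.group().lower()

tokens = list(tokenize("Hello, World!"))
# tokens == ['hello', ',', 'world', '!']
```

Because this is a generator, tokens are produced lazily, matching the iterator-style API shown above.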
lexify Function
The lexify function provides a more convenient way to tokenize a string:

```python
from lexipy import lexify

content = "Hello, World!"
tokens = lexify(content, special_terms=None)
tokens
>>> ['hello', ',', 'world', '!']
```

- content (str): The string to be tokenized.
- special_terms (set[str] | None, optional): A set of special terms to be treated as one token. Default is None.

Returns a list of tokens.
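Judging by the examples, lexify is equivalent to materializing the Lexer iterator into a list. A self-contained sketch of that relationship, using a stand-in tokenizer rather than LexiPy's own:

```python
import re

def _tokens(content):
    # Stand-in for the Lexer iterator: lowercased words, with
    # punctuation emitted as separate tokens.
    return (m.group().lower() for m in re.finditer(r"\w+|[^\w\s]", content))

def lexify(content, special_terms=None):
    # Convenience wrapper: eagerly consume the iterator into a list.
    # (special_terms is accepted for signature parity; its handling
    # is omitted from this sketch.)
    return list(_tokens(content))

print(lexify("Hello, World!"))  # ['hello', ',', 'world', '!']
```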
Special terms
You can specify a set of special terms that should be treated as one token during the tokenization process. This can be useful when tokenizing programming languages or other cases where certain terms should not be split.
```python
from lexipy import Lexer

content = "Hello, World!"
special_terms = {"world!"}
lexer = Lexer(content, special_terms=special_terms)
tokens = list(lexer)
tokens
>>> ['hello', ',', 'world!']
```

Note: The special terms are case-insensitive.
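One way case-insensitive special-term matching could work is to try the escaped terms before the ordinary word/punctuation patterns. This is an illustration of the technique, not LexiPy's source:

```python
import re

def lexify_sketch(content, special_terms=None):
    """Illustrative tokenizer: special terms are matched
    case-insensitively and kept whole; everything else splits into
    lowercased words and single punctuation marks."""
    parts = []
    if special_terms:
        # Try longer terms first so e.g. 'world!' beats 'world'.
        parts = [re.escape(t) for t in sorted(special_terms, key=len, reverse=True)]
    parts += [r"\w+", r"[^\w\s]"]
    pattern = re.compile("|".join(parts), re.IGNORECASE)
    return [m.group().lower() for m in pattern.finditer(content)]

print(lexify_sketch("Hello, World!", {"world!"}))  # ['hello', ',', 'world!']
```

With no special terms, the same function falls back to plain word/punctuation splitting.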
Planned features
- An optional word stemmer (default: nltk), plus support for custom stemmers through an interface
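Since the custom-stemmer interface is only planned, its final shape is unknown. One hypothetical design uses a structural Protocol; all names below are assumptions, not LexiPy API:

```python
from typing import Protocol

class Stemmer(Protocol):
    """Hypothetical interface a custom stemmer could implement."""
    def stem(self, token: str) -> str: ...

class SuffixStemmer:
    """Toy stemmer that strips a trailing plural 's' (illustration only)."""
    def stem(self, token: str) -> str:
        return token[:-1] if token.endswith("s") and len(token) > 3 else token

def apply_stemmer(tokens: list[str], stemmer: Stemmer) -> list[str]:
    # A lexer could run each emitted token through the stemmer.
    return [stemmer.stem(t) for t in tokens]

print(apply_stemmer(["tokens", "cat"], SuffixStemmer()))  # ['token', 'cat']
```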
Contributing
Contributions to LexiPy are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
License
LexiPy is licensed under the MIT License.
File details
Details for the file lexipy-0.1.0.tar.gz.
File metadata
- Download URL: lexipy-0.1.0.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 353f729e1f300a60d4924b1a1395531244d17ece2eb303259943f741a426c828
MD5 | a6aef81ed04c2dd136c2bbef523b6826
BLAKE2b-256 | a0416f205af7c38c99a6682e9769b8ee0301a8f7e234b667c1ea79587fcd5356
File details
Details for the file lexipy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lexipy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8cb33dc3348d165159cd18f5481b27bbce35dc4ae800424c11d2ad6c43eb0e5a
MD5 | 247a2218d5e0bb5e499c6a28c2f93e2c
BLAKE2b-256 | 195a701b567c025d8fcd99120bd93dac52035958b3c6e0230f404120e7e7ce3f