Project description

liblex2-py3

Simple tokenizer using regex.

NOTE: Documentation coming soon.

About

"lex2" is a library for lexical analysis (often called tokenization). Rulesets are defined with regular expressions (regex), which the lexer uses to scan the input. Mechanisms such as the ruleset stack and configurable processing options provide some flexibility at runtime.

The library is written in platform-independent, pure Python 3. Adding a different regex engine implementation takes little effort, while a simple, unified interface keeps usage independent of the underlying implementation.
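To illustrate the idea of a unified interface over interchangeable regex engines, here is a minimal sketch. It is not lex2's actual API: the names Matcher, StdlibMatcher, LiteralMatcher and longest_match are hypothetical and exist only for this example. The calling code depends on the abstract interface alone, so a backend can be swapped without touching it.

```python
import re
from abc import ABC, abstractmethod
from typing import Iterable, Optional, Tuple


class Matcher(ABC):
    """Hypothetical engine-independent interface (not part of lex2)."""

    @abstractmethod
    def match_at(self, text: str, pos: int) -> Optional[Tuple[int, int]]:
        """Return (start, end) of a match beginning exactly at pos, or None."""


class StdlibMatcher(Matcher):
    """Backend built on Python's built-in re module."""

    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)

    def match_at(self, text: str, pos: int) -> Optional[Tuple[int, int]]:
        m = self._regex.match(text, pos)
        return (m.start(), m.end()) if m else None


class LiteralMatcher(Matcher):
    """Toy alternative backend: matches a fixed string without any regex."""

    def __init__(self, literal: str) -> None:
        self._literal = literal

    def match_at(self, text: str, pos: int) -> Optional[Tuple[int, int]]:
        if text.startswith(self._literal, pos):
            return (pos, pos + len(self._literal))
        return None


def longest_match(matchers: Iterable[Matcher], text: str, pos: int) -> Optional[Tuple[int, int]]:
    """Engine-agnostic caller: it only uses the Matcher interface."""
    spans = [span for span in (m.match_at(text, pos) for m in matchers) if span]
    return max(spans, key=lambda span: span[1]) if spans else None
```

Passing a mix of backends, for example longest_match([StdlibMatcher(r"[0-9]+"), LiteralMatcher("12")], "12345", 0), works because both implement the same method.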

Quickstart

The recommended way to install the library is from the Python Package Index (PyPI) using pip, Python's package manager:

pip install lex2

Alternatively, you can install the library manually by downloading a release from GitHub and copying the lex2 folder to your project's includes/libraries directory.

Using lex2 is relatively simple, as the short example below demonstrates. Nonetheless, reading the documentation is still encouraged for more in-depth examples.

import lex2

# Define ruleset and prepare the lexer object instance
ruleset: lex2.ruleset_t = [
    #        Identifier     Regex pattern
    lex2.Rule("WORD",        r"[a-zA-Z]+"),
    lex2.Rule("NUMBER",      r"[0-9]+"),
    lex2.Rule("PUNCTUATION", r"[.,:;!?\\-]")
]
lexer: lex2.ILexer = lex2.MakeLexer(ruleset=ruleset)

# Load input data by opening a file (uncomment and adjust the path)...
# lexer.Open(r"C:/path/to/file.txt")
# ...or by directly passing a string
lexer.Load("The quick, brown fox jumps over 2 lazy dogs. \nMr. Jock, TV quiz PhD, bags few lynx.")

# Main lexing loop
token: lex2.Token
while True:

    # Find the next token in the textstream
    try:
        token = lexer.GetNextToken()
    except lex2.excs.EndOfData:
        break

    info = [
         "ln: {}".format(token.position.ln +1),
        "col: {}".format(token.position.col+1),
        token.id,
        token.data,
    ]
    print("{: <12} {: <15} {: <20} {: <20}".format(*info))

lexer.Close()

>>> ln: 1        col: 1          WORD                 The
>>> ln: 1        col: 5          WORD                 quick
>>> ln: 1        col: 10         PUNCTUATION          ,
>>> ln: 1        col: 12         WORD                 brown
>>> ln: 1        col: 18         WORD                 fox
>>> ln: 1        col: 22         WORD                 jumps
>>> ln: 1        col: 28         WORD                 over
>>> ln: 1        col: 33         NUMBER               2
>>> ln: 1        col: 35         WORD                 lazy
>>> ln: 1        col: 40         WORD                 dogs
>>> ln: 1        col: 44         PUNCTUATION          .
>>> ln: 2        col: 1          WORD                 Mr
>>> ln: 2        col: 3          PUNCTUATION          .
>>> ln: 2        col: 5          WORD                 Jock
>>> ln: 2        col: 9          PUNCTUATION          ,
>>> ln: 2        col: 11         WORD                 TV
>>> ln: 2        col: 14         WORD                 quiz
>>> ln: 2        col: 19         WORD                 PhD
>>> ln: 2        col: 22         PUNCTUATION          ,
>>> ln: 2        col: 24         WORD                 bags
>>> ln: 2        col: 29         WORD                 few
>>> ln: 2        col: 33         WORD                 lynx
>>> ln: 2        col: 37         PUNCTUATION          .
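
For readers curious how an ordered list of regex rules can yield positions like those above, here is a small sketch that uses only Python's standard re module. It is an illustration of the general technique, not lex2's implementation: the Tok type and the tokenize function are hypothetical names, and skipping whitespace between tokens is an assumption made so that the columns line up with the output shown.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Tok(NamedTuple):
    ln: int    # zero-based line index
    col: int   # zero-based column index
    id: str    # identifier of the rule that matched
    data: str  # matched text


# Same identifiers and patterns as the quickstart ruleset
RULES: List[Tuple[str, str]] = [
    ("WORD",        r"[a-zA-Z]+"),
    ("NUMBER",      r"[0-9]+"),
    ("PUNCTUATION", r"[.,:;!?\\-]"),
]


def tokenize(text: str) -> Iterator[Tok]:
    """Yield tokens by trying each rule, in order, at the current position."""
    compiled = [(name, re.compile(pattern)) for name, pattern in RULES]
    pos, ln, line_start = 0, 0, 0
    while pos < len(text):
        ch = text[pos]
        if ch == "\n":
            # A newline starts a new line; columns restart after it
            ln += 1
            line_start = pos + 1
            pos += 1
            continue
        if ch.isspace():
            # Assumption: other whitespace is skipped silently
            pos += 1
            continue
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m:
                yield Tok(ln, pos - line_start, name, m.group())
                pos = m.end()
                break
        else:
            raise ValueError(
                "no rule matches at line {}, column {}".format(ln + 1, pos - line_start + 1)
            )


for tok in tokenize("2 dogs."):
    # Positions are stored zero-based, so add 1 for display
    print(tok.ln + 1, tok.col + 1, tok.id, tok.data)
```

As in the quickstart, the positions are zero-based internally and converted to one-based values only when printed.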

Contributing

The repository is hosted at deltarazero/liblex2-py3 on GitHub. Contributions are always welcome; you can contribute in either of the following ways:

Submitting a pull request: to contribute your own changes to the repository. See "About pull requests" for more information on pull requests on GitHub. Please follow the guidelines below:
1. File an issue to notify the maintainers about what you're working on.
2. Fork the repo, develop and test your code changes, add docs (if applicable).
3. Make sure that your commit messages clearly describe the changes.
4. Send a pull request.
For changes that address core functionality or would require breaking changes (e.g. a major release), it's best to open an issue to discuss your proposal first.

Furthermore, maintaining your own fork of the repository is discouraged. Please submit pull requests instead, as this makes it clearer to users which repository is the most up to date.

Submitting an issue: to report problems with the library, request a new feature, or discuss potential changes before a pull request is created.

License

All included scripts, modules, etc. are licensed under the terms of the zlib license, unless stated otherwise in the respective files.

Project details

Release history

Version    Released
1.1.1      Nov 26, 2022
1.1.0      Nov 26, 2022
1.0.0      Jul 1, 2022
0.9.4      Aug 23, 2021
0.9.3      Apr 19, 2021 (this version)

Download files

Source Distribution

lex2-0.9.3.tar.gz (21.2 kB)

Uploaded Apr 19, 2021 Source

Built Distribution

lex2-0.9.3-py3-none-any.whl (33.4 kB)

Uploaded Apr 19, 2021 Python 3

Hashes for lex2-0.9.3.tar.gz
Algorithm	Hash digest
SHA256	`8c00d99f204a17b004daf1a524a83dae218a9774f88bb481005ac1d71a1d4799`
MD5	`5c98674aa0a217796df02660131d8dcc`
BLAKE2b-256	`efd17f8cc287ed08a26682261bbe1de13765f0650e4f5d82ab2ad962b632688e`

Hashes for lex2-0.9.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a76f9f621bb754e6d38f3f7f9357337e8b94766cf4c20468c59a595087501d98`
MD5	`3944b668ac084b1df55a81ea7782e4e0`
BLAKE2b-256	`681f0378dfd136ae6810c81c4a2e9e134bc13135e49b572be37757e6d71d1113`
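
The digests above can be used to check a downloaded file before installing it manually. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of and verify are made up for this example, and the file location in the current directory is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 listed above for the source distribution
EXPECTED = "8c00d99f204a17b004daf1a524a83dae218a9774f88bb481005ac1d71a1d4799"

archive = Path("lex2-0.9.3.tar.gz")  # assumed download location
if archive.exists():
    print("digest matches" if verify(archive, EXPECTED) else "digest MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size; for the wheel, swap in its file name and the SHA256 listed for it.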