A generic pattern-based Lexer/tokenizer tool.

Generic Lexer

The minimum Python version is 3.6.

Original Author: Eli Bendersky, with this gist (last modified on 2010/08)
Maintainer: Leandro Benedet Garcia (last modified on 2020/11)
Version: 1.1.0
License: The Unlicense
Documentation: The documentation can be found here

Example

Running the following code:

from generic_lexer import Lexer


# Each rule maps a token name to a regular expression.
rules = {
    "VARIABLE": r"(?P<var_name>[a-z_]+): (?P<var_type>[A-Z]\w+)",
    "EQUALS": r"=",
    "SPACE": r" ",
    "STRING": r"\".*\"",
}

data = "first_word: String = \"Hello\""
data = data.strip()

# With False as the second argument, whitespace is not skipped,
# so the SPACE rule is needed to match it.
for curr_token in Lexer(rules, False, data):
    print(curr_token)

produces the following output:

VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0
SPACE( ) at 18
EQUALS(=) at 19
SPACE( ) at 20
STRING("Hello") at 21

As you can see, unlike the original gist, this lexer lets you specify multiple named groups per token. You cannot use the same group name twice, whether within a single token or across different tokens, because all the regex patterns are merged into one pattern that is later used to generate the tokens.
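The restriction on group names can be reproduced with the standard `re` module alone. The sketch below assumes the rules are merged into one alternation, with each rule wrapped in a group named after its token. That merging scheme, the `FUNCTION` rule, and the variable names are illustrative assumptions, not the library's actual code.

```python
import re

# Two hypothetical rules that both define a group called "name".
rules = {
    "VARIABLE": r"(?P<name>[a-z_]+): (?P<var_type>[A-Z]\w+)",
    "FUNCTION": r"(?P<name>[a-z_]+)\(\)",
}

# Assumed merging scheme: one big alternation, one outer group per token.
merged = "|".join(
    f"(?P<{token}>{pattern})" for token, pattern in rules.items()
)

try:
    re.compile(merged)
    duplicate_rejected = False
except re.error as error:
    # Python's regex engine refuses a pattern that defines a name twice.
    duplicate_rejected = True
    print(f"cannot compile merged pattern: {error}")
```

Renaming one of the groups (for example to `func_name`) lets the merged pattern compile.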

You may get the values of the tokens this way:

>>> from generic_lexer import Lexer
>>> rules = {
...     "VARIABLE": r"(?P<var_name>[a-z_]+): (?P<var_type>[A-Z]\w+)",
...     "EQUALS": r"=",
...     "STRING": r"\".*\"",
... }
>>> data = "first_word: String = \"Hello\""
>>> variable, equals, string = tuple(Lexer(rules, True, data))

>>> variable
VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0

>>> variable.val
{'var_name': 'first_word', 'var_type': 'String'}
>>> variable["var_name"]
'first_word'
>>> variable["var_type"]
'String'

>>> equals
EQUALS(=) at 19

>>> equals.val
'='

>>> string
STRING("Hello") at 21

>>> string.val
'"Hello"'
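To see how values and positions like these can arise, here is a minimal sketch built only on the standard `re` module. It is not the library's implementation: `sketch_tokens` is a hypothetical helper that tries each rule in turn instead of merging the patterns, always skips whitespace, and returns plain tuples rather than token objects.

```python
import re

rules = {
    "VARIABLE": r"(?P<var_name>[a-z_]+): (?P<var_type>[A-Z]\w+)",
    "EQUALS": r"=",
    "STRING": r"\".*\"",
}
data = 'first_word: String = "Hello"'


def sketch_tokens(rules, text):
    """Hypothetical helper: yield (name, value, position) for each token."""
    compiled = {name: re.compile(pattern) for name, pattern in rules.items()}
    position = 0
    while position < len(text):
        # Skip whitespace between tokens.
        if text[position].isspace():
            position += 1
            continue
        for name, regex in compiled.items():
            match = regex.match(text, position)
            if match and match.end() > position:
                # Named groups become a dict; otherwise use the matched text.
                value = match.groupdict() or match.group(0)
                yield name, value, position
                position = match.end()
                break
        else:
            raise ValueError(f"no rule matches at position {position}")


tokens = list(sketch_tokens(rules, data))
for token in tokens:
    print(token)
```

A rule with named groups yields a dictionary of those groups, and a rule without them yields the matched text, which mirrors the `val` values in the session above.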

Download files

Source Distribution

generic_lexer-1.1.1.tar.gz (7.6 kB)

Built Distribution

generic_lexer-1.1.1-py2.py3-none-any.whl (7.5 kB)