The livelex lexer
This module parses text using rules based on regular expressions. Rules are grouped into lexicons, and lexicons are grouped into a Language object. Each lexicon has its own set of rules describing the text that is expected in that context.
A rule consists of three parts: a pattern, an action and a target.
- The pattern is either a regular expression string or an object that inherits from Pattern. In the latter case, its build() method is called to get the pattern.
- The action can be any object, and is streamed together with the matched part of the text. It can be seen as a token. If the action is an instance of Action, its filter_actions() method is called, which can yield zero or more tokens. The special skip action skips the matching text.
- The target is a list of objects, each of which is either an integer or a reference to a different lexicon. A positive number pushes the same lexicon onto the stack, while a negative number pops the current lexicon(s) off the stack, so that lexing continues with a previous lexicon. It is also possible to pop a lexicon and push a different one.
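The effect of a target on the lexicon stack can be sketched in a few lines of plain Python. This is only an illustration, not livelex's implementation: apply_target is a hypothetical helper, lexicons are represented by plain strings, and two details are assumptions made here (a positive number n pushes the current lexicon n times, and the root lexicon is never popped).

```python
def apply_target(stack, target):
    """Apply a rule's target to a stack of lexicons (hypothetical helper).

    `stack` is a list with the current lexicon last; `target` is a list of
    integers and lexicon references (plain strings in this sketch).
    """
    current = stack[-1]
    for item in target:
        if isinstance(item, int):
            if item > 0:
                # Assumption: push the current lexicon `item` times.
                stack.extend([current] * item)
            else:
                # Pop -item lexicons; assumption: the root always stays.
                for _ in range(-item):
                    if len(stack) > 1:
                        stack.pop()
        else:
            # A reference to a different lexicon: push it.
            stack.append(item)
    return stack


stack = ["root"]
apply_target(stack, ["string"])         # enter the string lexicon
apply_target(stack, [-1])               # back to root
apply_target(stack, ["parenthesized"])  # enter another lexicon
apply_target(stack, [-1, "comment"])    # pop one lexicon, push a different one
print(stack)                            # ['root', 'comment']
```

Keeping the target a flat list makes "pop one, then push another" a matter of listing both steps in order.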
Using a special rule, a lexicon may specify a default action, which is streamed with text that is not recognized by any other rule in the lexicon. A lexicon may also specify a default target, which is chosen when no rule matches the current text.
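To illustrate the idea of a default action, here is a small self-contained sketch that uses only Python's re module. It is not how livelex is implemented: lex_with_default is a hypothetical function, it handles a single lexicon without targets, and it assumes that unmatched text is simply dropped when no default action is given.

```python
import re


def lex_with_default(text, rules, default_action=None):
    """Yield (pos, text, action) tuples for one lexicon (hypothetical sketch).

    `rules` is a list of (pattern, action) pairs. Text between two matches
    is yielded with `default_action`, or dropped if that is None.
    """
    # One alternation with a named group per rule; earlier rules win.
    combined = re.compile("|".join(
        f"(?P<g{i}>{pattern})" for i, (pattern, _) in enumerate(rules)))
    pos = 0
    for m in combined.finditer(text):
        if m.start() > pos and default_action is not None:
            yield pos, text[pos:m.start()], default_action
        if m.group():  # ignore empty matches
            index = int(m.lastgroup[1:])
            yield m.start(), m.group(), rules[index][1]
        pos = m.end()
    if pos < len(text) and default_action is not None:
        yield pos, text[pos:], default_action


# The same two rules as a string lexicon with an escape rule and a closing quote.
rules = [(r'\\[\\"]', "string escape"), (r'"', "string")]
for token in lex_with_default(r'a string with \" escaped characters"', rules, "string"):
    print(token)
```

With these rules, the text before and after the escape sequence is yielded with the default "string" action, while the escape itself and the closing quote are yielded with the actions of the rules that matched them.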
Here is a crude example of how to create a Language class and then use it:
    import re

    from livelex import (
        Language, lexicon, Words, Subgroup, Text,
        default_action, default_target, skip, MatchTarget, TextTarget,
    )

    class MyLang(Language):
        """A Language represents a set of Lexicons comprising a specific language.

        A Language is never instantiated. The class itself serves as a namespace
        and can be inherited from.

        """

        @lexicon(re_flags=0)
        def root(cls):
            yield r'"', "string", cls.string
            yield r'\(', "paren", cls.parenthesized
            yield r'\d+', "number"
            yield r'%', "comment", cls.comment
            yield r'[,.!?]', "punctuation"
            yield r'\w+', "word"

        @lexicon
        def string(cls):
            yield r'\\[\\"]', 'string escape'
            yield r'"', "string", -1
            yield default_action, "string"

        # re.MULTILINE lets '$' match at the end of every line
        @lexicon(re_flags=re.MULTILINE)
        def comment(cls):
            yield r'$', "comment", -1
            yield r'XXX|TODO', "todo"
            yield default_action, "comment"

        @lexicon
        def parenthesized(cls):
            yield r'\)', "paren", -1
            yield from cls.root()


    s = r"""
    This is (an example) text with 12 numbers
    and "a string with \" escaped characters",
    and a % comment that TODO lasts until the end
    of the line.
    """

    >>> from livelex import Document
    >>> Document(MyLang.root, s).root().dump()
    <Context MyLang.root at 1-144 (20 children)>
     ├╴<Token 'This' at 1 (word)>
     ├╴<Token 'is' at 6 (word)>
     ├╴<Token '(' at 9 (paren)>
     ├╴<Context MyLang.parenthesized at 10-21 (3 children)>
     │  ├╴<Token 'an' at 10 (word)>
     │  ├╴<Token 'example' at 13 (word)>
     │  ╰╴<Token ')' at 20 (paren)>
     ├╴<Token 'text' at 22 (word)>
     ├╴<Token 'with' at 27 (word)>
     ├╴<Token '12' at 32 (number)>
     ├╴<Token 'numbers' at 35 (word)>
     ├╴<Token 'and' at 43 (word)>
     ├╴<Token '"' at 47 (string)>
     ├╴<Context MyLang.string at 48-84 (4 children)>
     │  ├╴<Token 'a string with ' at 48 (string)>
     │  ├╴<Token '\\"' at 62 (string escape)>
     │  ├╴<Token ' escaped characters' at 64 (string)>
     │  ╰╴<Token '"' at 83 (string)>
     ├╴<Token ',' at 84 (punctuation)>
     ├╴<Token 'and' at 86 (word)>
     ├╴<Token 'a' at 90 (word)>
     ├╴<Token '%' at 92 (comment)>
     ├╴<Context MyLang.comment at 93-131 (3 children)>
     │  ├╴<Token ' comment that ' at 93 (comment)>
     │  ├╴<Token 'TODO' at 107 (todo)>
     │  ╰╴<Token ' lasts until the end' at 111 (comment)>
     ├╴<Token 'of' at 132 (word)>
     ├╴<Token 'the' at 135 (word)>
     ├╴<Token 'line' at 139 (word)>
     ╰╴<Token '.' at 143 (punctuation)>
The livelex module is written and maintained by Wilbert Berendsen.