Plex is a library for building lexical analysers.
Plex is a Python module for constructing lexical analysers, or scanners. Plex scanners have almost all the capabilities of the scanners generated by GNU Flex, and are specified in a very similar way. Tokens are defined by regular expressions, and each token has an associated action, which may be to return a literal value, or to call an arbitrary function.
Plex is designed to fill a gap left by the existing Python regular expression modules. If you’ve ever tried to use one of them for implementing a scanner, you will have found that they’re not really suited to the task. You can define a bunch of regular expressions which match your tokens all right, but you can only match one of them at a time against your input. To match all of them at once, you have to join them all together into one big regular expression, but then you’ve got no easy way to tell which one matched. This is the problem that Plex is designed to solve.
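To make the problem concrete, here is a minimal sketch of the one-pattern-at-a-time approach, using only the standard `re` module. It is not Plex code: the token names, the patterns, and the `naive_tokens` helper are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical token definitions, for illustration only.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("NAME", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("OP", re.compile(r"[+\-*/=]")),
    ("SPACE", re.compile(r"\s+")),
]


def naive_tokens(text):
    """Yield (kind, lexeme) pairs, trying each pattern in turn at every position."""
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        # Each pattern is matched separately, so the work done per token
        # grows with the number of patterns.
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_kind, best_end = kind, match.end()
        if best_kind is None:
            raise ValueError("no token matches at position %d" % pos)
        if best_kind != "SPACE":  # whitespace is recognised but not reported
            yield best_kind, text[pos:best_end]
        pos = best_end


tokens = list(naive_tokens("x = 42 + y1"))
```

The loop knows which token matched only because it tries the patterns one by one, and that is exactly the cost a scanner would rather avoid.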
Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that’s done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python’s existing regular expression matchers do not have this property.
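The idea of scanning with a single DFA can be sketched with a small hand-written transition table. This is only an illustration of the principle, not Plex's implementation: Plex builds its automaton from the regular expressions, whereas the states, character classes, and the `dfa_tokens` helper below are hypothetical and written by hand.

```python
def char_class(ch):
    """Map a character to the coarse class used by the transition table."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "letter"
    return "other"


# State 0 is the start state; state 1 is inside a number, state 2 inside a name.
TRANSITIONS = {
    (0, "digit"): 1,
    (0, "letter"): 2,
    (1, "digit"): 1,
    (2, "letter"): 2,
    (2, "digit"): 2,
}
ACCEPTING = {1: "NUMBER", 2: "NAME"}


def dfa_tokens(text):
    """Yield (kind, lexeme) pairs using one table lookup per input character."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = 0, pos
        # Follow transitions for as long as the table allows (longest match).
        while pos < len(text):
            next_state = TRANSITIONS.get((state, char_class(text[pos])))
            if next_state is None:
                break
            state = next_state
            pos += 1
        if state not in ACCEPTING:
            raise ValueError("no token matches at position %d" % start)
        yield ACCEPTING[state], text[start:pos]


result = list(dfa_tokens("abc 123 x9"))
```

Both token kinds share one table, so adding more kinds enlarges the table but does not add lookups per character. The final state alone says which token was recognised.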
You can get more information in the Sphinx-based documentation, located at http://packages.python.org/plex/.
Feedback and getting involved
Original author :
Maintainer : Stephane Klein <firstname.lastname@example.org>
- Create a Plex Python package
- Convert documentation to Sphinx
- Convert tests to nose
- Format the source code according to PEP 8 recommendations
- Use 4-space indentation
- Convert package and module names to lowercase
- Eliminated a syntax warning about assigning to None when used with Python 2.3.
- Fixed bug causing argument of Rep or Rep1 to fail to match following a newline.
- Fixed bug causing Eol to fail to match at the beginning of a line in some circumstances.
- Changed Scanner.yield() to Scanner.produce() to accommodate Python 2.3, where yield is a keyword.
- Changed test10 to not rely so much on details of string repr.
- Fixed two minor bugs: uncommented Scanner.next_char() and added import of types to Regexps.py.
- Added support for case-insensitive matches.
- First official release.