Python wrapper for libparsing, a PEG-based parsing library written in C
`libparsing` is a parsing expression grammar (PEG) library written in C with
Python bindings. It offers decent performance while allowing for a
lot of flexibility. It is mainly intended to be used to create programming
languages and software engineering tools.
Unlike more traditional parsing techniques, the grammar is not compiled
but constructed through an API, which allows the grammar to be updated dynamically.
The parser does not perform any tokenization: instead, the input stream is
consumed and parsing elements are dynamically asked to match its next
element. Once a parsing element matches, the matched input is
processed and an action is triggered.
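To make the scannerless approach concrete, here is a toy sketch in plain Python. It does not use libparsing: `token` and `Group` are hypothetical helpers written for illustration, assuming that a parsing element is simply a function that tries to match a regular expression at the current offset of the raw input. The group's list of alternatives can be extended at runtime, loosely mirroring how a grammar built through an API can be updated without recompiling anything.

```python
import re
from typing import Callable, List, Optional, Tuple

# A parsing element takes (text, offset) and returns (matched_text, new_offset), or None.
Result = Optional[Tuple[str, int]]
Element = Callable[[str, int], Result]


def token(pattern: str) -> Element:
    """Match a regular expression directly against the input at `offset` (no separate lexer)."""
    regex = re.compile(pattern)

    def match(text: str, offset: int) -> Result:
        m = regex.match(text, offset)
        return (m.group(0), m.end()) if m else None

    return match


class Group:
    """Ordered choice whose alternatives can be changed while the program runs."""

    def __init__(self, *alternatives: Element) -> None:
        self.alternatives: List[Element] = list(alternatives)

    def __call__(self, text: str, offset: int) -> Result:
        for alternative in self.alternatives:
            result = alternative(text, offset)  # every attempt starts from the same offset
            if result is not None:
                return result
        return None


value = Group(token(r"\d+"))
print(value("abc", 0))  # None: no alternative matches letters yet
value.alternatives.append(token(r"[a-z]+"))  # update the grammar in place
print(value("abc", 0))  # ('abc', 3)
```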
`libparsing` supports the following features:
- _backtracking_, i.e. going back in the input stream if a match is not found
- _cherry-picking_, i.e. skipping unrecognized input
- _contextual rules_, i.e. rules that match or not depending on external
variables
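The three features can be illustrated with a few toy combinators. The sketch below is plain Python and is not libparsing's implementation; `token`, `sequence`, `choice`, `when` and `cherry_pick` are hypothetical helpers, and the `allow_hex` flag is a made-up external variable. It assumes a parsing element returns the matched text and the new offset, or `None` on failure.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A parsing element takes (text, offset) and returns (matched_text, new_offset), or None.
Result = Optional[Tuple[str, int]]
Element = Callable[[str, int], Result]


def token(pattern: str) -> Element:
    regex = re.compile(pattern)

    def match(text: str, offset: int) -> Result:
        m = regex.match(text, offset)
        return (m.group(0), m.end()) if m else None

    return match


def sequence(*elements: Element) -> Element:
    def match(text: str, offset: int) -> Result:
        matched = ""
        for element in elements:
            result = element(text, offset)
            if result is None:
                # Fail as a whole; the caller still holds the starting offset.
                return None
            part, offset = result
            matched += part
        return matched, offset

    return match


def choice(*elements: Element) -> Element:
    def match(text: str, offset: int) -> Result:
        for element in elements:
            result = element(text, offset)  # backtracking: each alternative restarts at `offset`
            if result is not None:
                return result
        return None

    return match


def when(flag: str, context: Dict[str, bool], element: Element) -> Element:
    """Contextual rule: `element` is only tried while context[flag] is true."""

    def match(text: str, offset: int) -> Result:
        return element(text, offset) if context.get(flag) else None

    return match


def cherry_pick(element: Element, text: str) -> List[str]:
    """Collect every match of `element`, skipping input it does not recognize."""
    found: List[str] = []
    offset = 0
    while offset < len(text):
        result = element(text, offset)
        if result is None or result[1] == offset:
            offset += 1  # unrecognized (or empty) match: skip one character
        else:
            found.append(result[0])
            offset = result[1]
    return found


context = {"allow_hex": False}  # hypothetical external variable
number = choice(
    when("allow_hex", context, token(r"0x[0-9A-Fa-f]+")),
    sequence(token(r"\d+"), token(r"\.\d+")),  # "12 " fails here, so we backtrack...
    token(r"\d+"),                             # ...and retry from the same offset
)

print(cherry_pick(number, "a 12 ?? 3.5 0x1F"))  # ['12', '3.5', '0', '1']
context["allow_hex"] = True
print(cherry_pick(number, "a 12 ?? 3.5 0x1F"))  # ['12', '3.5', '0x1F']
```

Flipping the external flag changes what `number` accepts without rebuilding it, and characters such as `a` and `??` are skipped rather than aborting the scan.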
Parsers built from parsing elements are usually slower than compiled or FSM-based
parsers, as they trade performance for flexibility. It's probably not a great idea to
use `libparsing` if parsing has to happen as fast as possible (e.g. a protocol
implementation), but it is well suited to programming languages, as it
opens the door to dynamic syntax plug-ins and multiple language
embedding.
If you're interested in PEG, you can start by reading Bryan Ford's original
article. Projects such as PEG/LEG by Ian Piumarta <http://piumarta.com/software/peg/>,
OMeta by Alessandro Warth <http://www.tinlizzie.org/ometa/>
or Haskell's Parsec library <https://www.haskell.org/haskellwiki/Parsec>
are of particular interest in the field.
Here is a short example of what creating a simple grammar looks like
in Python:
```
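from libparsing import Grammar  # assumed top-level export of the Python bindings

g = Grammar()
s = g.symbols
# Tokens are regular expressions; raw strings keep the backslashes intact
g.token("WS", r"\s+")
g.token("NUMBER", r"\d+(\.\d+)?")
g.token("VARIABLE", r"\w+")
g.token("OPERATOR", r"[\/\+\-\*]")
# A Value is either a NUMBER or a VARIABLE
g.group("Value", s.NUMBER, s.VARIABLE)
# A Suffix is an operator followed by a value, each captured under a name
g.rule("Suffix", s.OPERATOR._as("operator"), s.Value._as("value"))
# An Expression is a value followed by zero or more suffixes
g.rule("Expression", s.Value, s.Suffix.zeroOrMore())
g.axiom(s.Expression)  # the rule parsing starts from
g.skip(s.WS)           # whitespace is skipped between elements
match = g.parseString("10 + 20 / 5")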
```
And the equivalent code in C:
```
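#include "parsing.h"  /* libparsing's C header (name assumed) */

int main(void) {
    Grammar* g = Grammar_new();
    /* SYMBOL(Name, element) defines the parsing element s_Name */
    SYMBOL(WS,         TOKEN("\\s+"))
    SYMBOL(NUMBER,     TOKEN("\\d+(\\.\\d+)?"))
    SYMBOL(VARIABLE,   TOKEN("\\w+"))
    SYMBOL(OPERATOR,   TOKEN("[\\/\\+\\-\\*]"))
    SYMBOL(Value,      GROUP(_S(NUMBER), _S(VARIABLE)))
    SYMBOL(Suffix,     RULE(_AS(_S(OPERATOR), "operator"), _AS(_S(Value), "value")))
    SYMBOL(Expression, RULE(_S(Value), _MO(Suffix)))
    g->axiom = s_Expression;  /* the rule parsing starts from */
    g->skip  = s_WS;          /* whitespace skipped between elements */
    Grammar_prepare(g);
    Match* match = Grammar_parseString(g, "10 + 20 / 5");
    return 0;
}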
```
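To show what this grammar recognizes, here is a hand-written equivalent in plain Python that reuses the same regular expressions. It is an illustration only: it does not call libparsing, and the `(value, suffixes, end_offset)` tuple it returns is a hypothetical shape, not the structure of libparsing's match objects.

```python
import re
from typing import List, Optional, Tuple

# Same token patterns as the grammar above.
WS = re.compile(r"\s+")
NUMBER = re.compile(r"\d+(\.\d+)?")
VARIABLE = re.compile(r"\w+")
OPERATOR = re.compile(r"[/+\-*]")


def skip_ws(text: str, offset: int) -> int:
    """Mimic skipping WS: consume any whitespace at `offset`."""
    m = WS.match(text, offset)
    return m.end() if m else offset


def match_token(regex, text: str, offset: int) -> Optional[Tuple[str, int]]:
    offset = skip_ws(text, offset)
    m = regex.match(text, offset)
    return (m.group(0), m.end()) if m else None


def match_value(text: str, offset: int) -> Optional[Tuple[str, int]]:
    # Value = NUMBER | VARIABLE, tried in that order
    return match_token(NUMBER, text, offset) or match_token(VARIABLE, text, offset)


def match_expression(text: str):
    """Expression = Value Suffix*, where Suffix = OPERATOR Value."""
    first = match_value(text, 0)
    if first is None:
        return None
    value, offset = first
    suffixes: List[Tuple[str, str]] = []
    while True:
        op = match_token(OPERATOR, text, offset)
        if op is None:
            break
        operand = match_value(text, op[1])
        if operand is None:
            break  # backtrack: leave a dangling operator unconsumed
        suffixes.append((op[0], operand[0]))
        offset = operand[1]
    return value, suffixes, skip_ws(text, offset)


print(match_expression("10 + 20 / 5"))  # ('10', [('+', '20'), ('/', '5')], 11)
```

Because `Value` tries `NUMBER` before `VARIABLE`, `20` is read as a number even though `\w+` would also match it; an end offset shorter than the input signals that only a prefix was recognized.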