Parser combinator library with a BNF-esque interface
Project description
Comber is a parser combinator library for creating text parsers in plain Python with a BNF flavor.
For instance, we could define a simple grammar for calling functions on integer values like this:
from comber import C, rs, inf
keyword = rs(r'[_a-zA-Z][_a-zA-Z0-9]*')@('keyword')
number = rs(r'[0-9]+')@('number', int)
package = keyword[1, inf, ',']
import_statement = C+ 'import' + package
value = keyword | number
function_call = keyword + '(' + value**',' + ')'
assignment = C+ 'let' + keyword + '=' + (function_call | value)
grammar = (import_statement | assignment | function_call)[0, inf]
Our toy grammar can handle simple import statements, variable assignment, and function calling. For example:
import math import sys.io let user_number = read() let double = multiple(user_number, 2) print(double)
To use our parser, we simply pass a string to it:
from comber import ParseError
code = "add(17, 3)"
try:
parseState = grammar(code)
print('Parsed tokens: %s' % parseState.tree)
except ParseError as ex:
print(ex)
The grammar call returns the final parser state on a successful parse, from which you can retrieve the final parse tree. In our case, it is effectively a list:
['add', '(', 17, 3, ')']
Our grammar only supports variables and integers. Suppose we tried to assign pythong-style string to a variable:
let text = "Here's some text"
If we tried to parse this, we’d get an exception like:
1:12: Unexpected Text: "Here's som. Expected one of: @keyword, @number
Installing
You can get the latest release from pip:
$ pip install comber
or from the source:
$ pip install .
Documentation
Building Parsers
The most basic parsers are string literals (which match exactly), rs for matching a regular expression, and cs for matching any of the strings of an iterable.
The rs parser can be seen in our toy grammar above. Standard Python regular expressions are supported. The entire match is used as the result of the parse, regardless of groupings etc.
The cs parse takes any iterable of strings. The first string that exactly matches (the start of) the input text. If none of the strings match, the parse fails. Since strings are themselves iterables, we could add addition and subtraction to our toy grammar like this:
from comber import C, cs
addition = value + cs('+-') + value
assignment = C+ 'let' + keyword + '=' + (addition | function_call | value)
String literals can only be used in combination with other parsers. If a string literal would start a sequence, or otherwise appear alone, make use of the C parser.
The C Parser
The C parser, on its own, consumes no text, produces no tokens, and always succeeds. It’s most useful for starting a parser that would otherwise begin with a string literal. E.g. this:
'let' + keyword + '=' + (function_call | value)
would actually throw a Python error because ‘let’ isn’t really a parser - yet! That’s where C comes in:
C+ 'let' + keyword + '=' + (function_call | value)
C starts off the sequence, so we can use any combination of parsers and string literals we like from there. It works similarly with alternatives, so if we wanted to allow set to be used as a synonym for let, we might do:
(C| 'let'|'set') + keyword + '=' + (function_call | value)
C can also be used to wrap a parser to protect it from optimization; for instance, embedding one sequence or alternative set inside another. If, for instance, we extended our grammar to allow a bare value to be a whole statement:
value = (keyword | number)@'value'
grammar = (import_statement | assignment | function_call | value)[0, inf]
Basic Combinators
Parsers can be combined in series with +:
name + address + pet
A sequence of parsers is evaluated left to right, each consuming text before the next is evaluated. If at any point in the sequence a parser fails, the entire sequence fails.
A set of alternatives is built with |:
name | idnumber | location
Alternatives are considered left to right, with the first successful match being the match for the entire set. Be careful! This means that for some sets of alternatives, the “obvious” parser may not be the one used, simply because it came after another match.
Both sequences and alternatives will flatten like combinators, such that:
name = firstname + lastname
salutation = C+ 'Hello' + name + '!'
is equivalent to:
salutation = C+ 'Hello' + firstname + lastname + '!'
If you need to mantain the logical separation (to parse correctly, or maintain the name of a subparser), wrap the subparser with C:
name = C(firstname + lastname)
salutation = C+ 'Hello' + name + '!'
Repetition
The most flexible option for specifying repetition is brackets:
keyword[0, 10, ',']
The above would parse keyword zero to ten times, separated by a comma. The separator is optional - without it, the result would simply parse keyword zero to ten times.
We could also specify parsing an exact number of times:
keyword[10]
Or, with a separator:
keyword[10, None, ',']
Infinity - math.inf - is a valid maximum value. For convenience, it can be imported directly from Comber:
from comber import inf
param_list = keyword[0, inf, ',']
There are several convenience combinators for common types of repetition.
For zero or more with a separator, using *:
parser*','
Or zero or more without a separator, using the unary +:
+parser
You can declare a parser as optional with ~:
~parser
Recursive Grammars
Consider a program in our toy grammar:
let word = exp(2, 16) let maxint = minus(word, 1)
It’d be nice to simplify this by eliminating the variable “word” and pass the exp() call directly to minus(), but to allow that, we need to extend our grammar to consider a function call to be a value so we can use one as a function argument. But to do that, we’d need to use function_call in our definition of value - but function_call likewise needs to reference value.
To solve this issue, we can define one of them as defer.
from comber import C, rs, inf, defer
...
value = defer()
function_call = keyword + '(' + value**',' + ')'
value.fill(keyword | number | function_call)
This way, we can refer to value wherever we want, and only define its meaning when we’re ready. We can safely build fairly complex grammars this way, but be wary of performance.
Controlling Space
Before each subparser is run, Comber consumes leading whitespace. By default, this is spaces, tabs, and newlines. You can change this in two ways: set the whitespace property on the outermost parser to a string containing the new whitespace characters, or likewise passing a string as the second argument to the parser. In either case, the value None will disable the feature altogether.
Building Parse Trees
When you call a parser on a some text, they return a State object containing the resultant parse tree (State.tree).
By default, parsers output a “flat” tree - a list of strings parsed by the “leaf” parsers (i.e. string literals, rs, and cs).
To build a more useful parse tree, you have to provide emitters. Our toy grammar contains a simple one that converts integers in the input to Python int values:
number = rs(r'[0-9]+')@('number', int)
So if we ran our parser on the input “let foo = 5” the resulting state’s tree property would be ["let", "foo", "=", 5]. But it’d be more useful if it resulted in some kind of “let” object (that could execute the assignment, or be fed to a VM, or whatever else). We could define one we can use as a emitter for our let statement like this:
class Let:
def __init__(self, let, variable, eq, value):
self.variable = variable
self.value = value
# We can ignore `let` and `eq` (which will always contain "let" and "=", respectively).
def __repr__(self):
return f'Let({self.variable}, {self.value})'
and redefine the assignment rule like:
assignment = (C+ 'let' + keyword + '=' + (function_call | value))@Let
Now if rerun the parser on “let foo = 5”, we get [Let(foo, 5)].
As we did with number, you can also combine an emitter with a name, to improve error messages:
assignment = (C+ 'let' + keyword + '=' + (function_call | value))@("assignment", Let)
Performance
Under the covers, Comber is essentially a recursive descent parser. It’s best suited for relatively shallow grammars parsing small amounts of text.
You can gain a bit more speed by calling analyze() on the top-level/outermost parser. This modifies the parser (and subparsers) in place:
grammar.analyze()
grammar('max(1, 4)')
TODO
Available Operators
Operators that Python allows to overridden
Operator |
Method |
Current use |
|---|---|---|
+ |
__add__ |
sequences |
| |
__or__ |
selection |
[ ] |
__getitem__ |
repeat |
@ |
__matmul__ |
names and internalization |
< |
__lt__ |
|
> |
__gt__ |
|
<= |
__le__ |
|
>= |
__ge__ |
|
== |
__eq__ |
|
!= |
__ne__ |
|
is |
_is |
|
is not |
is_not |
|
- |
__sub__ |
|
% |
__mod__ |
|
* |
__mul__ |
zero or more, with provided separator |
** |
__pow__ |
|
/ |
__truediv__ |
|
// |
__floordiv__ |
|
& |
__and__ |
|
^ |
__xor__ |
|
<< |
__lshift__ |
|
>> |
__rshift__ |
|
in |
__contains__ |
Unary operators:
Operator |
Method |
Current use |
|---|---|---|
~ |
__invert__ |
optional |
not |
__not__ |
|
- |
__neg__ |
|
+ |
__pos__ |
zero or more |
And:
() __call__ parse a string
Copyright
Comber copyright 2025 Ray Wallace III
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, version 2.1.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file comber-1.1.0.tar.gz.
File metadata
- Download URL: comber-1.1.0.tar.gz
- Upload date:
- Size: 28.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
258edcb64384a794c07267d884d9e1742e594d3b6a7806b94b03428026541218
|
|
| MD5 |
4d2fab8aa25437b11b5749be6a9c8dad
|
|
| BLAKE2b-256 |
079795be5625b972baceaef2cb635f1144c67e2d7a93cedf2c7b7ceb1e5842cc
|
Provenance
The following attestation bundles were made for comber-1.1.0.tar.gz:
Publisher:
python-publish.yml on faboo/comber
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
comber-1.1.0.tar.gz -
Subject digest:
258edcb64384a794c07267d884d9e1742e594d3b6a7806b94b03428026541218 - Sigstore transparency entry: 927056354
- Sigstore integration time:
-
Permalink:
faboo/comber@a49635e932193071d3abd205d9c49411338527cb -
Branch / Tag:
refs/tags/v1.1.0 - Owner: https://github.com/faboo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
python-publish.yml@a49635e932193071d3abd205d9c49411338527cb -
Trigger Event:
release
-
Statement type:
File details
Details for the file comber-1.1.0-py3-none-any.whl.
File metadata
- Download URL: comber-1.1.0-py3-none-any.whl
- Upload date:
- Size: 22.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7e07891eed2bddaa352ee44539505b6dfd6561951f6532ad0c1c28e6c5ad0855
|
|
| MD5 |
c24582525aa56d1123d3a9f8aff227fb
|
|
| BLAKE2b-256 |
58aa64ad53f76688880ab99beb28008e8b6664d6b3bfb9d84ed08af010b79707
|
Provenance
The following attestation bundles were made for comber-1.1.0-py3-none-any.whl:
Publisher:
python-publish.yml on faboo/comber
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
comber-1.1.0-py3-none-any.whl -
Subject digest:
7e07891eed2bddaa352ee44539505b6dfd6561951f6532ad0c1c28e6c5ad0855 - Sigstore transparency entry: 927056356
- Sigstore integration time:
-
Permalink:
faboo/comber@a49635e932193071d3abd205d9c49411338527cb -
Branch / Tag:
refs/tags/v1.1.0 - Owner: https://github.com/faboo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
python-publish.yml@a49635e932193071d3abd205d9c49411338527cb -
Trigger Event:
release
-
Statement type: