Skip to main content

Parser combinators library

Project description

Parcomb - Perser combinator library

Build Status PyPI version Coverage License: MIT Python Versions

Parcomb is a library for writing arbitrary text parsers and interpreters using regular python code. Technically, it's a top down back-tracing parser using parser combinators. It's heavily influences by the Parsec library

Installation

pip install parcomb

Usage

from typing import Tuple
from parcomb.char import char, trim
from parcomb.combinator import many, choice, between
from parcomb.number import integer
from parcomb.parsing import future

input1 = "(1 + 4 * 6) + 5 + (6 + (10 + 11)) + 5"

def eval(x: int, xs: list[Tuple[str, int]]) -> int:
    if not xs:
        return x

    current = xs[0]
    next = xs[1:]

    fdict = {
        "+": lambda a, b: eval(x + a, next),
        "-": lambda a, b: eval(x - a, next),
        "*": lambda a, b: eval(x * a, next),
        "/": lambda a, b: eval(x / a, next),
    }

    return fdict[current[0]](current[1], next)

op_prio1 = [trim(char(x)) for x in ["*", "/"]]
op_prio2 = [trim(char(x)) for x in ["+", "-"]]

expr = future()
factor = trim(integer()) | between(char("("), expr, char(")"))
term = (factor * many(choice(*op_prio1) * factor)).map_u(eval)
expr <<= (term * many(choice(*op_prio2) * term)).map_u(eval)

expr.run(input1)  # Success(value=62, next='')
from parcomb.number import integer
from parcomb.char import char, trim
from parcomb.string import literal
from parcomb.combinator import sep_by1

input1  = """\
498,4 -> 498,6 -> 496,6
503,4 -> 502,4 -> 502,9 -> 494,9
100,5"""

position = (integer() >> char(",")) * integer()
sep = trim(literal("->"))
line = sep_by1(position, sep)
lines = sep_by1(line, char("\n"))

lines.run(input1).get_or_raise()  # [[(498, 4), (498, 6), (496, 6)], [(503, 4), (502, 4), (502, 9), (494, 9)], [(100, 5)]]

More examples

Foundation

A parser is a function string -> (A, string) that reads zero or more characters from a string. It then optionally transforms what it read and return it as a tuple together with the part it didn't read.

  • Example 1: The string "abc" are applied to the any parser. It reads the "a" character and returns ("a", "bc")
  • Example 2: The string "12ab" are applied to the integer parser. It reads "12", transforms it to an integer, and returns (12, "ab")

The parser can also return a failure.

Multiple parsers can be combined to create new, more complex, parsers. Finally, a parser is evaluated by calling the run method on it

Value parsers

A value parser reads character(s) (input) and produces values (output). These parsers are the building blocks for more complex parsers. Parcomb contain many built in value parsers. They are located in submodules that corresponding to the type they produce. Character parsers are in parcomb.char, number parsers in parcomb.number, string parsers in parcomb.string, and so forth.

from parcomb.char import any, char, none_of

input = "test string"

# Reads (consumes) first character in the input text and sets it as output
any().run(input) # Success(value='t', next='est string')

# Attempts to read an "a" from input text but fails. Does not consume any characters
char("a").run(input)  # Failure(message='#char: Failed to find [a]. value: [t], ...', next='test string')

# Reads (consumes) any character as long as it is not a " " or a "a". 
none_of([" ", "a"]).run(input)  # Success(value='t', next='est string')

For more information, see implementation of the any and char parsers

Combinator parsers

Value parsers reads single values out of a text, but they are rarely useful by themselves. Instead, they serves as building blocks for combinator parsers. These parsers combine multiple parsers into more complex once.

from parcomb.number import integer
from parcomb.char import char, any, none_of
from parcomb.combinator import sep_by, combine, combine_f, many, product3

input1 = "2,3,5,7,11 Prime numbers"
input2 = "123,456"

# Parse zero or more integers, separated by ","
sep_by(integer(), char(",")).run(input1)  # Success(value=[2, 3, 5, 7, 11], next=' # Prime numbers')

# Combine any two characters using the build in "+" operator or custom function
combine(any(), any()).run(input1)  # Success(value='2,', next='3,5,7,11 # Prime numbers')
combine_f(any(), any(), lambda a, b: b + a).run(input1)  # Success(value=',2', next='3,5,7,11 # Prime numbers')

# Consume many non " " characters. The many parser continues to parse until its first failure
many(none_of([" "])).run(input1)  # Success(value=['2', ',', '3', ',', '5', .. ], next=' # Prime numbers')

# ProductN combines n parsers into a tuple
product3(integer(), char(","), integer()).run(input2)  # Success(value=(123, ',', 456), next='')

The library contains many useful parser combinators such as many, many1, choice, end_by, peek, and product

Ignoring data

Parsers often reads characters that should not be in the final output structure. Examples of this is:

  • Whitespace, such as new line characters or spaces
  • Characters that are used to define structure (such as "," in a csv document)
  • Comments to humans that have no impact on the data

The library provides two methods for ignoring data skip_left and skip_right. They are both parser combinators that takes two parsers as arguments and ignores one of them.

from parcomb.number import integer
from parcomb.char import char, spaces
from parcomb.combinator import sep_by, many, skip_left

input1 = "   2, 3,  5, 7,   11"

# Ignores 0 or more spaces in front of a number
nr = skip_left(spaces(), integer())
sep_by(nr, char(",")).run(input1)  # Success(value=[2, 3, 5, 7, 11], next='')

Transforming data

Every parser contains a transformation function called map and a sister function called map_u. The purpose of these functions are to convert a Parser[A] to a Parser[B] given a function A -> B. Very similar to how the map function converts a List[A] to a List[B]. The difference is that the map_u function first unpacks a tuple before applying it to the transformation function. This simplifies the usage with the product parser

from parcomb.number import integer
from parcomb.char import char, eof
from parcomb.combinator import product3, sep_by, end_by, choice

input1 = "2,3,5,7,11"
input2 = "This is a text; Comment"

# Create a tuple of "2", ",", 3 and then multiple the numbers
product3(integer(), char(","), integer()) \
    .map(lambda x: x[0] * x[2]).run(input1)  # Success(value=6, next=',5,7,11')
    # .map_u(lambda l, _, r: l * r).run(input)  # map_u unpacks a tuple to function parameters

# Create a list of the first 5 prime numbers and then sum them together
sep_by(integer(), char(",")).map(sum).run(input1)  # Success(value=28, next='')

# Read input, character by character, until we either get a ';' char or end of file. 
# transformation 1: Join the list of character into a string
# transformation 2: Convert all characters to upper case
end_by(any(), choice([char(";"), eof()])) \
    .map(lambda x: "".join(x)) \
    .map(lambda x: x.upper()) \
    .run(input2)  # Success(value='THIS IS A TEXT', next=' Comment')

Recursive parser

Recursive parsing allows parsing of infinitely nested structures such as JSON, JAML, or lists of lists. Parcomb has a special parser called "future" that allows us to define a parser, refer it, but define it at a later stage.

from parcomb.char import char
from parcomb.number import integer
from parcomb.parsing import future
from parcomb.combinator import between, sep_by, choice

input1 = "[1,[4,5],453,[4,[]]]"

# We create a future parser "elem" but we can't define it yet as it depend # on the "lst" parser, 
# that depends on the "elem" parser. E.g. we have a parser that depends on itself
elem = future()
lst = between(char("["), elem, char("]"))
elem.rebind(sep_by(choice([integer(), lst]), char(",")))

lst.run(input1)  # Success(value=[1, [4, 5], 453, [4, []]], next='')

Syntax DSL

The library contains an optional syntax that can make large expressions easier to read

from parcomb.char import any, char, spaces
from parcomb.number import integer
from parcomb.parsing import future

any() + any()  # Same as: combine(any(), any())

any() * any()  # Same as: product(any(), any())
any() * 5  # Same as: count(any(), 5)

integer() | char("a")  # Same choice([integer(), char("a")])

spaces() << integer() >> spaces()  # Same as skip_right(skip_left(spaces(), integer()),.spaces())

elem = future() 
elem <<= any()  # Same as elem.rebind(any())

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

parcomb-0.13.0.tar.gz (19.9 kB view hashes)

Uploaded Source

Built Distribution

parcomb-0.13.0-py3-none-any.whl (11.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page