Parser combinators library

Parcomb - Parser combinator library

Parcomb is a library for writing arbitrary text parsers and interpreters using regular Python code. Technically, it's a top-down backtracking parser built from parser combinators. It's heavily influenced by the Parsec library.

Installation

pip install parcomb

Usage

from typing import Tuple
from parcomb.char import char, trim
from parcomb.combinator import many, choice, between
from parcomb.number import integer
from parcomb.parsing import future

input1 = "(1 + 4 * 6) + 5 + (6 + (10 + 11)) + 5"

def evaluate(x: int, xs: list[Tuple[str, int]]) -> int:
    """Fold a list of (operator, operand) pairs into x, left to right."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }

    for op, operand in xs:
        x = ops[op](x, operand)
    return x

# "*" and "/" bind tighter than "+" and "-"
op_prio1 = [trim(char(x)) for x in ["*", "/"]]
op_prio2 = [trim(char(x)) for x in ["+", "-"]]

# expr refers to itself through factor, so it is declared first and bound last
expr = future()
factor = trim(integer()) | between(char("("), expr, char(")"))
term = (factor * many(choice(*op_prio1) * factor)).map_u(evaluate)
expr <<= (term * many(choice(*op_prio2) * term)).map_u(evaluate)

expr.run(input1)  # Success(value=62, next='')

More examples

Foundation

A parser is a function string -> (A, string) that reads zero or more characters from a string. It then optionally transforms what it read and returns the result as a tuple together with the part of the input it didn't read.

  • Example 1: The string "abc" is applied to the any parser. It reads the "a" character and returns ("a", "bc").
  • Example 2: The string "12ab" is applied to the integer parser. It reads "12", transforms it to an integer, and returns (12, "ab").

The parser can also return a failure.

Multiple parsers can be combined to create new, more complex parsers. Finally, a parser is evaluated by calling the run method on it.
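
To make the string -> (A, string) idea concrete, here is a minimal standalone sketch that does not use Parcomb. The names any_char and integer, and the choice to signal failure with None, are assumptions made for illustration only; Parcomb's own parsers return Success and Failure objects instead.

```python
from typing import Optional, Tuple

DIGITS = "0123456789"


def any_char(text: str) -> Optional[Tuple[str, str]]:
    """Read exactly one character; fail (None) on empty input."""
    if not text:
        return None
    return text[0], text[1:]


def integer(text: str) -> Optional[Tuple[int, str]]:
    """Read one or more leading digits and convert them to an int."""
    end = 0
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == 0:
        return None  # no digits at the start: the parser fails
    return int(text[:end]), text[end:]


print(any_char("abc"))  # ('a', 'bc')
print(integer("12ab"))  # (12, 'ab')
print(integer("abc"))   # None
```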

Value parsers

A value parser reads one or more characters (input) and produces a value (output). These parsers are the building blocks for more complex parsers. Parcomb contains many built-in value parsers. They are located in submodules that correspond to the type they produce: character parsers are in parcomb.char, number parsers in parcomb.number, string parsers in parcomb.string, and so forth.

from parcomb.char import any, char, none_of

input = "test string"

# Reads (consumes) first character in the input text and sets it as output
any().run(input) # Success(value='t', next='est string')

# Attempts to read an "a" from input text but fails. Does not consume any characters
char("a").run(input)  # Failure(message='#char: Failed to find [a]. value: [t], ...', next='test string')

# Reads (consumes) any character as long as it is not a " " or an "a".
none_of([" ", "a"]).run(input)  # Success(value='t', next='est string')

For more information, see the implementation of the any and char parsers.
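
As a rough illustration of how a value parser can report success or failure without consuming input on failure, here is a standalone sketch. The Success and Failure dataclasses below only mirror the field names visible in the outputs above; their definitions, and the bodies of char and none_of, are assumptions and not Parcomb's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Success:
    value: object
    next: str  # the unread remainder of the input


@dataclass
class Failure:
    message: str
    next: str  # unchanged input: a failed parser consumes nothing


Result = Union[Success, Failure]


def char(expected: str) -> Callable[[str], Result]:
    """Build a parser that accepts exactly one given character."""
    def parse(text: str) -> Result:
        if text and text[0] == expected:
            return Success(expected, text[1:])
        return Failure(f"expected {expected!r}", text)
    return parse


def none_of(excluded: Sequence[str]) -> Callable[[str], Result]:
    """Build a parser that accepts any character not in `excluded`."""
    def parse(text: str) -> Result:
        if text and text[0] not in excluded:
            return Success(text[0], text[1:])
        return Failure(f"expected a character outside {list(excluded)!r}", text)
    return parse


print(char("a")("test string"))
print(none_of([" ", "a"])("test string"))
```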

Combinator parsers

Value parsers read single values out of a text, but they are rarely useful by themselves. Instead, they serve as building blocks for combinator parsers. These parsers combine multiple parsers into more complex ones.

from parcomb.number import integer
from parcomb.char import char, any, none_of
from parcomb.combinator import sep_by, combine, combine_f, many, product3

input1 = "2,3,5,7,11 # Prime numbers"
input2 = "123,456"

# Parse zero or more integers, separated by ","
sep_by(integer(), char(",")).run(input1)  # Success(value=[2, 3, 5, 7, 11], next=' # Prime numbers')

# Combine any two characters using the built-in "+" operator or a custom function
combine(any(), any()).run(input1)  # Success(value='2,', next='3,5,7,11 # Prime numbers')
combine_f(any(), any(), lambda a, b: b + a).run(input1)  # Success(value=',2', next='3,5,7,11 # Prime numbers')

# Consume many non " " characters. The many parser continues to parse until its first failure
many(none_of([" "])).run(input1)  # Success(value=['2', ',', '3', ',', '5', .. ], next=' # Prime numbers')

# ProductN combines n parsers into a tuple
product3(integer(), char(","), integer()).run(input2)  # Success(value=(123, ',', 456), next='')

The library contains many useful parser combinators, such as many, many1, choice, end_by, peek, and product.
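
The following standalone sketch shows one way combinators such as many and sep_by could be written on top of plain parser functions. It models a parser as a function returning (value, rest) or None; that model and the helper names digit and char are assumptions for illustration, not Parcomb's internals.

```python
def digit(text):
    """Value parser: read a single ASCII digit."""
    if text and text[0] in "0123456789":
        return text[0], text[1:]
    return None


def char(expected):
    """Value parser factory: read exactly the given character."""
    def parse(text):
        if text and text[0] == expected:
            return expected, text[1:]
        return None
    return parse


def many(parser):
    """Apply `parser` repeatedly until its first failure; never fails itself."""
    def parse(text):
        values = []
        while True:
            result = parser(text)
            # Stop on failure, or if nothing was consumed (avoids looping forever)
            if result is None or result[1] == text:
                return values, text
            values.append(result[0])
            text = result[1]
    return parse


def sep_by(item, separator):
    """Parse zero or more `item`s separated by `separator`."""
    def parse(text):
        first = item(text)
        if first is None:
            return [], text
        values, text = [first[0]], first[1]
        while True:
            sep = separator(text)
            nxt = item(sep[1]) if sep is not None else None
            if nxt is None:
                return values, text  # a trailing separator is left unread
            values.append(nxt[0])
            text = nxt[1]
    return parse


print(many(digit)("123ab"))               # (['1', '2', '3'], 'ab')
print(sep_by(digit, char(","))("2,3,5 x"))  # (['2', '3', '5'], ' x')
```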

Ignoring data

Parsers often read characters that should not be in the final output structure. Examples include:

  • Whitespace, such as newline characters or spaces
  • Characters that are used to define structure (such as "," in a CSV document)
  • Comments to humans that have no impact on the data

The library provides two methods for ignoring data: skip_left and skip_right. Both are parser combinators that take two parsers as arguments and discard the output of one of them.

from parcomb.number import integer
from parcomb.char import char, spaces
from parcomb.combinator import sep_by, many, skip_left

input1 = "   2, 3,  5, 7,   11"

# Ignores 0 or more spaces in front of a number
nr = skip_left(spaces(), integer())
sep_by(nr, char(",")).run(input1)  # Success(value=[2, 3, 5, 7, 11], next='')
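
To show the idea behind skip_left and skip_right, here is a standalone sketch in which both parsers run in sequence but only one result is kept. Parsers are modelled as functions returning (value, rest) or None, and the helpers spaces and integer are hypothetical stand-ins written for this example; the argument order follows the usage shown in this document.

```python
def spaces(text):
    """Read zero or more spaces; always succeeds."""
    rest = text.lstrip(" ")
    return text[: len(text) - len(rest)], rest


def integer(text):
    """Read one or more ASCII digits as an int."""
    end = 0
    while end < len(text) and text[end] in "0123456789":
        end += 1
    return (int(text[:end]), text[end:]) if end else None


def skip_left(ignored, kept):
    """Run `ignored`, drop its value, then return the result of `kept`."""
    def parse(text):
        first = ignored(text)
        if first is None:
            return None
        return kept(first[1])
    return parse


def skip_right(kept, ignored):
    """Run `kept`, then `ignored`; return only the value of `kept`."""
    def parse(text):
        first = kept(text)
        if first is None:
            return None
        second = ignored(first[1])
        if second is None:
            return None
        return first[0], second[1]
    return parse


print(skip_left(spaces, integer)("   42 rest"))   # (42, ' rest')
print(skip_right(integer, spaces)("42   rest"))   # (42, 'rest')
```

Both parsers still consume their input; the combinator only decides which value survives.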

Transforming data

Every parser has a transformation function called map and a sister function called map_u. The purpose of these functions is to convert a Parser[A] to a Parser[B] given a function A -> B, much like the map function converts a List[A] to a List[B]. The difference is that map_u first unpacks a tuple and passes its elements as separate arguments to the transformation function. This simplifies usage with the product parser.

from parcomb.number import integer
from parcomb.char import any, char, eof
from parcomb.combinator import product3, sep_by, end_by, choice

input1 = "2,3,5,7,11"
input2 = "This is a text; Comment"

# Create a tuple (2, ",", 3) and then multiply the numbers
product3(integer(), char(","), integer()) \
    .map(lambda x: x[0] * x[2]).run(input1)  # Success(value=6, next=',5,7,11')
    # .map_u(lambda l, _, r: l * r).run(input1)  # map_u unpacks the tuple into function parameters

# Create a list of the first 5 prime numbers and then sum them together
sep_by(integer(), char(",")).map(sum).run(input1)  # Success(value=28, next='')

# Read input, character by character, until we either get a ';' char or end of file.
# Transformation 1: Join the list of characters into a string
# Transformation 2: Convert all characters to upper case
end_by(any(), choice([char(";"), eof()])) \
    .map(lambda x: "".join(x)) \
    .map(lambda x: x.upper()) \
    .run(input2)  # Success(value='THIS IS A TEXT', next=' Comment')
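
The relationship between map and map_u can be sketched with a small standalone class. This Parser class is a hypothetical stand-in (it wraps a function returning (value, rest) or None) and is not Parcomb's implementation; it only shows that map_u can be expressed as map plus tuple unpacking.

```python
class Parser:
    """Toy parser wrapper, assumed for illustration only."""

    def __init__(self, fn):
        self.fn = fn

    def run(self, text):
        return self.fn(text)

    def map(self, f):
        """Turn a parser of A into a parser of B using f: A -> B."""
        def parse(text):
            result = self.fn(text)
            if result is None:
                return None  # failures pass through untouched
            value, rest = result
            return f(value), rest
        return Parser(parse)

    def map_u(self, f):
        """Like map, but unpack a tuple value into positional arguments."""
        return self.map(lambda value: f(*value))


def _two_digits(text):
    """Read two decimal digits and return them as a tuple of ints."""
    if len(text) >= 2 and text[:2].isdecimal():
        return (int(text[0]), int(text[1])), text[2:]
    return None


two_digits = Parser(_two_digits)

print(two_digits.map(lambda pair: pair[0] * pair[1]).run("34x"))  # (12, 'x')
print(two_digits.map_u(lambda a, b: a * b).run("34x"))            # (12, 'x')
```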

Recursive parser

Recursive parsing allows parsing of arbitrarily nested structures such as JSON, YAML, or lists of lists. Parcomb has a special parser called "future" that lets us declare a parser and refer to it, but define it at a later stage.

from parcomb.char import char
from parcomb.number import integer
from parcomb.parsing import future
from parcomb.combinator import between, sep_by, choice

input1 = "[1,[4,5],453,[4,[]]]"

# We create a future parser "elem" but can't define it yet, as it depends on the "lst" parser,
# which in turn depends on the "elem" parser. I.e., we have a parser that depends on itself.
elem = future()
lst = between(char("["), elem, char("]"))
elem.rebind(sep_by(choice([integer(), lst]), char(",")))

lst.run(input1)  # Success(value=[1, [4, 5], 453, [4, []]], next='')
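
One way to picture a "future" parser is as a placeholder that forwards to a target set later. The standalone sketch below is an assumption about how such a placeholder could work, not Parcomb's implementation; the Future class and the helpers char, between, either, and fmap are hypothetical. It computes the nesting depth of balanced parentheses.

```python
class Future:
    """Placeholder parser: usable in definitions now, bound to a target later."""

    def __init__(self):
        self._target = None

    def rebind(self, parser):
        self._target = parser

    def __call__(self, text):
        if self._target is None:
            raise RuntimeError("future parser used before rebind")
        return self._target(text)


def char(expected):
    return lambda text: (expected, text[1:]) if text[:1] == expected else None


def between(left, inner, right):
    """Parse left, inner, right in sequence; keep only the inner value."""
    def parse(text):
        opened = left(text)
        middle = inner(opened[1]) if opened is not None else None
        closed = right(middle[1]) if middle is not None else None
        return (middle[0], closed[1]) if closed is not None else None
    return parse


def either(first, second):
    """Try first; on failure, backtrack and try second on the same input."""
    def parse(text):
        result = first(text)
        return result if result is not None else second(text)
    return parse


def fmap(parser, f):
    def parse(text):
        result = parser(text)
        return (f(result[0]), result[1]) if result is not None else None
    return parse


depth = Future()                                  # declared before it is defined
group = between(char("("), depth, char(")"))      # refers to depth already
depth.rebind(either(fmap(group, lambda d: d + 1), lambda text: (0, text)))

print(depth("((()))"))  # (3, '')
```

The placeholder is needed because group is built eagerly: without it, depth would have to exist before group, and group before depth.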

Syntax DSL

The library contains an optional operator syntax that can make large expressions easier to read:

from parcomb.char import any, char, spaces
from parcomb.number import integer
from parcomb.parsing import future

any() + any()  # Same as: combine(any(), any())

any() * any()  # Same as: product(any(), any())
any() * 5  # Same as: count(any(), 5)

integer() | char("a")  # Same as: choice([integer(), char("a")])

spaces() << integer() >> spaces()  # Same as: skip_right(skip_left(spaces(), integer()), spaces())

elem = future()
elem <<= any()  # Same as elem.rebind(any())
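
An operator syntax like this is typically built with Python's special methods. The standalone sketch below shows how |, << and >> could be mapped onto choice, skip_left and skip_right. The Parser class and the helpers satisfy, spaces, digit and letter are hypothetical and are not taken from Parcomb's source.

```python
class Parser:
    """Toy parser wrapper around a function returning (value, rest) or None."""

    def __init__(self, fn):
        self.fn = fn

    def run(self, text):
        return self.fn(text)

    def __or__(self, other):
        """a | b: try a, fall back to b on failure (choice)."""
        def parse(text):
            result = self.fn(text)
            return result if result is not None else other.fn(text)
        return Parser(parse)

    def __lshift__(self, other):
        """a << b: run a, discard its value, keep b (skip_left)."""
        def parse(text):
            ignored = self.fn(text)
            return None if ignored is None else other.fn(ignored[1])
        return Parser(parse)

    def __rshift__(self, other):
        """a >> b: keep the value of a, run b and discard it (skip_right)."""
        def parse(text):
            kept = self.fn(text)
            if kept is None:
                return None
            ignored = other.fn(kept[1])
            return None if ignored is None else (kept[0], ignored[1])
        return Parser(parse)


def satisfy(predicate):
    """Read one character if it satisfies the predicate."""
    def parse(text):
        if text and predicate(text[0]):
            return text[0], text[1:]
        return None
    return Parser(parse)


def _spaces(text):
    rest = text.lstrip(" ")
    return text[: len(text) - len(rest)], rest


spaces = Parser(_spaces)
digit = satisfy(str.isdecimal)
letter = satisfy(str.isalpha)

print((spaces << digit >> spaces).run("  7  x"))  # ('7', 'x')
print((digit | letter).run("a1"))                 # ('a', '1')
```

Because << and >> have the same precedence and associate left to right, spaces << digit >> spaces groups as (spaces << digit) >> spaces, matching the nested skip_right(skip_left(...), ...) form.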
