Pyrsec

Simple parser combinator made in Python

This is an attempt at writing a parser combinator library in Python that is as type safe as possible. I don't recommend using it for anything important, only for exploration and fun. The library is a mostly undocumented, bare-bones implementation: there is no error recovery yet, and a parser simply returns None when it can't continue. I started with a minimal implementation, added a basic JSON parser as a test, and kept adding functionality as needed.

pip install pyrsec

A JSON parser as an example

You should be able to inspect the types of the variables in the following code.

>>> from pyrsec import Parsec

Let's define the type of our JSON values,

>>> from typing import Union, List, Dict  # because 3.8 and 3.9 🙄
>>> # Recursive type alias 👀. See how we will not parse `floats` here.
>>> # Also, at this level we still can't reference JSON recursively, idk why.
>>> JSON = Union[bool, int, None, str, List["JSON"], Dict[str, "JSON"]]

and the type of our parser. Since this parser outputs JSON values, its type will be Parsec[JSON].

>>> # To be defined later
>>> json_: Parsec[JSON]
>>> # For recursive parsers like `list_` and `dict_`
>>> deferred_json_ = Parsec.from_deferred(lambda: json_)
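
The deferral idea can be sketched without the library. This is only an illustration under an assumed model (a parser is a plain function returning (value, rest) or None); the names deferred, digit, parenthesized and either are hypothetical and are not pyrsec's API or internals.

```python
from typing import Callable, Optional, Tuple

# Assumed model (not pyrsec's actual internals): a parser is a function that
# returns (value, remaining_input) on success and None on failure.
Result = Optional[Tuple[object, str]]
Parser = Callable[[str], Result]


def deferred(get_parser: Callable[[], Parser]) -> Parser:
    """Look the real parser up at parse time instead of at definition time."""
    def parse(text: str) -> Result:
        return get_parser()(text)
    return parse


def digit(text: str) -> Result:
    """Parse a single decimal digit into an int."""
    if text[:1].isdecimal():
        return int(text[0]), text[1:]
    return None


def parenthesized(inner: Parser) -> Parser:
    """Parse "(" inner ")" and keep only the inner value."""
    def parse(text: str) -> Result:
        if not text.startswith("("):
            return None
        result = inner(text[1:])
        if result is None or not result[1].startswith(")"):
            return None
        value, rest = result
        return value, rest[1:]
    return parse


def either(first: Parser, second: Parser) -> Parser:
    """Try `first`; fall back to `second` on failure."""
    def parse(text: str) -> Result:
        result = first(text)
        return result if result is not None else second(text)
    return parse


# `expr` does not exist yet, but the lambda body only runs while parsing.
lazy_expr = deferred(lambda: expr)
expr = either(digit, parenthesized(lazy_expr))

nested = expr("((7))")     # succeeds: the digit wrapped in two pairs of parentheses
unbalanced = expr("((7)")  # fails: the outer parenthesis is never closed
```

The name lookup happens inside the lambda, so the recursive grammar can be written before the final parser is assigned.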

Let's bring up a few basic parsers.

>>> import re
>>> true = Parsec.from_string("true").map(lambda _: True)
>>> false = Parsec.from_string("false").map(lambda _: False)
>>> null = Parsec.from_string("null").map(lambda _: None)
>>> number = Parsec.from_re(re.compile(r"-?\d+")).map(int)
>>> true("true")
(True, '')
>>> false("false")
(False, '')
>>> null("null")
(None, '')
>>> number("42")
(42, '')
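
To make the (value, rest) results above less mysterious, here is a minimal sketch of how string, regex and map-style parsers could be modelled with plain functions. It assumes a parser is a callable returning (value, remaining_input) or None; literal, pattern and fmap are hypothetical names, not part of pyrsec, and the library may be implemented differently.

```python
import re
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# Assumed model: success is (value, remaining_input), failure is None.
Parser = Callable[[str], Optional[Tuple[T, str]]]


def literal(expected: str) -> "Parser[str]":
    """Match an exact prefix of the input."""
    def parse(text: str) -> Optional[Tuple[str, str]]:
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None
    return parse


def pattern(regex: "re.Pattern[str]") -> "Parser[str]":
    """Match a compiled regular expression at the start of the input."""
    def parse(text: str) -> Optional[Tuple[str, str]]:
        match = regex.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]
    return parse


def fmap(parser: "Parser[T]", fn: Callable[[T], U]) -> "Parser[U]":
    """Transform the parsed value and leave the remaining input untouched."""
    def parse(text: str) -> Optional[Tuple[U, str]]:
        result = parser(text)
        if result is None:
            return None
        value, rest = result
        return fn(value), rest
    return parse


true = fmap(literal("true"), lambda _: True)
number = fmap(pattern(re.compile(r"-?\d+")), int)

matched_true = true("true!")      # the unconsumed "!" comes back as the rest
matched_number = number("-42 apples")
no_match = true("nope")           # failure is signalled with None
```

The second element of each result is whatever input the parser did not consume, which is what lets the next parser in a chain pick up where the previous one stopped.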

We need to be able to parse character sequences; let's keep it simple.

The operators >> and << are used to discard the part that the arrow is not pointing at. They are meant to work well with Parsec instances. In this case only the result of the middle parser Parsec.from_re(re.compile(r"[^\"]*")) is returned from the string parser.

If you want to combine the results instead, see the & operator (wait for the pair definition).

>>> quote = Parsec.from_string('"').ignore()
>>> string = quote >> Parsec.from_re(re.compile(r"[^\"]*")) << quote
>>> string('"foo"')
('foo', '')

See how the quotes got discarded?

Also, missing a quote would mean a parsing error.

>>> string('foo"'), string('"bar')
(None, None)
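
The discarding behaviour can be sketched with two plain helper functions. This assumes parsers are callables returning (value, rest) or None; keep_right and keep_left are hypothetical stand-ins for >> and <<, not pyrsec's actual implementation.

```python
import re
from typing import Callable, Optional, Tuple

# Assumed model: success is (value, remaining_input), failure is None.
Result = Optional[Tuple[str, str]]
Parser = Callable[[str], Result]


def pattern(regex: str) -> Parser:
    """Match a regular expression at the start of the input."""
    compiled = re.compile(regex)

    def parse(text: str) -> Result:
        match = compiled.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]
    return parse


def keep_right(first: Parser, second: Parser) -> Parser:
    """Run both parsers in order and keep only the second value (like >>)."""
    def parse(text: str) -> Result:
        result = first(text)
        if result is None:
            return None
        _, rest = result
        return second(rest)
    return parse


def keep_left(first: Parser, second: Parser) -> Parser:
    """Run both parsers in order and keep only the first value (like <<)."""
    def parse(text: str) -> Result:
        result = first(text)
        if result is None:
            return None
        value, rest = result
        tail = second(rest)
        if tail is None:
            return None
        return value, tail[1]
    return parse


quote = pattern(r'"')
body = pattern(r'[^"]*')
# Same shape as `quote >> body << quote`.
string = keep_left(keep_right(quote, body), quote)

ok = string('"foo"')
missing_open = string('foo"')
missing_close = string('"bar')
```

In Python, >> and << share the same precedence and associate left to right, so quote >> body << quote groups as (quote >> body) << quote, which is the nesting used here.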

Let's get a little more serious with lists.

Whitespace is always optional between JSON tokens. A few other basic tokens are also needed.

>>> space = Parsec.from_re(re.compile(r"\s*")).ignore()
>>> comma = Parsec.from_string(",").ignore()
>>> opened_square_bracket = Parsec.from_string("[").ignore()
>>> closed_square_bracket = Parsec.from_string("]").ignore()

And finally, the list parser. Its definition is recursive, but the complete JSON parser is not available yet, so we need to use the deferred value here.

>>> list_ = (
...     opened_square_bracket
...     >> (deferred_json_.sep_by(comma))  # See here?
...     << closed_square_bracket
... )

Let's create an incomplete JSON parser for now.

>>> json_ = space >> (true | false | number | null | string | list_) << space

Let's try it then!

>>> list_("[]")
([], '')
>>> list_("[1, true, false, []]")
([1, True, False, []], '')
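
For readers curious about what a sep_by-style combinator does, here is a rough sketch with plain functions. It assumes zero-or-more semantics (consistent with the empty list result above) and leaves a trailing separator unconsumed; both choices, and the names token and sep_by as written here, are assumptions rather than a description of pyrsec's code.

```python
import re
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")

# Assumed model: success is (value, remaining_input), failure is None.
Parser = Callable[[str], Optional[Tuple[T, str]]]


def token(regex: str) -> "Parser[str]":
    """Match a regular expression at the start of the input."""
    compiled = re.compile(regex)

    def parse(text: str) -> Optional[Tuple[str, str]]:
        match = compiled.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]
    return parse


def sep_by(item: "Parser[T]", separator: "Parser[str]") -> "Parser[List[T]]":
    """Collect zero or more items with a separator between each pair."""
    def parse(text: str) -> Optional[Tuple[List[T], str]]:
        values: List[T] = []
        rest = text
        result = item(rest)
        while result is not None:
            value, rest = result
            values.append(value)
            separated = separator(rest)
            if separated is None:
                break
            # Only commit to the separator if another item follows it.
            result = item(separated[1])
        return values, rest
    return parse


numbers = sep_by(token(r"\d+"), token(r","))

three = numbers("1,2,3]")
empty = numbers("]")
trailing = numbers("1,]")
```

In this sketch a list with no items is still a success, so the surrounding bracket parsers decide whether the overall input is valid.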

Defining a dict should be pretty easy by now. The pair parser may be the interesting part because of its use of &.

Some tokens,

>>> opened_bracket = Parsec.from_string("{").ignore()
>>> closed_bracket = Parsec.from_string("}").ignore()
>>> colon = Parsec.from_string(":").ignore()

And pair. Notice that the type of pair will be Parsec[tuple[str, JSON]].

>>> pair = ((space >> string << space) << colon) & deferred_json_
>>> pair('"foo": [123]')
(('foo', [123]), '')
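
The tuple result above comes from sequencing two parsers and keeping both values. A minimal sketch of that idea, assuming parsers are plain callables returning (value, rest) or None; both and token are hypothetical names used for illustration and are not pyrsec's implementation of &.

```python
import re
from typing import Callable, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Assumed model: success is (value, remaining_input), failure is None.
Parser = Callable[[str], Optional[Tuple[A, str]]]


def token(regex: str) -> "Parser[str]":
    """Match a regular expression at the start of the input."""
    compiled = re.compile(regex)

    def parse(text: str) -> Optional[Tuple[str, str]]:
        match = compiled.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]
    return parse


def both(first: "Parser[A]", second: "Parser[B]") -> "Parser[Tuple[A, B]]":
    """Run two parsers in order and return their values as a tuple."""
    def parse(text: str) -> Optional[Tuple[Tuple[A, B], str]]:
        left = first(text)
        if left is None:
            return None
        a, rest = left
        right = second(rest)
        if right is None:
            return None
        b, rest = right
        return (a, b), rest
    return parse


key = token(r"[a-z]+")
value = token(r"\d+")
pair = both(key, value)

parsed = pair("width42")
failed = pair("width")         # the second parser finds no digits
as_dict = dict([parsed[0]])    # a list of such tuples converts straight to a dict
```

Because each successful result is a two-element tuple, a list of them can be passed directly to dict, which is the conversion the dict parser relies on.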

The dict parser will finally be pretty close to the list one.

>>> dict_ = (
...     opened_bracket
...     >> pair.sep_by(comma).map(lambda xs: dict(xs))
...     << closed_bracket
... )

And finally, let's redefine the json parser to embrace the full beauty of it.

>>> json_ = space >> (true | false | number | null | string | list_ | dict_) << space
>>> json_("""
... {
...     "json_parser": [true]
... }
... """)
({'json_parser': [True]}, '')
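
Since failure is reported as None and leftover input comes back as the second tuple element, calling code may want a small wrapper that turns both cases into exceptions. The helper below is a hypothetical sketch and not part of pyrsec; it accepts any callable that follows the (value, rest) or None convention shown above, and digits is a toy parser used only for the demonstration.

```python
import re
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def parse_all(parser: Callable[[str], Optional[Tuple[T, str]]], text: str) -> T:
    """Return the parsed value, or raise if parsing fails or input is left over."""
    result = parser(text)
    # Compare the whole result with None: a successful parse may still
    # produce None as its value, as in (None, '').
    if result is None:
        raise ValueError(f"could not parse input: {text!r}")
    value, rest = result
    if rest:
        raise ValueError(f"unparsed trailing input: {rest!r}")
    return value


def digits(text: str) -> Optional[Tuple[int, str]]:
    """Toy parser used only for this demonstration."""
    match = re.match(r"\d+", text)
    if match is None:
        return None
    return int(match.group(0)), text[match.end():]


whole = parse_all(digits, "123")

try:
    parse_all(digits, "12x")
    leftover_error = ""
except ValueError as error:
    leftover_error = str(error)
```

With the JSON parser from this README, the same wrapper would presumably be called as parse_all(json_, text).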

Enjoy!
