Skip to main content

Conversion of a (sub) grammar to (approximated) regular expressions.

Project description

Grammar2Regex

Grammar2Regex converts context-free grammars to regular expressions. Left- and right-linear grammars are always converted precisely. Furthermore, some grammars with a regular language but a more general shape can also be converted. For example, a rule <N> ::= a<A><N> is not right-linear due to the occurrence of <A>; but if <A> does not reach <N> and everything else is regular, we obtain a precise regular expression by computing the regular expression for <A> and inserting it into the rule for <N>.

For grammars with a non-regular language or a suboptimal shape, we "unwind" problematic expansion elements. This step loses information, but enables us to compute under-approximating regular expressions for arbitrary context-free grammars. Grammar2Regex returns expressions either in a custom ADT format suitable for further processing or as ReRef objects of the z3 SMT solver.

Example

Consider the following grammar of XML attribute or tag name identifiers:

import string
from typing import Dict, List
from fuzzingbook.Grammars import srange

xml_id_grammar: Dict[str, List[str]] = {
    "<start>": ["<id>"],
    "<id>": [
        "<id-no-prefix>",
        "<id-with-prefix>"
    ],
    "<id-no-prefix>": [
        "<id-start-char>",
        "<id-start-char><id-chars>",
    ],
    "<id-with-prefix>": ["<id-no-prefix>:<id-no-prefix>"],
    "<id-start-char>": srange("_" + string.ascii_letters),
    "<id-chars>": ["<id-char>", "<id-char><id-chars>"],
    "<id-char>": ["<id-start-char>"] + srange("-." + string.digits),
}

Our chosen grammar format is the one from the Fuzzing Book.

This grammar can be precisely converted into a regular expression:

from grammar_to_regex.cfg2regex import RegexConverter

converter = RegexConverter(xml_id_grammar, compress_unions=True)
print(converter.to_regex('<start>', convert_to_z3=False))

which yields an object of type grammar_to_regex.regex.Regex, in pretty print:

((["-" .. "."] | ["0" .. "9"] | ["A" .. "Z"] | "_" | ["a" .. "z"])* • (["-" .. "."] | ["0" .. "9"] | ["A" .. "Z"] | "_" | ["a" .. "z"]))

You can also obtain an output in the regular expression format of the z3 SMT solver if you need this for further processing (simply omit the convert_to_z3=False or set that parameter to True).

Installation

Grammar2Regex requires Python 3.10.

To install Grammar2Regex, run

pip install grammar_to_regex

To install it locally for testing, check this repository out and run

pip install -e .[test]

License and Maintainer

This project is licensed under the GNU GENERAL PUBLIC LICENSE v3. It is maintained by Dominic Steinhöfel.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grammar_to_regex-1.0.1.tar.gz (54.4 kB view details)

Uploaded Source

Built Distribution

grammar_to_regex-1.0.1-py3-none-any.whl (38.0 kB view details)

Uploaded Python 3

File details

Details for the file grammar_to_regex-1.0.1.tar.gz.

File metadata

  • Download URL: grammar_to_regex-1.0.1.tar.gz
  • Upload date:
  • Size: 54.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.14

File hashes

Hashes for grammar_to_regex-1.0.1.tar.gz
Algorithm Hash digest
SHA256 5894c44c5a1594324edecd825997b001d59154c94fb584bf23aaea37e8834f79
MD5 aae62390adb0e7d6ef45d016239f441d
BLAKE2b-256 6387bfbb7a15cd4099a3796ce320dc3c254ace1042504f770890b293980468d7

See more details on using hashes here.

File details

Details for the file grammar_to_regex-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for grammar_to_regex-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5c0cdbe47581d69567f9ecf0f6198eda34c47f3383504ca1778bfddf50c196d7
MD5 39eef860ac0670a3f926c16c59692367
BLAKE2b-256 78c8bf3e49e8ef17e7662301ced74ae8f9eef344c2c3b8a97f4344c62f58223e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page