A reverse-parser as a Hypothesis strategy: generate examples from an EBNF grammar

Hypothesis-Grammar

Python versions: 3.7, 3.8

(pre-alpha... the stuff I've tried all works, not well tested yet though)

What is it?

Hypothesis-Grammar is a "reverse parser" - given a grammar it will generate examples of that grammar.

It is implemented as a Hypothesis strategy.

(If you are looking to generate text from a grammar for purposes other than testing with Hypothesis then this lib can still be useful, but I strongly recommend looking at the tools provided with NLTK instead.)

Usage

So, how does this look?

First you need a grammar. Our grammar format is based on that used by the Lark parser library. You can see our grammar-parsing grammar here. More details of our grammar format are given under "Grammar details" below.

Here is an example of using Hypothesis-Grammar:

from hypothesis_grammar import strategy_from_grammar

st = strategy_from_grammar(
    grammar="""
        DET: "the" | "a"
        N: "man" | "park" | "dog"
        P: "in" | "with"

        s: np vp
        np: DET N
        pp: P np
        vp: "slept" | "saw" np | "walked" pp
    """,
    start="s",
)

# Each call draws a fresh example, so the output varies from run to run.
st.example()
# e.g. ['a', 'dog', 'saw', 'the', 'man']

st.example()
# e.g. ['a', 'park', 'saw', 'a', 'man']

st.example()
# e.g. ['the', 'man', 'slept']

or as a test...

from hypothesis import given
from hypothesis_grammar import strategy_from_grammar


@given(
    strategy_from_grammar(
        grammar="""
            DET: "the" | "a"
            N: "man" | "park" | "dog"
            P: "in" | "with"

            s: np vp
            np: DET N
            pp: P np
            vp: "slept" | "saw" np | "walked" pp
        """,
        start="s",
    )
)
def test_grammar(example):
    nouns = {"man", "park", "dog"}
    assert any(noun in example for noun in nouns)

The grammar is taken from an example in the NLTK docs and converted into our "simplified Lark" format.

start="s" tells the parser that the start rule is s.

As you can see, we have produced a Hypothesis strategy that generates examples matching the grammar (in this case, short sentences which sometimes make sense).

The output will always be a flat list of token strings. If you want a sentence you can just " ".join(example).
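
As a minimal sketch of that join, here is one of the sample outputs from above used as a stand-in for a generated value (the variable names are illustrative, not part of the library):

```python
# Stand-in for a value drawn from the strategy; copied from the sample output above.
example = ["a", "dog", "saw", "the", "man"]

# Join the flat list of token strings into a single sentence.
sentence = " ".join(example)
print(sentence)  # a dog saw the man
```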

But the grammar doesn't have to describe text; it might, for example, represent a sequence of actions. In that case you might want to convert your result tokens into object instances, which could be done via a lookup table.
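
A rough sketch of the lookup-table idea follows. Everything in it is hypothetical: the Action class, the ACTIONS table, the tokens_to_actions helper and the token names are made up for illustration and are not provided by Hypothesis-Grammar. The example list stands in for a value generated from a grammar whose string literals are action names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action object; not part of Hypothesis-Grammar."""

    name: str


# Hypothetical lookup table from token strings to action instances.
ACTIONS = {
    "login": Action("login"),
    "add_item": Action("add_item"),
    "checkout": Action("checkout"),
}


def tokens_to_actions(tokens):
    """Convert a flat list of token strings into Action instances."""
    return [ACTIONS[token] for token in tokens]


# Stand-in for a generated example (a flat list of token strings).
example = ["login", "add_item", "add_item", "checkout"]
actions = tokens_to_actions(example)
print([action.name for action in actions])
# ['login', 'add_item', 'add_item', 'checkout']
```

Because the table is keyed on the literal token strings, any token the grammar can produce needs an entry, otherwise the lookup raises KeyError.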

(But if you're generating action sequences for tests then you should probably check out Hypothesis' stateful testing features first.)

Grammar details

  • Whitespace is ignored
  • 'Terminals' must be named all-caps (terminals only reference literals, not other rules), e.g. DET
  • 'Rules' must be named all-lowercase, e.g. np
  • LHS (name) and RHS are separated by :
  • String literals must be quoted with double-quotes e.g. "man"
  • You can also use regex literals, delimited with forward slashes, e.g. /the[a-z]{0,2}/. Content for the regex token is generated using Hypothesis' from_regex strategy, with fullmatch=True.
  • Adjacent tokens are concatenated, i.e. DET N means a DET followed by an N.
  • | is alternation, so "in" | "with" means one of "in" or "with".
  • ? means optional, e.g. "in"? means "in" is expected zero or one times.
  • * means zero or more, e.g. "in"* means "in" is expected zero or more times.
  • + means one or more, e.g. "in"+ means "in" is expected one or more times.
  • ~ <num> means exactly <num> times.
  • ~ <min>..<max> is a range, meaning between <min> and <max> times.
  • ( and ) are for grouping, the group can be quantified using any of the modifiers above.
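
To make the regex-literal bullet a bit more concrete: with fullmatch=True the whole generated token has to match the pattern, not just a part of it. The sketch below uses only Python's standard re module to show which strings satisfy /the[a-z]{0,2}/ under that rule. It does not call Hypothesis-Grammar or Hypothesis, and the candidate strings are made up for illustration.

```python
import re

# The regex from the bullet above, without the surrounding forward slashes.
pattern = re.compile(r"the[a-z]{0,2}")

candidates = ["the", "them", "there", "theory", "other"]

# re.fullmatch succeeds only if the entire string matches the pattern.
full_matches = [s for s in candidates if pattern.fullmatch(s)]
print(full_matches)  # ['the', 'them', 'there']

# re.search also accepts strings that merely contain a match somewhere.
partial_matches = [s for s in candidates if pattern.search(s)]
print(partial_matches)  # ['the', 'them', 'there', 'theory', 'other']
```

So a token generated for this regex literal should look like the entries in full_matches ("the" followed by at most two lowercase letters), never like "theory" or "other".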

Download files

Source Distribution

hypothesis-grammar-0.1.1.tar.gz (8.0 kB)

Built Distribution

hypothesis_grammar-0.1.1-py3-none-any.whl (8.4 kB)
