
A reverse-parser as a Hypothesis strategy: generate examples from an EBNF grammar


Hypothesis-Grammar


Python versions: 3.7, 3.8

(Pre-alpha: the stuff I've tried all works, but it's not well tested yet.)

What is it?

Hypothesis-Grammar is a "reverse parser" - given a grammar it will generate examples of that grammar.

It is implemented as a Hypothesis strategy.
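To make "reverse parser" concrete, here is a minimal sketch of the idea using only the standard library's `random` module. It is not Hypothesis and not this library's code. The dict-based grammar representation and the `generate` function are hypothetical: each rule name maps to a list of alternatives, and generation starts from a start rule and recursively picks one alternative until only literal tokens remain.

```python
import random

# Hypothetical grammar representation: each name maps to a list of
# alternatives (the | branches), and each alternative is a sequence of
# symbols. A symbol that is a key in GRAMMAR gets expanded; anything
# else is treated as a literal token.
GRAMMAR = {
    "DET": [["the"], ["a"]],
    "N": [["man"], ["park"], ["dog"]],
    "P": [["in"], ["with"]],
    "s": [["np", "vp"]],
    "np": [["DET", "N"]],
    "pp": [["P", "np"]],
    "vp": [["slept"], ["saw", "np"], ["walked", "pp"]],
}


def generate(grammar, start, rng):
    """Expand `start` into a flat list of literal tokens."""
    alternative = rng.choice(grammar[start])  # pick one | branch
    tokens = []
    for symbol in alternative:
        if symbol in grammar:
            # Rule or terminal name: recurse and splice its tokens in.
            tokens.extend(generate(grammar, symbol, rng))
        else:
            # Literal: emit it as-is.
            tokens.append(symbol)
    return tokens


rng = random.Random(0)
print(generate(GRAMMAR, "s", rng))
print(generate(GRAMMAR, "np", rng))  # a different start rule
```

Building the same expansion out of Hypothesis strategies, as this library does, lets Hypothesis drive the choices (and shrink failing examples) instead of `random`.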

(If you are looking to generate text from a grammar for purposes other than testing with Hypothesis, this lib can still be useful, but I strongly recommend looking at the tools provided with NLTK instead.)

Usage

So, how does this look?

First you need a grammar. Our grammar format is based on the one used by the Lark parser library. You can see our grammar-parsing grammar here. The grammar format is described in more detail below.

Here is an example of using Hypothesis-Grammar:

from hypothesis_grammar import strategy_from_grammar

st = strategy_from_grammar(
    grammar="""
        DET: "the" | "a"
        N: "man" | "park" | "dog"
        P: "in" | "with"

        s: np vp
        np: DET N
        pp: P np
        vp: "slept" | "saw" np | "walked" pp
    """,
    start="s",
)

st.example()
# ['a', 'dog', 'saw', 'the', 'man']

st.example()
# ['a', 'park', 'saw', 'a', 'man']

st.example()
# ['the', 'man', 'slept']

Or as a test:

from hypothesis import given
from hypothesis_grammar import strategy_from_grammar


@given(
    strategy_from_grammar(
        grammar="""
            DET: "the" | "a"
            N: "man" | "park" | "dog"
            P: "in" | "with"

            s: np vp
            np: DET N
            pp: P np
            vp: "slept" | "saw" np | "walked" pp
        """,
        start="s",
    )
)
def test_grammar(example):
    nouns = {"man", "park", "dog"}
    assert any(noun in example for noun in nouns)

The grammar is taken from an example in the NLTK docs and converted into our "simplified Lark" format.

start="s" tells the reverse parser that s is the start rule.

As you can see, we have produced a Hypothesis strategy which is able to generate examples which match the grammar (in this case, short sentences which sometimes make sense).

The output will always be a flat list of token strings. If you want a sentence, you can just use " ".join(example).

But the grammar doesn't have to describe text; it might, for example, represent a sequence of actions. In that case you might want to convert the result tokens into object instances, which could be done via a lookup table.
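A minimal sketch of the lookup-table idea, assuming a hypothetical action grammar such as `actions: ("push" | "pop" | "clear")+`. The `Push`, `Pop` and `Clear` classes, the `ACTIONS` table and the toy `run` function are invented for illustration and are not part of hypothesis-grammar. In a real test, `tokens` would be an example drawn from `strategy_from_grammar`.

```python
from dataclasses import dataclass


# Hypothetical action types; replace with your own domain objects.
@dataclass
class Push:
    value: int = 1


@dataclass
class Pop:
    pass


@dataclass
class Clear:
    pass


# Lookup table from token string to a class that builds the action.
ACTIONS = {
    "push": Push,
    "pop": Pop,
    "clear": Clear,
}


def tokens_to_actions(tokens):
    """Convert a flat list of token strings into action objects."""
    return [ACTIONS[token]() for token in tokens]


def run(actions):
    """Apply the actions to a simple list-based stack."""
    stack = []
    for action in actions:
        if isinstance(action, Push):
            stack.append(action.value)
        elif isinstance(action, Pop) and stack:
            stack.pop()  # popping an empty stack is ignored
        elif isinstance(action, Clear):
            stack.clear()
    return stack


tokens = ["push", "push", "pop", "push"]  # e.g. one generated example
print(tokens_to_actions(tokens))
print(run(tokens_to_actions(tokens)))  # [1, 1]
```

Mapping tokens to classes (rather than to pre-built instances) means every example gets fresh objects, so state cannot leak between test cases.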

(But if you're generating action sequences for tests, you should probably check out Hypothesis' stateful testing features first.)

Grammar details

  • Whitespace is ignored
  • 'Terminals' must be named all-caps (terminals only reference literals, not other rules), e.g. DET
  • 'Rules' must be named all-lowercase, e.g. np
  • LHS (name) and RHS are separated by :
  • String literals must be quoted with double-quotes e.g. "man"
  • You can also use regex literals, delimited with forward slashes, e.g. /the[a-z]{0,2}/. Content for a regex token is generated using Hypothesis' from_regex strategy, with fullmatch=True.
  • Adjacent tokens are concatenated, i.e. DET N means a DET followed by an N.
  • | is alternation, so "in" | "with" means one of "in" or "with".
  • ? means optional, e.g. "in"? means "in" is expected zero or one times.
  • * means zero or more, e.g. "in"* means "in" is expected zero or more times.
  • + means one or more, e.g. "in"+ means "in" is expected one or more times.
  • ~ <num> means exactly <num> times.
  • ~ <min>..<max> is a range: expected between <min> and <max> times.
  • ( and ) are for grouping, the group can be quantified using any of the modifiers above.
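The quantifiers above correspond to the regex quantifiers ?, *, +, {n} and {min,max}. As a rough sketch of what they mean for generation, the hypothetical helper below maps each quantifier to the range of repetition counts a generator may choose from. The unbounded * and + need some cap in practice; `max_repeat` is an assumed parameter for this sketch, not an option of this library. The last lines use the standard library's `re.fullmatch` to show what fullmatch=True implies for regex literals: the whole generated token must match the pattern, not just a substring of it.

```python
import random
import re


def repeat_range(quantifier, max_repeat=5):
    """Return (min, max) repetition counts for a quantifier string.

    Hypothetical helper: unbounded quantifiers are capped at max_repeat.
    """
    quantifier = quantifier.strip()
    if quantifier == "":
        return (1, 1)  # no quantifier: exactly once
    if quantifier == "?":
        return (0, 1)
    if quantifier == "*":
        return (0, max_repeat)
    if quantifier == "+":
        return (1, max_repeat)
    if quantifier.startswith("~"):
        spec = quantifier[1:].strip()
        if ".." in spec:  # ~ <min>..<max>
            low, high = spec.split("..")
            return (int(low), int(high))
        return (int(spec), int(spec))  # ~ <num>
    raise ValueError(f"unknown quantifier: {quantifier!r}")


def expand(token, quantifier, rng):
    """Repeat a literal token a number of times the quantifier allows."""
    low, high = repeat_range(quantifier)
    return [token] * rng.randint(low, high)  # randint is inclusive


rng = random.Random(0)
print(expand("in", "~ 2..4", rng))

# Regex literals: with fullmatch=True the whole token must match.
pattern = r"the[a-z]{0,2}"
print(bool(re.fullmatch(pattern, "they")))   # True
print(bool(re.fullmatch(pattern, "other")))  # False
print(bool(re.search(pattern, "other")))     # True: substring match only
```

A quantified group would work the same way, except that each repetition expands the whole group again rather than copying a single literal.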

Download files


Source Distribution

hypothesis-grammar-0.1.1.tar.gz (8.0 kB)

Built Distribution

hypothesis_grammar-0.1.1-py3-none-any.whl (8.4 kB)

File details

Details for the file hypothesis-grammar-0.1.1.tar.gz.

File metadata

  • Download URL: hypothesis-grammar-0.1.1.tar.gz
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.3 CPython/3.7.6 Darwin/18.7.0

File hashes

Hashes for hypothesis-grammar-0.1.1.tar.gz
Algorithm Hash digest
SHA256 75cf5d73df5d3ed468524622de12f2a8ae07aa0231266ab9ba52eb81a7753429
MD5 cb1a6cbb33c6c2ae4197b35a1c7d9442
BLAKE2b-256 05966cb3a356499a79ec9b4b630377d99a9874642a2abe8c331a5013698ab406


File details

Details for the file hypothesis_grammar-0.1.1-py3-none-any.whl.

File hashes

Hashes for hypothesis_grammar-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1650dee38ba371d2e2e5f95c66eeb388c4d0d6345907570be662eba23239d7e3
MD5 1443f0abc0bfedc19859a063a21d2772
BLAKE2b-256 cc808a934ce0aa939b8caf36522d36d665701178336792cfeb6b8ddddbf5d1bd

