A reverse-parser as a Hypothesis strategy: generate examples from an EBNF grammar
Hypothesis-Grammar
(pre-alpha... everything I've tried works, but it's not well tested yet)
What is it?
Hypothesis-Grammar is a "reverse parser" - given a grammar it will generate examples of that grammar.
It is implemented as a Hypothesis strategy.
(If you are looking to generate text from a grammar for purposes other than testing with Hypothesis, this lib can still be useful, but I strongly recommend looking at the tools provided with NLTK instead.)
Usage
So, how does this look?
First you need a grammar. Our grammar format is based on that used by the Lark parser library. You can see our grammar-parsing grammar here. More details of our grammar format below.
Here is an example of using Hypothesis-Grammar:
```python
from hypothesis_grammar import strategy_from_grammar

st = strategy_from_grammar(
    grammar="""
        DET: "the" | "a"
        N: "man" | "park" | "dog"
        P: "in" | "with"

        s: np vp
        np: DET N
        pp: P np
        vp: "slept" | "saw" np | "walked" pp
    """,
    start="s",
)

st.example()
# ['a', 'dog', 'saw', 'the', 'man']
st.example()
# ['a', 'park', 'saw', 'a', 'man']
st.example()
# ['the', 'man', 'slept']
```
or as a test...
```python
from hypothesis import given
from hypothesis_grammar import strategy_from_grammar


@given(
    strategy_from_grammar(
        grammar="""
            DET: "the" | "a"
            N: "man" | "park" | "dog"
            P: "in" | "with"

            s: np vp
            np: DET N
            pp: P np
            vp: "slept" | "saw" np | "walked" pp
        """,
        start="s",
    )
)
def test_grammar(example):
    nouns = {"man", "park", "dog"}
    assert any(noun in example for noun in nouns)
```
The grammar is taken from an example in the NLTK docs and converted into our "simplified Lark" format. `start="s"` tells the parser that the start rule is `s`.

As you can see, we have produced a Hypothesis strategy which is able to generate examples matching the grammar (in this case, short sentences which sometimes make sense).

The output will always be a flat list of token strings. If you want a sentence you can just `" ".join(example)`.
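For instance, turning one of the generated token lists above into a sentence:

```python
# A flat list of token strings, as produced by the strategy.
example = ['the', 'man', 'slept']

sentence = " ".join(example)
print(sentence)  # the man slept
```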
But the grammar doesn't have to describe text; it might, for example, represent a sequence of actions. In that case you might want to convert your result tokens into object instances, which could be done via a lookup table.

(But if you're generating action sequences for tests then you should probably check out Hypothesis' stateful testing features first.)
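A minimal sketch of the lookup-table idea, assuming hypothetical action names and callables (none of these come from the library itself):

```python
# Hypothetical actions; in practice these might be methods on the
# system under test, or constructors for action objects.
def open_valve():
    return "valve opened"

def close_valve():
    return "valve closed"

# Lookup table mapping grammar tokens to callables.
ACTIONS = {
    "open": open_valve,
    "close": close_valve,
}

# Suppose the strategy produced this token list:
example = ["open", "close", "open"]

# Convert each token into its action and run it.
results = [ACTIONS[token]() for token in example]
print(results)  # ['valve opened', 'valve closed', 'valve opened']
```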
Grammar details
- Whitespace is ignored
- 'Terminals' must be named all-caps (terminals only reference literals, not other rules), e.g. `DET`
- 'Rules' must be named all-lowercase, e.g. `np`
- LHS (name) and RHS are separated by `:`
- String literals must be quoted with double-quotes, e.g. `"man"`
- You can also use regex literals; they are delimited with forward-slashes, e.g. `/the[a-z]{0,2}/`. Content for the regex token is generated using Hypothesis' `from_regex` strategy, with `fullmatch=True`.
- Adjacent tokens are concatenated, i.e. `DET N` means a `DET` followed by an `N`.
- `|` is alternation, so `"in" | "with"` means one of `"in"` or `"with"`.
- `?` means optional, i.e. `"in"?` means `"in"` is expected zero-or-one times.
- `*` means zero-or-many, i.e. `"in"*` means `"in"` is expected zero-or-many times.
- `+` means one-or-many, i.e. `"in"+` means `"in"` is expected one-or-many times.
- `~ <num>` means exactly `<num>` times.
- `~ <min>..<max>` is a range, expected between `<min>` and `<max>` times.
- `(` and `)` are for grouping; the group can be quantified using any of the modifiers above.
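To illustrate how these combine, here is a small hypothetical grammar (a sketch in the format described above, not taken from the library's docs): a greeting word, one to three names, and an optional exclamation mark.

```
NAME: /[a-z]{2,8}/
greeting: ("hello" | "hi") NAME ~ 1..3 "!"?
```

Passing this to `strategy_from_grammar` with `start="greeting"` would yield token lists such as a greeting followed by between one and three regex-generated names.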