Grammar library specialized for generating phrases (e.g., in fuzzing)
pygramm : Grammar processing in Python
Experiments in processing BNF with Python. Work in progress.
Why?
While Ply provides a semi-yaccalike, it has some characteristics that bother me. First, it prioritizes lexical processing by the length of the pattern, not the length of the token ... in violation of the "maximum munch" rule. Second, it tries to do everything at run time.
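To make the "maximum munch" rule concrete, here is a minimal sketch of a lexer that always takes the longest lexeme, regardless of how long the pattern text is. It is not pygramm or Ply code; the token names, the patterns, and the `tokenize` function are assumptions made up for illustration.

```python
import re

# Hypothetical token patterns. List order is only used to break ties.
TOKEN_PATTERNS = [
    ("IF", r"if"),
    ("IDENT", r"[a-z]+"),
    ("NUMBER", r"[0-9]+"),
    ("SPACE", r"\s+"),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using maximal munch."""
    compiled = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_PATTERNS]
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, regex in compiled:
            match = regex.match(text, pos)
            # A strictly longer lexeme wins; on a tie the earlier pattern stays.
            if match and match.end() > best_end:
                best_kind, best_end = kind, match.end()
        if best_kind is None:
            raise ValueError(f"no token matches at position {pos}")
        if best_kind != "SPACE":
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("if iffy 42"))
```

Here "iffy" becomes a single IDENT because that lexeme is longer than the "if" that the IF pattern would match, while a bare "if" still becomes IF through the tie-break.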
Also I want to experiment with generation of sentences as well as parsing, and with LL as well as LR parsing.
Work in progress
Done
- Parse BNF (llparse.py) and create an internal form. The BNF form is extended with Kleene *, but a grammar in pure BNF without Kleene is also fine.
- Internal structure (grammar.py) represents the BNF structure directly. The Grammar object contains a list of symbols, each of which has a single expansion (which could be a sequence or a choice). The following two grammars will produce precisely the same internal form:
    S ::= "a"; S ::= "b";
  and
    S ::= "a" | "b";
- A phrase generator (generator.py), together with some grammar analysis in grammar.py, can produce sentences within a given length limit (the budget) with or without direction. See choicebot.py for an example of how grammar choices can be controlled.
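The two ideas in the list above (one merged expansion per symbol, and generation within a budget) can be sketched roughly as follows. This is not the pygramm API: the `Grammar` class, its `add` and `min_lengths` methods, and the `generate` function are hypothetical stand-ins, and the sketch assumes the budget counts terminal symbols, which may differ from how pygramm measures length.

```python
import random


class Grammar:
    """Toy grammar: each nonterminal has one expansion, a list of alternatives."""

    def __init__(self):
        self.rules = {}  # nonterminal -> list of alternatives (lists of symbols)

    def add(self, lhs, *alternatives):
        # Repeated productions for the same symbol merge into one choice.
        self.rules.setdefault(lhs, []).extend(list(alt) for alt in alternatives)

    def min_lengths(self):
        """Fixed point: fewest terminals each nonterminal can derive."""
        best = {nt: float("inf") for nt in self.rules}
        changed = True
        while changed:
            changed = False
            for nt, alts in self.rules.items():
                for alt in alts:
                    cost = sum(best[s] if s in self.rules else 1 for s in alt)
                    if cost < best[nt]:
                        best[nt] = cost
                        changed = True
        return best


def generate(grammar, start, budget, rng=random):
    """Return a random sentence (list of terminals) of at most `budget` terminals."""
    need = grammar.min_lengths()

    def cost(symbol):
        return need[symbol] if symbol in grammar.rules else 1

    if cost(start) > budget:
        raise ValueError("budget too small for any sentence")

    sentence = []
    pending = [start]       # symbols still to expand; the next one is at the end
    reserved = cost(start)  # minimum cost of everything in `pending`
    while pending:
        symbol = pending.pop()
        reserved -= cost(symbol)
        if symbol not in grammar.rules:
            sentence.append(symbol)
            budget -= 1
            continue
        # Keep only alternatives that fit after paying for the remaining symbols.
        affordable = [alt for alt in grammar.rules[symbol]
                      if sum(cost(s) for s in alt) + reserved <= budget]
        choice = rng.choice(affordable)
        pending.extend(reversed(choice))
        reserved += sum(cost(s) for s in choice)
    return sentence


# Two ways of writing the same grammar give the same internal form.
split, joined = Grammar(), Grammar()
split.add("S", ["a"])
split.add("S", ["b"])
joined.add("S", ["a"], ["b"])
print(split.rules == joined.rules)

# A comma-separated list grammar: L ::= "x" | "x" "," L
lists = Grammar()
lists.add("L", ["x"], ["x", ",", "L"])
print("".join(generate(lists, "L", budget=5)))
```

The minimum-length analysis is what keeps the generator from choosing an alternative it cannot afford to finish. Replacing `rng.choice` with a caller-supplied chooser would be one way to get generation "with direction".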
To Do
- Distinguish lexical from CFG productions even for sentence generation, because we will want different tactics for tokens than for RHS. In CFG we budget for length of sentence. In lexical productions we should choose between new and previously used tokens. Currently the BNF goes all the way to string constants, always. This works for the kinds of grammars that Glade learns, but it is not really ideal for generating useful program inputs.
- Related to the prior point: Infer a good boundary between CFG and lexical structure. In conventional grammar processing, a developer makes this distinction. For grammar learners like Glade, though, the distinction is not trivial to recognize.
- Add classic grammar analyses, starting with analyses for LL(1) grammars (first, follow), then checking for conflicts, and likewise for LALR(1) and/or LR(1).
- Add simple transformations, such as left-factoring for LL(1).
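As an illustration of the first planned analysis, the sketch below computes FIRST sets by the textbook fixed-point iteration and reports FIRST/FIRST conflicts, which are the conflicts that left-factoring removes. None of this exists in pygramm yet. The dict-based grammar representation, the `EPSILON` marker, and all function names are assumptions for the example, and FOLLOW sets are left out.

```python
EPSILON = "ε"  # marker for the empty string; assumed not to be a terminal


def first_of(sequence, rules, first):
    """FIRST of a symbol sequence, given the current FIRST of each nonterminal."""
    result = set()
    for symbol in sequence:
        symbol_first = first[symbol] if symbol in rules else {symbol}
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)  # every symbol in the sequence can derive empty
    return result


def first_sets(rules):
    """Iterate to a fixed point; `rules` maps nonterminal -> list of alternatives."""
    first = {nt: set() for nt in rules}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in rules.items():
            for alt in alternatives:
                new = first_of(alt, rules, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_first_conflicts(rules):
    """List (nonterminal, i, j, overlap) for alternatives with overlapping FIRST."""
    first = first_sets(rules)
    conflicts = []
    for nt, alternatives in rules.items():
        for i in range(len(alternatives)):
            for j in range(i + 1, len(alternatives)):
                overlap = (first_of(alternatives[i], rules, first)
                           & first_of(alternatives[j], rules, first))
                if overlap:
                    conflicts.append((nt, i, j, overlap))
    return conflicts


# The classic if/else grammar: both "if" alternatives start with the same token.
rules = {
    "S": [["if", "E", "then", "S"],
          ["if", "E", "then", "S", "else", "S"],
          ["x"]],
    "E": [["y"]],
}
print(first_sets(rules))
print(first_first_conflicts(rules))
```

The reported conflict on the two "if" alternatives is exactly the case where left-factoring the common prefix `"if" E "then" S` would make the choice decidable with one token of lookahead.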