Grammar library specialized for generating phrases (e.g., in fuzzing)

pygramm : Grammar processing in Python

Experiments in processing BNF with Python. Work in progress.

Why?

While PLY provides a semi-yaccalike, it has some characteristics that bother me. First, it prioritizes lexical patterns by the length of the pattern, not the length of the matched token, in violation of the "maximum munch" rule. Second, it tries to do everything at run time.
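To make the "maximum munch" point concrete, here is a minimal sketch of a lexer that picks the longest matching token at each position and uses pattern order only to break ties. It is not pygramm or PLY code; the token names, the patterns, and the functions `longest_match` and `tokenize` are all made up for illustration.

```python
import re

# Hypothetical token patterns, for illustration only. Order matters only
# as a tie-breaker: IF is listed before ID so the keyword wins on "if".
PATTERNS = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
    ("ASSIGN", re.compile(r"=")),
    ("EQ", re.compile(r"==")),
]

def longest_match(text, pos):
    """Return (name, lexeme) for the longest match at pos, or None."""
    best = None
    for name, pattern in PATTERNS:
        m = pattern.match(text, pos)
        # Strictly longer matches replace the current best, so an earlier
        # pattern keeps priority when two matches have the same length.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

def tokenize(text):
    """Split text into (name, lexeme) pairs using maximal munch."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        tok = longest_match(text, pos)
        if tok is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens
```

Here "==" is recognized as EQ even though ASSIGN is listed first, and "iffy" is an ID rather than IF followed by "fy", because the decision rests on the length of the matched text and not on the length or position of the pattern.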

Also I want to experiment with generation of sentences as well as parsing, and with LL as well as LR parsing.

Work in progress

Done

  • Parse BNF (llparse.py) and create an internal form. The BNF form is extended with Kleene *, but a grammar in pure BNF without Kleene is also fine.
  • Internal structure (grammar.py) represents the BNF structure directly. The Grammar object contains a list of symbols, each of which has a single expansion (which could be a sequence or a choice). The following two grammars will produce precisely the same internal form:
    S ::= "a";
    S ::= "b";
    
    and
    S ::= "a" | "b";
    
  • A phrase generator (generator.py), together with some grammar analysis in grammar.py, can produce sentences within a given length limit (the budget) with or without direction. See choicebot.py for an example of how grammar choices can be controlled.
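The two ideas above (one expansion per symbol, and generation within a budget) can be sketched as follows. This is not the code in grammar.py or generator.py. The dictionary representation and the names `add_production`, `min_lengths`, `expand`, and `generate` are hypothetical, and measuring the budget in characters is an assumption made for the sketch.

```python
import random

# Hypothetical internal form: symbol name -> list of alternatives.
# Each alternative is a list of ("lit", text) or ("sym", name) items.
def add_production(grammar, name, alternative):
    """Merge a production into the single choice kept for `name`."""
    grammar.setdefault(name, []).append(alternative)

def alt_cost(alt, shortest):
    """Minimum number of characters an alternative can expand to."""
    return sum(len(v) if kind == "lit" else shortest[v] for kind, v in alt)

def min_lengths(grammar):
    """Fixed-point analysis: shortest sentence length for each symbol."""
    shortest = {name: float("inf") for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alts in grammar.items():
            for alt in alts:
                cost = alt_cost(alt, shortest)
                if cost < shortest[name]:
                    shortest[name] = cost
                    changed = True
    return shortest

def expand(grammar, shortest, name, budget, rng):
    """Expand `name` into at most `budget` characters."""
    # Only alternatives that can still finish within the budget are eligible.
    feasible = [alt for alt in grammar[name] if alt_cost(alt, shortest) <= budget]
    alt = rng.choice(feasible)
    # Reserve the minimum for the whole alternative; the rest is slack
    # that nonterminals may spend from left to right.
    slack = budget - alt_cost(alt, shortest)
    parts = []
    for kind, v in alt:
        if kind == "lit":
            parts.append(v)
        else:
            text = expand(grammar, shortest, v, shortest[v] + slack, rng)
            slack -= len(text) - shortest[v]
            parts.append(text)
    return "".join(parts)

def generate(grammar, start, budget, rng=random):
    shortest = min_lengths(grammar)
    if shortest[start] > budget:
        raise ValueError("budget is smaller than the shortest sentence")
    return expand(grammar, shortest, start, budget, rng)

# Two separate productions for S give the same form as one choice.
g1 = {}
add_production(g1, "S", [("lit", "a")])
add_production(g1, "S", [("lit", "b")])
g2 = {"S": [[("lit", "a")], [("lit", "b")]]}
assert g1 == g2

# S ::= "(" S ")" S | "x";
g = {}
add_production(g, "S", [("lit", "("), ("sym", "S"), ("lit", ")"), ("sym", "S")])
add_production(g, "S", [("lit", "x")])
sentence = generate(g, "S", budget=12, rng=random.Random(0))
assert len(sentence) <= 12
```

Passing in `rng` is one simple way to direct the choices: a seeded `random.Random` gives repeatable sentences, and any object with a `choice` method could apply a different policy. How choicebot.py actually controls choices is not shown here.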

To Do

  • Distinguish lexical from CFG productions even for sentence generation, because we will want different tactics for tokens than for RHS. In CFG productions we budget for the length of the sentence. In lexical productions we should choose between new and previously used tokens. Currently the BNF goes all the way to string constants, always. This works for the kinds of grammars that Glade learns, but it is not really ideal for generating useful program inputs.
  • Related to the prior point: Infer a good boundary between CFG and lexical structure. In conventional grammar processing, a developer makes this distinction. For grammar learners like Glade, though, the distinction is not trivial to recognize.
  • Add classic grammar analyses, starting with analyses for LL(1) grammars (first, follow), then checking for conflicts, and likewise for LALR(1) and/or LR(1).
  • Add simple transformations, such as left-factoring for LL(1).
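As a rough idea of what the planned LL(1) analyses involve, the sketch below computes FIRST sets by fixed-point iteration and reports symbols whose alternatives share a leading terminal, which is the situation left-factoring repairs. None of this exists in pygramm yet. The representation (a dict from symbol name to a list of alternatives, each a list of strings) and the function names are assumptions, and a complete LL(1) check would also need FOLLOW sets.

```python
EPSILON = "ε"  # marker for "can derive the empty string"; assumed not a terminal

def first_of_sequence(seq, grammar, first):
    """FIRST set of a sequence of symbols, given the current FIRST table."""
    result = set()
    for sym in seq:
        if sym not in grammar:            # not a key, so treat as a terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:     # sym is not nullable: stop here
            return result
    result.add(EPSILON)                   # every symbol was nullable (or seq is empty)
    return result

def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alts in grammar.items():
            for alt in alts:
                before = len(first[name])
                first[name] |= first_of_sequence(alt, grammar, first)
                changed |= len(first[name]) != before
    return first

def first_first_conflicts(grammar):
    """Names of symbols with two alternatives that share a leading terminal."""
    first = first_sets(grammar)
    conflicts = set()
    for name, alts in grammar.items():
        seen = set()
        for alt in alts:
            f = first_of_sequence(alt, grammar, first) - {EPSILON}
            if f & seen:
                conflicts.add(name)
            seen |= f
    return conflicts

# E ::= T E2;  E2 ::= "+" T E2 | /* empty */;  T ::= "id" | "(" E ")";
expr = {
    "E": [["T", "E2"]],
    "E2": [["+", "T", "E2"], []],
    "T": [["id"], ["(", "E", ")"]],
}

# Both "if" alternatives begin with the same terminal, so S needs left-factoring.
stmt = {
    "S": [["if", "c", "then", "S"],
          ["if", "c", "then", "S", "else", "S"],
          ["x"]],
}
```

With these inputs, `first_first_conflicts(expr)` is empty while `first_first_conflicts(stmt)` reports `S`.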
