ReBNF: Regexes for Extended Backus-Naur Form (EBNF)

ReBNF

ReBNF (Regexes for Extended Backus-Naur Form) is a notation used to define the syntax of a language using regular expressions.

It is an extension of the EBNF (Extended Backus-Naur Form) notation, allowing for more flexibility and ease of use.

ooooooooo.             oooooooooo.  ooooo      ooo oooooooooooo 
`888   `Y88.           `888'   `Y8b `888b.     `8' `888'     `8 
 888   .d88'  .ooooo.   888     888  8 `88b.    8   888         
 888ooo88P'  d88' `88b  888oooo888'  8   `88b.  8   888oooo8    
 888`88b.    888ooo888  888    `88b  8     `88b.8   888    "    
 888  `88b.  888    .o  888    .88P  8       `888   888         
o888o  o888o `Y8bod8P' o888bood8P'  o8o        `8  o888o       

Syntax

The ReBNF notation uses regular expressions to define the structure of a language. Each rule consists of a left-hand side (non-terminal) and a right-hand side separated by an assignment operator (either ::=, := or =).

The general syntax of a ReBNF rule is as follows:

<alnum> ::= r"[a-zA-Z0-9]" ; # any alphanumeric character

The alphanumeric set matched by [a-zA-Z0-9] is composed of 52 letters (both cases) and 10 digits, which sums up to 62 characters.

The EBNF syntax requires quotes and | operators in between characters to define the alnum identifier as matching any alphanumeric character, which sums up to 247 characters without whitespace (143 characters for the 36 characters of a single letter case).

Using ReBNF, a single regex is required such as r"[a-zA-Z0-9]", which sums up to 14 characters.
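The character counts in this comparison can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name ebnf_alternation is made up here, and the EBNF form is measured without any whitespace between terminals and | operators.

```python
import string

def ebnf_alternation(chars):
    """Spell out a character set the EBNF way: "a"|"b"|... (hypothetical helper)."""
    return "|".join(f'"{c}"' for c in chars)

single_case = string.ascii_lowercase + string.digits  # 36 characters
both_cases = string.ascii_letters + string.digits     # 62 characters
rebnf_rhs = 'r"[a-zA-Z0-9]"'                          # the ReBNF right-hand side

print(len(ebnf_alternation(single_case)))  # 143
print(len(ebnf_alternation(both_cases)))   # 247
print(len(rebnf_rhs))                      # 14
```

Each quoted terminal costs three characters and each separator one, so a set of n characters takes 3n + (n - 1) characters in EBNF, while the regex stays the same length.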

Identifiers

The enclosures < and > are optional, such as:

alnum = r"[a-zA-Z0-9]"       # shorter definition
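The rule shapes shown so far (optional < > enclosures, one of ::=, := or =, an optional ; terminator and a trailing # comment) can be recognized with a single regular expression. The sketch below is an assumption about how such a line could be split, not the parser shipped with the package; RULE and split_rule are hypothetical names, and a # inside a quoted terminal is deliberately not handled.

```python
import re

# Assumed rule shape: [<]name[>] (::= | := | =) rhs [;] [# comment]
# Limitation of this sketch: a '#' inside a quoted terminal is not supported.
RULE = re.compile(
    r"^\s*<?(?P<name>[a-z][a-z0-9_]*)>?\s*"
    r"(?P<op>::=|:=|=)\s*"
    r"(?P<rhs>.*?)\s*;?\s*(?:#.*)?$"
)

def split_rule(line):
    """Return (name, operator, right-hand side) for a single rule line."""
    match = RULE.match(line)
    if match is None:
        raise ValueError(f"not a rule: {line!r}")
    return match.group("name"), match.group("op"), match.group("rhs")

print(split_rule('<alnum> ::= r"[a-zA-Z0-9]" ; # any alphanumeric character'))
print(split_rule('alnum = r"[a-zA-Z0-9]"       # shorter definition'))
```

Both calls yield the same name and right-hand side; only the operator differs, which is the point of the shorter form.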

To improve readability and consistency, spaces are removed from identifiers, and the snake_case naming convention is used instead.

Snake case identifiers consist of lowercase letters, digits, and underscores, with an underscore separating each word so that individual words stay clearly distinguishable and identifiers are easily recognizable.

For example, an identifier non-terminal symbol would have to be written as non_terminal_symbol.

By adhering to the snake case convention, ReBNF identifiers maintain a standardized and consistent style throughout the notation, enabling easier comprehension and usage.
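As a rough illustration of the convention, the following Python sketch checks whether a name is snake case and converts a spaced or hyphenated EBNF-style name. The function names are hypothetical, and requiring a leading lowercase letter is an assumption made here rather than a rule stated by ReBNF.

```python
import re

# Assumption: an identifier starts with a lowercase letter and separates
# words with single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

def is_snake_case(name):
    """True if the whole name follows the snake_case pattern above."""
    return SNAKE_CASE.fullmatch(name) is not None

def to_snake_case(name):
    """Lowercase the name and replace runs of spaces or hyphens with '_'."""
    return re.sub(r"[\s\-]+", "_", name.strip().lower())

print(to_snake_case("non-terminal symbol"))   # non_terminal_symbol
print(is_snake_case("non_terminal_symbol"))   # True
print(is_snake_case("non-terminal symbol"))   # False
```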

Modularity

In ReBNF, import statements are used to bring in grammar rules defined in separate specification files. This enables the reuse of existing rules and promotes modular design in grammar specifications.

As a result, we can organize grammar rules into separate .rebnf specification files, making it easier to manage and maintain complex grammars. This allows for better code organization, reuse of common rules, and separation of concerns.

To import rules from another specification file, we can use the import statement followed by the dotted path to a specification file or the from statement to import only specific items. This enables us to selectively use and reference rules defined in other files.

Given a folder hierarchy such as:

grammar/
├── common.rebnf
└── spec.rebnf

For example, spec.rebnf can import every rule defined in common.rebnf:

from common import *

Using modularity in ReBNF files can lead to more maintainable and scalable grammar specifications.
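One way to picture how a dotted path maps onto the folder hierarchy is the Python sketch below. It is not the package's loader: resolve_import is a hypothetical helper, the grammar base folder is taken from the hierarchy above, and a comma-separated list of imported names is an assumption.

```python
import re
from pathlib import PurePosixPath

# Assumed statement shape: from <dotted.path> import <name>[, <name>...] | *
FROM_IMPORT = re.compile(r"^from\s+([a-z_][a-z0-9_.]*)\s+import\s+(.+)$")

def resolve_import(statement, base_dir="grammar"):
    """Map a 'from ... import ...' statement to a .rebnf path and item list."""
    match = FROM_IMPORT.match(statement.strip())
    if match is None:
        raise ValueError(f"not a from-import statement: {statement!r}")
    module, names = match.groups()
    # Each dot in the module path becomes a folder separator.
    path = PurePosixPath(base_dir, *module.split(".")).with_suffix(".rebnf")
    items = [name.strip() for name in names.split(",")]
    return path, items

path, items = resolve_import("from common import *")
print(path)   # grammar/common.rebnf
print(items)  # ['*']
```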

Optional groups

Square brackets [ ] are used to define optional groups rather than repetition. In EBNF, 3 * [aa] generates between zero and three occurrences of aa (nothing, A, AA, or AAA), whereas in ReBNF, it denotes an optional group that can occur zero or one times.

In EBNF:

aa = "A";
bb = 3 * aa, "B";
cc = 3 * [aa], "C";

Which means:

  • aa: A
  • bb: AAAB
  • cc: C, AC, AAC, AAAC

In ReBNF:

aa = "A";
bb = 3 * aa "B";
cc = 3 * [aa] "C";

Which means:

  • aa: A
  • bb: AAAB
  • cc: AAAC
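The two readings of cc listed above can be compared by translating each into an ordinary regular expression and testing a few candidate strings. The translation into Python regexes is an illustration made here, not something ReBNF itself produces.

```python
import re

candidates = ["C", "AC", "AAC", "AAAC", "AAAAC"]

# EBNF reading of 3 * [aa], "C": up to three optional A's, then C.
ebnf_cc = re.compile(r"A{0,3}C")
# ReBNF reading of 3 * [aa] "C", as listed in this section: exactly AAAC.
rebnf_cc = re.compile(r"A{3}C")

def accepted(pattern):
    """Return the candidate strings that the pattern matches in full."""
    return [s for s in candidates if pattern.fullmatch(s)]

print(accepted(ebnf_cc))   # ['C', 'AC', 'AAC', 'AAAC']
print(accepted(rebnf_cc))  # ['AAAC']
```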

Concatenation

ReBNF also introduces a change in concatenation.

In EBNF, explicit concatenation is required using a comma , between two identifiers.

However, in ReBNF, snake case identifiers are enforced, so an identifier can never contain a space. Whitespace therefore separates symbols unambiguously and concatenation is implicit: adjacent terminals or identifiers are concatenated.

That's why we are able to drop the comma in 3 * [aa], "C" and write 3 * [aa] "C" if we want cc to be "AAAC".
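Because identifiers contain no spaces, a right-hand side can be split into symbols without any explicit separator. The tokenizer below is a minimal sketch of that idea under assumed token shapes (quoted terminals with an optional r prefix, snake case identifiers, integer counts and bracket or operator characters); it is not the tokenizer used by the package.

```python
import re

# Assumed token shapes, tried in order. Characters that match none of them
# (such as whitespace) are simply skipped by findall.
TOKEN = re.compile(
    r'r?"[^"]*"'          # double-quoted terminal or regex
    r"|r?'[^']*'"         # single-quoted terminal or regex
    r"|[a-z][a-z0-9_]*"   # snake_case identifier
    r"|\d+"               # repetition count
    r"|[*\[\]{}()|]"      # operators and brackets
)

def tokens(rhs):
    """Split the right-hand side of a rule into its symbols."""
    return TOKEN.findall(rhs)

print(tokens('3 * [aa] "C"'))
# ['3', '*', '[', 'aa', ']', '"C"']
```

Adjacent symbols in the resulting list, such as ] followed by "C", are then read as a concatenation.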

Example

Here's a short example of a ReBNF definition for a simple arithmetic expression language:

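expression = term { ('+' | '-') term }      # sum or difference of terms
term = factor { ('*' | '/') factor }        # product or quotient of factors
factor = number | '(' expression ')'        # parentheses keep the recursion finite
number = r'\d+'                             # one or more digits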
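To show how such a grammar maps onto code, here is a small hand-written recursive-descent evaluator in Python with one function per rule. It is a sketch and does not use the rebnf package; it assumes that factor accepts a parenthesized expression, an assumption added here so that the recursion terminates, and all function names are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into numbers, operators and parentheses."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expression(tokens, i):
    # expression = term { ('+' | '-') term }
    value, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        rhs, i = parse_term(tokens, i + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, i

def parse_term(tokens, i):
    # term = factor { ('*' | '/') factor }
    value, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        op = tokens[i]
        rhs, i = parse_factor(tokens, i + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, i

def parse_factor(tokens, i):
    # factor = number | '(' expression ')'   (parentheses are an assumption)
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i] == "(":
        value, i = parse_expression(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return value, i + 1
    if re.fullmatch(r"\d+", tokens[i]):  # number = r'\d+'
        return int(tokens[i]), i + 1
    raise SyntaxError(f"unexpected token {tokens[i]!r}")

def evaluate(text):
    """Parse and evaluate a complete arithmetic expression."""
    tokens = tokenize(text)
    value, i = parse_expression(tokens, 0)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return value

print(evaluate("1 + 2 * 3"))    # 7
print(evaluate("(1 + 2) * 3"))  # 9
```

The { ... } repetition of each rule becomes a while loop, and the nesting of term inside expression is what gives * and / their higher precedence.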

Usage

ReBNF notation is used to define the syntax of programming languages, configuration file formats, or any other formal language.

It provides a concise and powerful way to express language structures with the addition of regular expressions.

Note that the functions in this module are only designed to parse syntactically valid ReBNF code (code that does not raise when parsed using parse()). Their behavior is undefined when given invalid ReBNF code and may change at any time.

Contributing

Contributions are welcome! If you have suggestions, improvements, or new ideas related to the ReBNF notation, please feel free to open an issue or submit a pull request.

License

This project is licensed under the GPLv3 license - see LICENSE.md for details.
