
Grammar compilation for Caustic


Caustic's lexing/grammar framework

The basic_compiler module is a simpler, less capable compiler, used to bootstrap the Compiler

The Compiler class compiles Caustic grammar (.cag) files into nodes; its own grammar is itself written in Caustic grammar format and compiled with the basic_compiler module

The Compiler is loaded through the package's load_compiler() function, and can be cached to disk using the save_compiler() function

The nodes module provides the nodes themselves, and allows building grammars manually from nodes

The serialize module provides functions for serializing and deserializing nodes

The util module provides small utilities

The .cag specification

Pragmas

Pragmas are special directives embedded in the grammar
These are only supported by the bootstrapped compiler module, not by basic_compiler

Include

$include [path]

Allows combining multiple grammar files

Relative paths provided as [path] will be checked against the following directories, in order:

  • The path of the includer/importer (if possible)
  • The builtin_path of the compiler module (the location of compiler.py)
  • The current directory
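The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the package's implementation: resolve_include is a hypothetical name, and the sketch assumes that "the path of the includer" means the directory containing the including file, that the first existing candidate wins, and that absolute paths are used unchanged.

```python
from pathlib import Path
from typing import Optional


def resolve_include(path: str, includer: Optional[str] = None,
                    builtin_path: Optional[str] = None) -> Path:
    """Hypothetical resolver for `$include [path]` (not caustic.lexer's API)"""
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate  # assumption: absolute paths are used as-is
    search_dirs = []
    if includer is not None:
        # Assumption: relative to the directory holding the including file
        search_dirs.append(Path(includer).parent)
    if builtin_path is not None:
        search_dirs.append(Path(builtin_path))  # e.g. the directory of compiler.py
    search_dirs.append(Path.cwd())  # the current directory is checked last
    for directory in search_dirs:
        full = directory / candidate
        if full.exists():
            return full  # first existing candidate wins
    raise FileNotFoundError(f"cannot resolve include {path!r}")
```

Under this reading, a file next to the includer shadows a builtin grammar file of the same name.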

Comments

Comments begin with a #

Statements

A statement begins with an identifier, followed by an =, then an expression, and finally a ;

Identifier

An identifier is a sequence of alphanumeric characters, underscores, and periods

Note: basic_compiler will not accept identifiers with periods
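As a rough illustration of the statement and identifier rules above, the hypothetical parse_statement helper below splits a single statement into its identifier and the raw expression text, using Python's re module. It assumes "alphanumeric" means ASCII letters and digits and does not parse the expression itself; allow_periods=False mimics the basic_compiler restriction.

```python
import re

IDENT = r"[A-Za-z0-9_.]+"       # alphanumerics, underscores and periods
BASIC_IDENT = r"[A-Za-z0-9_]+"  # basic_compiler: no periods


def parse_statement(source: str, allow_periods: bool = True) -> tuple[str, str]:
    """Split 'name = expression ;' into (name, expression) (sketch only)"""
    ident = IDENT if allow_periods else BASIC_IDENT
    # The lazy (.*?) stops at the final ';', so a ';' inside the expression is kept
    match = re.fullmatch(rf"\s*({ident})\s*=\s*(.*?)\s*;\s*", source, re.DOTALL)
    if match is None:
        raise ValueError(f"not a statement: {source!r}")
    return match.group(1), match.group(2)
```

For example, parse_statement("sep = ';';") keeps the quoted semicolon as part of the expression.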

Expression

Expressions consist of nodes, which range from something as simple as a string to something as complex as a group

Naming

nodes.Node.name

Named nodes are denoted by a name (alphanumeric characters, underscores, and periods), followed by a :, and then the node/expression
Naming a node controls the return value of the group that contains it

Note: basic_compiler will not accept node names with periods

Anonymous

"Anonymous" named nodes are expressions prefixed with :, but with no leading name

Unpack

"Unpack" nodes are expressions prefixed with ^:

Note: basic_compiler will not accept unpack nodes

Group

nodes.NodeGroup

The top level of an expression is implicitly grouped

A simple group node is opened by ( and closed by )
Groups match the nodes inside them in sequence, in order
The return value of a group depends on how its contents are named:

  • A group containing no named nodes will return a list of its nodes' results
  • A group containing nodes with "anonymous" names returns the return value of the last matched anonymous node
  • A group containing named nodes returns a dict containing a mapping of the names to the nodes' results
  • Unpack nodes merge their results into the surrounding group's result: a sequence contributes its elements, and a mapping contributes its names and values

Mixing anonymous and named expressions in a single group will result in an error
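The naming rules above can be sketched as a function that combines already-matched children into a group result. This is an illustration of the described behavior, not caustic.lexer's code: the PLAIN/ANONYMOUS/NAMED/UNPACK tags and group_result are hypothetical, and the sketch assumes unnamed children are dropped from named and anonymous groups.

```python
from typing import Any, Optional

# Hypothetical tags for how each child was written:
#   PLAIN -> "x"   ANONYMOUS -> :"x"   NAMED -> name:"x"   UNPACK -> ^:"x"
PLAIN, ANONYMOUS, NAMED, UNPACK = "plain", "anonymous", "named", "unpack"


def group_result(children: list[tuple[str, Optional[str], Any]]) -> Any:
    """Combine (kind, name, result) triples into a group result (sketch)"""
    kinds = {kind for kind, _, _ in children}
    if ANONYMOUS in kinds and NAMED in kinds:
        raise ValueError("cannot mix anonymous and named nodes in one group")
    if NAMED in kinds:
        # Named: map names to results; unpacked mappings merge their items in
        mapping: dict = {}
        for kind, name, result in children:
            if kind == NAMED:
                mapping[name] = result
            elif kind == UNPACK:
                mapping.update(result)
        return mapping
    if ANONYMOUS in kinds:
        # Anonymous: only the last anonymous node's result is returned
        last = None
        for kind, _, result in children:
            if kind == ANONYMOUS:
                last = result
        return last
    # No names: a list of results, with unpacked sequences spliced in
    items: list = []
    for kind, _, result in children:
        if kind == UNPACK:
            items.extend(result)
        else:
            items.append(result)
    return items
```

For instance, a group like ( "(" :expr ")" ) would yield only the anonymous node's result, dropping the parentheses.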

Whitespace sensitive group

nodes.NodeGroup, keep_whitespace=True

A whitespace sensitive group is opened by { and closed by }
The only difference between this type of group and a normal group is that it does not implicitly discard whitespace between its nodes
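The difference between ( ) and { } groups can be sketched with a sequence matcher over literal byte strings. match_sequence is a hypothetical helper; it assumes a normal group skips whitespace before each node, and that "whitespace" is whatever \s matches in a bytes regex.

```python
import re
from typing import Optional

# Assumption: "whitespace" is whatever \s matches in a bytes pattern
WHITESPACE = re.compile(rb"\s*")


def match_sequence(literals: list[bytes], data: bytes, pos: int = 0,
                   keep_whitespace: bool = False) -> Optional[tuple[list[bytes], int]]:
    """Match literals in order; return (results, new_pos) or None on failure"""
    results = []
    for literal in literals:
        if not keep_whitespace:
            # A normal ( ) group skips whitespace before each node
            pos = WHITESPACE.match(data, pos).end()
        if not data.startswith(literal, pos):
            return None
        results.append(literal)
        pos += len(literal)
    return results, pos
```

With keep_whitespace=True (the { } case), b"a  b" no longer matches the sequence "a" "b", while b"ab" still does.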

Union

nodes.UnionNode

A union is opened by [ and closed by ]
Unions match any of their contained nodes
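A union can be sketched as trying each alternative at the same position. The README does not say which alternative wins when several could match; this sketch assumes ordered choice (the first success wins). literal and union are hypothetical helpers, and a matcher here is any callable taking (data, pos) and returning (result, new_pos) or None.

```python
from typing import Callable, Optional, Tuple

# A matcher takes (data, pos) and returns (result, new_pos) or None
Matcher = Callable[[bytes, int], Optional[Tuple[object, int]]]


def literal(text: bytes) -> Matcher:
    """Hypothetical helper: match an exact byte string"""
    def match(data: bytes, pos: int):
        if data.startswith(text, pos):
            return text, pos + len(text)
        return None
    return match


def union(*alternatives: Matcher) -> Matcher:
    """Try each alternative in order; the first success wins (assumed)"""
    def match(data: bytes, pos: int):
        for alternative in alternatives:
            result = alternative(data, pos)
            if result is not None:
                return result
        return None
    return match
```

Under the ordered-choice assumption, union(literal(b"a"), literal(b"ab")) matches only b"a" in b"ab", so longer alternatives would need to be listed first.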

Range

nodes.NodeRange

Can be created in the following ways:

  • - [node]: Matches any number of [node]
  • x- [node]: Matches x or more of [node]
  • -x [node]: Matches up to (but not including) x of [node]
  • a-b [node]: Matches between a (inclusive) and b (exclusive) of [node]

Note that the range should be placed after a name (see Naming)
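The four range forms can be expressed as a (minimum, exclusive maximum) pair. parse_range and match_range are hypothetical helpers; the matcher is greedy and does not backtrack, which is a simplification (the changelog indicates the real NodeRange does backtrack).

```python
import math
from typing import Callable, Optional, Tuple

Matcher = Callable[[bytes, int], Optional[Tuple[object, int]]]


def parse_range(spec: str) -> tuple[int, float]:
    """Turn '-', 'x-', '-x' or 'a-b' into (minimum, exclusive maximum)"""
    low, _, high = spec.partition("-")
    return (int(low) if low else 0, int(high) if high else math.inf)


def match_range(node: Matcher, spec: str, data: bytes, pos: int = 0):
    """Greedily repeat `node`, then check the count against the range"""
    minimum, maximum = parse_range(spec)
    results = []
    # Only try another repetition if the count would stay below the maximum
    while len(results) + 1 < maximum:
        matched = node(data, pos)
        if matched is None or matched[1] == pos:
            break  # stop on failure, or on an empty match to avoid looping forever
        value, pos = matched
        results.append(value)
    if len(results) < minimum:
        return None
    return results, pos
```

Because the maximum is exclusive, 1-3 matches one or two repetitions, and -1 matches only zero repetitions.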

Real

Real nodes are nodes that actually match content, such as strings or patterns

String

nodes.StringNode

The simplest node, denoted either by single quotes ('') or double quotes ("")
Supports escape characters

Note: despite the name of this node, all nodes match bytes, not strings!
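Because matching is done on bytes, a string literal has to be encoded before it can be compared with the input. The hypothetical match_string helper below assumes UTF-8 encoding and ignores escape handling.

```python
from typing import Optional


def match_string(literal: str, data: bytes, pos: int = 0) -> Optional[tuple[bytes, int]]:
    """Hypothetical string-node matcher working on bytes (escapes not handled)"""
    target = literal.encode("utf-8")  # assumption: literals are encoded as UTF-8
    if data.startswith(target, pos):
        return target, pos + len(target)
    return None
```

Under this assumption, "é" matches the two UTF-8 bytes b"\xc3\xa9" but not the single Latin-1 byte b"\xe9": the same character in a different encoding is simply different bytes.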

Pattern

nodes.PatternNode

Matches a regular expression, denoted by slashes (/) in the following syntax:

target group / pattern / flags

Target Group

In a pattern, if a target group is given (as an integer), the result of this node will be the bytes of that group instead of the entire match

Flags

Supports these common RegEx flags:

  • i: ignore case / case insensitive
  • m: multiline - ^ matches beginning of line or string, $ matches end of either
  • s: single-line / "dotall" - . matches newlines as well
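The three flags correspond directly to Python's re.IGNORECASE, re.MULTILINE and re.DOTALL. The hypothetical match_pattern helper below sketches a pattern node with an optional target group; it assumes the whole match is consumed even when only a target group is returned.

```python
import re
from typing import Optional

# Map the .cag flag letters onto Python's re flags
FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def match_pattern(pattern: bytes, data: bytes, pos: int = 0, flags: str = "",
                  target_group: Optional[int] = None) -> Optional[tuple[bytes, int]]:
    """Match a bytes regex at `pos`; return (result bytes, new_pos) or None"""
    combined = 0
    for letter in flags:
        combined |= FLAGS[letter]  # unknown letters raise KeyError
    match = re.compile(pattern, combined).match(data, pos)
    if match is None:
        return None
    # With a target group, the result is that group's bytes, not the whole match
    value = match.group(0 if target_group is None else target_group)
    return value, match.end()
```

For example, target group 1 on (\d+)px returns b"12" from b"12px" while still advancing past the px.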

Meta

"Meta" nodes do not actually match anything themselves, but can change some context

Stealer

nodes.Stealer

A "stealer" node is denoted by a !, and is only acceptable in a group

If a group reaches a "stealer" node, then the group will raise an exception if any of the subsequent nodes fail
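The stealer's effect can be sketched as a group matcher that switches from "fail quietly" to "raise" once it passes the marker. STEAL, CommittedMatchError and match_group are hypothetical names; the sketch assumes the stealer contributes nothing to the group's result.

```python
from typing import Callable, Optional, Tuple

Matcher = Callable[[bytes, int], Optional[Tuple[object, int]]]
STEAL = object()  # hypothetical marker standing in for a "!" node


class CommittedMatchError(Exception):
    """Raised when a node after the stealer fails (hypothetical name)"""


def literal(text: bytes) -> Matcher:
    def match(data: bytes, pos: int):
        return (text, pos + len(text)) if data.startswith(text, pos) else None
    return match


def match_group(nodes: list, data: bytes, pos: int = 0):
    """Match nodes in order; after STEAL, failure raises instead of returning None"""
    committed = False
    results = []
    for node in nodes:
        if node is STEAL:
            committed = True  # from here on, the group cannot quietly fail
            continue
        matched = node(data, pos)
        if matched is None:
            if committed:
                raise CommittedMatchError(f"expected more input at offset {pos}")
            return None  # before the stealer, the caller may try alternatives
        value, pos = matched
        results.append(value)
    return results, pos
```

With [literal(b"let "), STEAL, literal(b"x")], input b"var x" fails quietly, while b"let y" raises at the point of failure, which is useful for reporting syntax errors.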

Context

nodes.Context

A context is created with an opening < and closing >
Context nodes always match, with the result being the (string) contents

Context nodes should contain either a string, or a short sequence of alphanumeric characters and underscores

Node Reference

nodes.NodeRef

Denoted by an @, followed by a node name (as a string of alphanumeric characters, underscores, and periods)

Matches the referenced node and returns its result

Must be bound, either manually through its .bind() method or automatically by the default compilers

Note: basic_compiler will not accept node references with periods
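One plausible reason for the separate binding step is that a reference names a statement that may not exist yet (or may refer to itself), so the name can only be resolved once every statement is known. The Ref class below is a hypothetical stand-in, not nodes.NodeRef; its bind() signature is an assumption.

```python
from typing import Callable, Optional, Tuple

Matcher = Callable[[bytes, int], Optional[Tuple[object, int]]]


class Ref:
    """Hypothetical stand-in for a reference node; not nodes.NodeRef itself"""

    def __init__(self, name: str) -> None:
        self.name = name
        self.target: Optional[Matcher] = None

    def bind(self, table: dict[str, Matcher]) -> "Ref":
        # Resolve the name once, after all statements have been defined
        self.target = table[self.name]
        return self

    def __call__(self, data: bytes, pos: int):
        if self.target is None:
            raise RuntimeError(f"reference @{self.name} was never bound")
        return self.target(data, pos)  # delegate to the referenced node


def literal(text: bytes) -> Matcher:
    def match(data: bytes, pos: int):
        return (text, pos + len(text)) if data.startswith(text, pos) else None
    return match
```

An unbound reference fails loudly when used; after bind(), it behaves exactly like the node it names.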

Lookahead

nodes.Lookahead

Denoted by a & (positive lookahead) or &! (negative lookahead), this node will match its target node, but will not consume any of the buffer

A negative lookahead returns True if its node fails to match, and fails to match otherwise
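Both kinds of lookahead can be sketched as wrappers that run a node but report the original position. lookahead and literal are hypothetical helpers, and returning the node's own result from a positive lookahead is an assumption the README does not state.

```python
from typing import Callable, Optional, Tuple

Matcher = Callable[[bytes, int], Optional[Tuple[object, int]]]


def literal(text: bytes) -> Matcher:
    def match(data: bytes, pos: int):
        return (text, pos + len(text)) if data.startswith(text, pos) else None
    return match


def lookahead(node: Matcher, negative: bool = False) -> Matcher:
    """Wrap `node` so it is tested at `pos` without consuming input"""
    def match(data: bytes, pos: int):
        matched = node(data, pos)
        if negative:
            # &! : succeed with True only when the node fails
            return (True, pos) if matched is None else None
        # & : succeed when the node matches, but leave `pos` unchanged
        return None if matched is None else (matched[0], pos)
    return match
```

A typical use is excluding keywords, for example a negative lookahead on "end" before matching an identifier.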

Changelog

0.2.0

  • Implemented node saving and loading through the serialize module
  • Moved compiler.bind_nodes() to util.bind_nodes()

1.0.0

  • Completely reworked compiler caching
  • Removed $import pragma
  • Moved WHITESPACE_PATT to .util
  • Changed nodes.Node.NO_RETURN to singleton(ish) util.NO_MATCH

1.0.0-1

  • Fixed an inaccuracy in README

1.0.1

  • Added builtin grammar.cag to package
  • Added precompiled precompiled_nodes.pkl to package

1.0.2

  • Fixed error caused by compiler.py Compiler.compile_buffermatcher() passing an unneeded kwarg to .pre_process()
  • Made NodeSyntaxError self-formatting also include exception notes

1.1.0

  • Added support for periods in node names
  • Fixed Compiler.post_process_compile() not actually doing anything

1.2.0

1.2.1

  • Fixed several nodes improperly stripping whitespace

1.2.2

  • Fixed unpacking never triggering
  • Fixed NodeRanges raising exceptions upon backtracking

1.3.0

  • Implemented lookaheads



Download files


Source Distribution

caustic.lexer-1.3.0.tar.gz (21.1 kB)

Built Distribution

caustic.lexer-1.3.0-py3-none-any.whl (20.8 kB)

File details

Details for the file caustic.lexer-1.3.0.tar.gz.

File metadata

  • Download URL: caustic.lexer-1.3.0.tar.gz
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.8

File hashes

Hashes for caustic.lexer-1.3.0.tar.gz

  • SHA256: 333343000bfc11b5a8cc2b3ff0c7b93c6795f06a3b50b078b1c8df788810cc95
  • MD5: d01926a010d0bc824dcd7bf40c98d019
  • BLAKE2b-256: 6f82e26bcb68b969ad0994d630ac43bfbb95344f629298b3595d37511e490bdd


File details

Details for the file caustic.lexer-1.3.0-py3-none-any.whl.


File hashes

Hashes for caustic.lexer-1.3.0-py3-none-any.whl

  • SHA256: 0f61724a0dd7b3f576467c7c506bf0860da81ec74025940597b88be10f8fa71f
  • MD5: afb3efe1c65fedd2e41f10b787796633
  • BLAKE2b-256: ff9224beaa51fa161b74d80def69c5d207a7244e61d479cf57f5d8c70ddc8cb1

