

Project description

parse_2d

Tools for parsing two-dimensional programming languages.

Example

Suppose we want to parse a diagram representing a path, in which >, v, <, and ^ each represent a single step.

>v  >>
 v  ^
 >>>^

One way of tokenizing this is to interpret each of these steps as a token, with a value representing its direction.

from parse_2d import Diagram, TinyTokenizer, tokenize

diagram = Diagram.from_string(">v  >>\n v  ^\n >>>^")

tokenizers = [
    TinyTokenizer(">", 0),
    TinyTokenizer("v", 1),
    TinyTokenizer("<", 2),
    TinyTokenizer("^", 3),
]

for token in tokenize(diagram, tokenizers):
    print(token)

Each Token has a region and a value. The region is the area the token covers in the original diagram, while the value can be any Python object representing what was tokenized.
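For example, to work with the two parts separately (a minimal sketch; it assumes the fields are exposed as region and value attributes, matching the names used here):

for token in tokenize(diagram, tokenizers):
    # Assumes Token exposes its parts as .region and .value attributes.
    # token.value is the direction number attached above (0-3).
    print(token.region, token.value)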

Alternatively, you can extract the path as a single token using the WireTokenizer, or as a directed path by subclassing WireTokenizer.

A more complete sample, which parses the Circuit Diagram language, is also provided to demonstrate the use of these tools.

Reference

Diagram

A Diagram is an infinite two-dimensional grid of "symbols", with a distinguished "whitespace" symbol. Diagrams may be instantiated with a list of lists and the whitespace symbol, or by the from_string method.

Manual instantiation

>>> diagram = Diagram([[1, 2], [3]], 0)
>>> diagram[(0, 1)]
3
>>> diagram[(1, 1)]
0
>>> diagram[(-30, 17)]
0

from_string

>>> diagram = Diagram.from_string("ab\nc")
>>> diagram[(0, 1)]
'c'
>>> diagram[(1, 1)]
' '

Region

A Region is an area on a diagram. Custom Regions may be made by inheriting from Region. The following Regions are provided by default:

TinyRegion(location)

A Region consisting of a single point. Has the location property to provide that point.

RectRegion(top_left, bottom_right)

A rectangular Region, aligned with the axes, consisting of the points bounded by top_left and bottom_right, including the top and left edges, and excluding the bottom and right edges (analogously to range).

SparseRegion(contents)

A Region consisting of a collection of disparate points. Has the contents property to provide that frozenset of points.
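As a rough illustration of constructing these (a sketch only; it assumes the Region classes are importable from the top-level parse_2d package, like the tokenizers above, and that points are (x, y) tuples as in the Diagram examples):

from parse_2d import TinyRegion, RectRegion, SparseRegion

# Assumed import path and point convention; see the note above.
point = TinyRegion((2, 3))           # a single point
print(point.location)                # (2, 3)

rect = RectRegion((0, 0), (3, 2))    # covers (0, 0)..(2, 1); excludes x=3 and y=2

sparse = SparseRegion(frozenset({(0, 0), (4, 1)}))
print(sparse.contents)               # frozenset({(0, 0), (4, 1)})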

Token

A Token consists of a region covered, and a value that the token represents.

Tokenizer

A Tokenizer is an object for extracting tokens from diagrams. Custom Tokenizer classes may be made by inheriting from Tokenizer, and overriding the starts_on and extract_token methods. See the Tokenizer docstring for more details.

TinyTokenizer(symbol, value)

Tokenizer for tokens represented by a single symbol.

Extracts a token of value value for every occurrence of symbol in the diagram.

TemplateTokenizer(template, token_value)

Tokenizer for tokens represented by a fixed template of symbols.

The template is either a mapping of relative locations to symbols, or a Diagram.

Extracts a token of value token_value for every non-overlapping translation of the template found in the parent Diagram.
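For instance, a sketch of treating the two-symbol arrow "->" as one token (a hypothetical example; it assumes relative locations are (x, y) tuples keyed the same way as Diagram indices, with (0, 0) as the template's origin):

from parse_2d import Diagram, TemplateTokenizer, tokenize

# Template given as a mapping of relative locations to symbols: "-" then ">".
arrow_template = {(0, 0): "-", (1, 0): ">"}
tokenizers = [TemplateTokenizer(arrow_template, "arrow")]

diagram = Diagram.from_string("a -> b")
for token in tokenize(diagram, tokenizers):
    print(token)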

WireTokenizer(segment_connections)

Tokenizer for wire tokens, represented by a path through a diagram.

A wire consists of multiple symbol "segments", each of which has a fixed collection of directions it can connect to.

segment_connections is a mapping from segment symbols to a collection of that segment's available connections.

Extracts a wire token representing the available connections to that wire.

This class assumes that segments connect all possible incoming directions to all possible outgoing directions. Child classes may override this behavior by overriding the connections method. See the WireTokenizer docstring for more details.

BoxTokenizer(edge_symbols, contents_tokenizer)

Tokenizer for tokens represented by a box of edge symbols.

edge_symbols is a mapping from a side of the box to the collection of symbols that may be used for that edge.

contents_tokenizer is a function that determines the value of the extracted token; it is passed the entire box (including the edge) as its only parameter.

tokenize(diagram, tokenizers)

Yields the non-overlapping tokens found in the diagram by the list of tokenizers.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

parse_2d-1.0.0.tar.gz (8.8 kB)


Built Distribution

parse_2d-1.0.0-py3-none-any.whl (11.5 kB)


File details

Details for the file parse_2d-1.0.0.tar.gz.

File metadata

  • Download URL: parse_2d-1.0.0.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for parse_2d-1.0.0.tar.gz

  • SHA256: cf6cfe1238e40be56ab689fb0c8c25c53d4043b681d2d370da33645dd2592803
  • MD5: ed2735aabe528ddce6b32014c0c15281
  • BLAKE2b-256: cd90b8cfeacc34404d309e40f3ced233778e5bf2a39dd0753a49bdbee4862722


File details

Details for the file parse_2d-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: parse_2d-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for parse_2d-1.0.0-py3-none-any.whl

  • SHA256: af49420675134582cb2a1885145080dbb83b1254a977026e8ab0e3911c93c9b1
  • MD5: 27de187ca125858121a0fbc351c66209
  • BLAKE2b-256: 50ed70132580627b7c8549da6e0c9547f54baae0de9faa3946dcb30eaac054cf

