Skip to main content

skeleton of a sytaxparser

Project description

Objects

Phrase

Used to define the structure of the syntax tree.

Token

Result value of the parsing process; container for text content and metadata.

NodeToken (Token)

Derivative of Token for the typing of NodeTokens. Always located at the first and last position of a Branch.

Branch (Token)

Result value of the parsing process; container for Tokens, NodeTokens and sub-Branches.

Root (Phrase)

Derivative of Phrase as root object. User-defined Phrases are added here and contains the parse method.

RootBranch (Branch)

Derivative of Branch as the main container. Is returned by Root.parse.

RootToken (Token)

Derivative of Token for the main container.

RootNodeToken (NodeToken)

Derivative of NodeToken. Represents the start resp. the end of the parsed data.

Configuration and syntax tree definition

The Phrase derivatives are used to define the syntax tree.

In theories of syntax, a Phrase is any group of words, or sometimes a single word, which plays a particular role within the syntactic structure of a sentence.

Sub-Phrases are added to these. The end of a Phrase is defined by a derivative of Branch.

The minimum configuration consists of overwriting the starts method in a derivative of Phrase, which in turn returns a derivative of Branch - with an override of the ends method corresponding to this derivative of Phrase.

Phrase.starts() → Branch.ends()

Parsing process

Input:

A list of rows (list[str])

Output:

RootBranch object

In the iteration, (the remaining part of) a row is first searched for start-NodeTokens of sub-Phrases of the current Phrase, then for the end-NodeToken of the current Branch. A start-NodeToken is defined by the return value of the starts method of a sub-Phrase. An end-NodeToken is defined by the return value of the ends method of the current Branch.

Depending on the type of search function definition in starts and the scope of a row, it can be efficient to search for further start points or end points after a start-NodeToken hit only in the previous content of this hit. The next_search_content method of Branch can be defined in more detail for this purpose. The hit instance is executed with the current content of the row in which the search is currently being performed as a parameter and should return the remaining content in which the search is to continue. By default, the content remains unchanged.

The return value of starts of a Phrase must be an instance of Branch for a hit. The start-NodeToken within the object is created based on its parameterization. To use a derivative of NodeToken/Token for this, the make_node method of a Branch derivative can be overwritten.

The return value of ends of a Branch must be an instance of NodeToken/Token for a hit.

The magic method “less than” (__lt__) of the NodeToken object is used to determine which NodeToken appears first in the row and is handled further. The relative start is used here by default.

After the NodeToken to be processed further has been determined, the previous content of the row is first added to the currently active Branch as a Token object. The call_branch_extend interface method of the Token instance is executed.

To use a Token derivative for this, the make_token method of a Branch derivative can be overwritten.

If it is an end-NodeToken, this is added to the active Branch and the previous Branch is then defined as the new active Branch. The call_branch_end interface method of the end-NodeToken instance is executed.

If it is a start-NodeToken, its Branch is defined as the new active Branch and the start-NodeToken is added to it. The interface method call_branch_start of the start-NodeToken instance is executed.

If no NodeToken is found, the current Branch is extended with a Token object as described above.

Finally, the remaining part of the row or the next row is passed to the next iteration.

Example

from re import compile

from syntax_parser_skeleton import *
from syntax_parser_skeleton.derivatives import simpleregex
from syntax_parser_skeleton import visualisation

root = RootPhrase("#root")

_bracket = simpleregex.SimpleRegexPhrase(compile('\\('), compile('\\)'), id="bracket").add_self()
_funcall = simpleregex.SimpleRegexPhrase(compile('\\w+\\s*\\('), compile('\\)'), id="function")
_consoleline = simpleregex.SimpleRegexPhrase(compile('>>>'), compile('$'), id="consoleline").add_phrases(_funcall)
_variable = simpleregex.SimpleRegexPhrase(compile('\\w+(?!\\s*\\()'), compile(''), id="variable")
_operation = simpleregex.SimpleRegexPhrase(compile('[-+*/]'), compile(''), id="operation")
_curly_brackets = simpleregex.SimpleRegexPhrase(compile("\\{"), compile("}"), id="curly brackets")
_string = simpleregex.SimpleRegexPhrase(compile("'"), compile("'"), id="string").add_phrases(_curly_brackets)
_bracket.add_phrases(_variable, _operation, _string, _funcall)
root.add_phrases(_bracket, _variable, _operation, _string, _consoleline)
_consoleline.add_phrases(root.sub_phrases)
_consoleline.sub_phrases.discard(_consoleline)
_funcall.add_phrases(root.sub_phrases)
result = root.parse([
    ">>> prettyprint('( (a * b / (c + a)) * (b / (c – a) * b) / c ) + a')\n",
    "(\n",
    "   (\n",
    "       a * b / (c + a)\n",
    "   ) * (\n",
    "       b / (c – a) * b\n",
    "   ) / c \n",
    ") + a",
])
visualisation.start_structure_graph_app(root)
https://raw.githubusercontent.com/srccircumflex/syntax-parser-skeleton/master/doc/graph.png
print(visualisation.pretty_xml_result(result))
<?xml version="1.0" ?>
<RB phrase="#root">
    <RN coord="0:0:0/0:0">''</RN>
    <B phrase="consoleline">
        <N coord="0:0:3/0:3">'&gt;&gt;&gt;'</N>
        <T coord="0:3:4/3:4">' '</T>
        <B phrase="function">
            <N coord="0:4:16/4:16">'prettyprint('</N>
            <B phrase="string">
                <N coord="0:16:17/16:17">&quot;'&quot;</N>
                <T coord="0:17:66/17:66">'( (a * b / (c + a)) * (b / (c  a) * b) / c ) + a'</T>
                <N coord="0:66:67/66:67">&quot;'&quot;</N>
            </B>
            <N coord="0:67:68/67:68">')'</N>
        </B>
        <N coord="0:68:68/68:68">''</N>
    </B>
    <RT coord="0:68:69/68:69">'\n'</RT>
    <B phrase="bracket">
        <N coord="1:0:1/69:70">'('</N>
        <T coord="1:1:2/70:71">'\n'</T>
        <T coord="2:0:3/71:74">'   '</T>
        <B phrase="bracket">
            <N coord="2:3:4/74:75">'('</N>
            <T coord="2:4:5/75:76">'\n'</T>
            <T coord="3:0:7/76:83">'       '</T>
            <B phrase="variable">
                <N coord="3:7:8/83:84">'a'</N>
                <N coord="3:8:8/84:84">''</N>
            </B>
            <T coord="3:8:9/84:85">' '</T>
            <B phrase="operation">
                <N coord="3:9:10/85:86">'*'</N>
                <N coord="3:10:10/86:86">''</N>
            </B>
            <T coord="3:10:11/86:87">' '</T>
            <B phrase="variable">
                <N coord="3:11:12/87:88">'b'</N>
                <N coord="3:12:12/88:88">''</N>
            </B>
            <T coord="3:12:13/88:89">' '</T>
            <B phrase="operation">
                <N coord="3:13:14/89:90">'/'</N>
                <N coord="3:14:14/90:90">''</N>
            </B>
            <T coord="3:14:15/90:91">' '</T>
            <B phrase="bracket">
                <N coord="3:15:16/91:92">'('</N>
                <B phrase="variable">
                    <N coord="3:16:17/92:93">'c'</N>
                    <N coord="3:17:17/93:93">''</N>
                </B>
                <T coord="3:17:18/93:94">' '</T>
                <B phrase="operation">
                    <N coord="3:18:19/94:95">'+'</N>
                    <N coord="3:19:19/95:95">''</N>
                </B>
                <T coord="3:19:20/95:96">' '</T>
                <B phrase="variable">
                    <N coord="3:20:21/96:97">'a'</N>
                    <N coord="3:21:21/97:97">''</N>
                </B>
                <N coord="3:21:22/97:98">')'</N>
            </B>
            <T coord="3:22:23/98:99">'\n'</T>
            <T coord="4:0:3/99:102">'   '</T>
            <N coord="4:3:4/102:103">')'</N>
        </B>
        <T coord="4:4:5/103:104">' '</T>
        <B phrase="operation">
            <N coord="4:5:6/104:105">'*'</N>
            <N coord="4:6:6/105:105">''</N>
        </B>
        <T coord="4:6:7/105:106">' '</T>
        <B phrase="bracket">
            <N coord="4:7:8/106:107">'('</N>
            <T coord="4:8:9/107:108">'\n'</T>
            <T coord="5:0:7/108:115">'       '</T>
            <B phrase="variable">
                <N coord="5:7:8/115:116">'b'</N>
                <N coord="5:8:8/116:116">''</N>
            </B>
            <T coord="5:8:9/116:117">' '</T>
            <B phrase="operation">
                <N coord="5:9:10/117:118">'/'</N>
                <N coord="5:10:10/118:118">''</N>
            </B>
            <T coord="5:10:11/118:119">' '</T>
            <B phrase="bracket">
                <N coord="5:11:12/119:120">'('</N>
                <B phrase="variable">
                    <N coord="5:12:13/120:121">'c'</N>
                    <N coord="5:13:13/121:121">''</N>
                </B>
                <T coord="5:13:16/121:124">'  '</T>
                <B phrase="variable">
                    <N coord="5:16:17/124:125">'a'</N>
                    <N coord="5:17:17/125:125">''</N>
                </B>
                <N coord="5:17:18/125:126">')'</N>
            </B>
            <T coord="5:18:19/126:127">' '</T>
            <B phrase="operation">
                <N coord="5:19:20/127:128">'*'</N>
                <N coord="5:20:20/128:128">''</N>
            </B>
            <T coord="5:20:21/128:129">' '</T>
            <B phrase="variable">
                <N coord="5:21:22/129:130">'b'</N>
                <N coord="5:22:22/130:130">''</N>
            </B>
            <T coord="5:22:23/130:131">'\n'</T>
            <T coord="6:0:3/131:134">'   '</T>
            <N coord="6:3:4/134:135">')'</N>
        </B>
        <T coord="6:4:5/135:136">' '</T>
        <B phrase="operation">
            <N coord="6:5:6/136:137">'/'</N>
            <N coord="6:6:6/137:137">''</N>
        </B>
        <T coord="6:6:7/137:138">' '</T>
        <B phrase="variable">
            <N coord="6:7:8/138:139">'c'</N>
            <N coord="6:8:8/139:139">''</N>
        </B>
        <T coord="6:8:10/139:141">' \n'</T>
        <N coord="7:0:1/141:142">')'</N>
    </B>
    <RT coord="7:1:2/142:143">' '</RT>
    <B phrase="operation">
        <N coord="7:2:3/143:144">'+'</N>
        <N coord="7:3:3/144:144">''</N>
    </B>
    <RT coord="7:3:4/144:145">' '</RT>
    <B phrase="variable">
        <N coord="7:4:5/145:146">'a'</N>
    </B>
    <RN coord="7:5:5/146:146">''</RN>
</RB>

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

syntax_parser_skeleton-1.2.0.tar.gz (90.3 kB view details)

Uploaded Source

Built Distribution

syntax_parser_skeleton-1.2.0-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file syntax_parser_skeleton-1.2.0.tar.gz.

File metadata

File hashes

Hashes for syntax_parser_skeleton-1.2.0.tar.gz
Algorithm Hash digest
SHA256 2161a603b7ca02ada3ed282c62a68358fb636a14762959c38191b3d6edb4629d
MD5 2ecc31b17ff4d75f7f34748e0ea9d163
BLAKE2b-256 ed2fbd5c9d25525f653b09964691e12a9ab06b9cbcceef6a838eb7453c411506

See more details on using hashes here.

File details

Details for the file syntax_parser_skeleton-1.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for syntax_parser_skeleton-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d88ca50e5a533767b9caa6cd9028491548683843a8208212a1e4ed8c65467afb
MD5 fd67a8e96ecee2c1d9a399f7ea013920
BLAKE2b-256 5fa831f419baa9b80c376b0dc46fb46c7c3f85a735e2b08b6e0c5c1f3fc629aa

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page