Skip to main content

skeleton of a sytaxparser

Project description

Objects

Phrase

Used to define the structure of the syntax tree.

Token

Result value of the parsing process; container for text content and metadata.

NodeToken (Token)

Derivative of Token for the typing of NodeTokens. Always located at the first and last position of a Branch.

Branch (Token)

Result value of the parsing process; container for Tokens, NodeTokens and sub-Branches.

Root (Phrase)

Derivative of Phrase as root object. User-defined Phrases are added here and contains the parse method.

RootBranch (Branch)

Derivative of Branch as the main container. Is returned by Root.parse.

RootToken (Token)

Derivative of Token for the main container.

RootNodeToken (NodeToken)

Derivative of NodeToken. Represents the start resp. the end of the parsed data.

Configuration and syntax tree definition

The Phrase derivatives are used to define the syntax tree.

In theories of syntax, a Phrase is any group of words, or sometimes a single word, which plays a particular role within the syntactic structure of a sentence.

Sub-Phrases are added to these. The end of a Phrase is defined by a derivative of Branch.

The minimum configuration consists of overwriting the starts method in a derivative of Phrase, which in turn returns a derivative of Branch - with an override of the ends method corresponding to this derivative of Phrase.

Phrase.starts() → Branch.ends()

Parsing process

Input:

A list of rows (list[str])

Output:

RootBranch object

In the iteration, (the remaining part of) a row is first searched for start-NodeTokens of sub-Phrases of the current Phrase, then for the end-NodeToken of the current Branch. A start-NodeToken is defined by the return value of the starts method of a sub-Phrase. An end-NodeToken is defined by the return value of the ends method of the current Branch.

Depending on the type of search function definition in starts and the scope of a row, it can be efficient to search for further start points or end points after a start-NodeToken hit only in the previous content of this hit. The next_search_content method of Branch can be defined in more detail for this purpose. The hit instance is executed with the current content of the row in which the search is currently being performed as a parameter and should return the remaining content in which the search is to continue. By default, the content remains unchanged.

The return value of starts of a Phrase must be an instance of Branch for a hit. The start-NodeToken within the object is created based on its parameterization. To use a derivative of NodeToken/Token for this, the make_node method of a Branch derivative can be overwritten.

The return value of ends of a Branch must be an instance of NodeToken/Token for a hit.

The magic method “less than” (__lt__) of the NodeToken object is used to determine which NodeToken appears first in the row and is handled further. The relative start is used here by default.

After the NodeToken to be processed further has been determined, the previous content of the row is first added to the currently active Branch as a Token object. The call_branch_extend interface method of the Token instance is executed.

To use a Token derivative for this, the make_token method of a Branch derivative can be overwritten.

If it is an end-NodeToken, this is added to the active Branch and the previous Branch is then defined as the new active Branch. The call_branch_end interface method of the end-NodeToken instance is executed.

If it is a start-NodeToken, its Branch is defined as the new active Branch and the start-NodeToken is added to it. The interface method call_branch_start of the start-NodeToken instance is executed.

If no NodeToken is found, the current Branch is extended with a Token object as described above.

Finally, the remaining part of the row or the next row is passed to the next iteration.

Example

from re import compile

from syntax_parser_skeleton import *
from syntax_parser_skeleton.derivatives import simpleregex
from syntax_parser_skeleton import visualisation

root = RootPhrase("#root")

_bracket = simpleregex.SimpleRegexPhrase(compile('\\('), compile('\\)'), id="bracket").add_self()
_funcall = simpleregex.SimpleRegexPhrase(compile('\\w+\\s*\\('), compile('\\)'), id="function")
_consoleline = simpleregex.SimpleRegexPhrase(compile('>>>'), compile('$'), id="consoleline").add_phrases(_funcall)
_variable = simpleregex.SimpleRegexPhrase(compile('\\w+(?!\\s*\\()'), compile(''), id="variable")
_operation = simpleregex.SimpleRegexPhrase(compile('[-+*/]'), compile(''), id="operation")
_curly_brackets = simpleregex.SimpleRegexPhrase(compile("\\{"), compile("}"), id="curly brackets")
_string = simpleregex.SimpleRegexPhrase(compile("'"), compile("'"), id="string").add_phrases(_curly_brackets)
_bracket.add_phrases(_variable, _operation, _string, _funcall)
root.add_phrases(_bracket, _variable, _operation, _string, _consoleline)
_consoleline.add_phrases(root.sub_phrases)
_consoleline.sub_phrases.discard(_consoleline)
_funcall.add_phrases(root.sub_phrases)
result = root.parse([
    ">>> prettyprint('( (a * b / (c + a)) * (b / (c – a) * b) / c ) + a')\n",
    "(\n",
    "   (\n",
    "       a * b / (c + a)\n",
    "   ) * (\n",
    "       b / (c – a) * b\n",
    "   ) / c \n",
    ") + a",
])
visualisation.start_structure_graph_app(root)
https://raw.githubusercontent.com/srccircumflex/syntax-parser-skeleton/master/doc/graph.png
print(visualisation.pretty_xml_result(result))
<?xml version="1.0" ?>
<RB phrase="#root">
    <RN coord="0:0:0/0:0">''</RN>
    <B phrase="consoleline">
        <N coord="0:0:3/0:3">'&gt;&gt;&gt;'</N>
        <T coord="0:3:4/3:4">' '</T>
        <B phrase="function">
            <N coord="0:4:16/4:16">'prettyprint('</N>
            <B phrase="string">
                <N coord="0:16:17/16:17">&quot;'&quot;</N>
                <T coord="0:17:66/17:66">'( (a * b / (c + a)) * (b / (c  a) * b) / c ) + a'</T>
                <N coord="0:66:67/66:67">&quot;'&quot;</N>
            </B>
            <N coord="0:67:68/67:68">')'</N>
        </B>
        <N coord="0:68:68/68:68">''</N>
    </B>
    <RT coord="0:68:69/68:69">'\n'</RT>
    <B phrase="bracket">
        <N coord="1:0:1/69:70">'('</N>
        <T coord="1:1:2/70:71">'\n'</T>
        <T coord="2:0:3/71:74">'   '</T>
        <B phrase="bracket">
            <N coord="2:3:4/74:75">'('</N>
            <T coord="2:4:5/75:76">'\n'</T>
            <T coord="3:0:7/76:83">'       '</T>
            <B phrase="variable">
                <N coord="3:7:8/83:84">'a'</N>
                <N coord="3:8:8/84:84">''</N>
            </B>
            <T coord="3:8:9/84:85">' '</T>
            <B phrase="operation">
                <N coord="3:9:10/85:86">'*'</N>
                <N coord="3:10:10/86:86">''</N>
            </B>
            <T coord="3:10:11/86:87">' '</T>
            <B phrase="variable">
                <N coord="3:11:12/87:88">'b'</N>
                <N coord="3:12:12/88:88">''</N>
            </B>
            <T coord="3:12:13/88:89">' '</T>
            <B phrase="operation">
                <N coord="3:13:14/89:90">'/'</N>
                <N coord="3:14:14/90:90">''</N>
            </B>
            <T coord="3:14:15/90:91">' '</T>
            <B phrase="bracket">
                <N coord="3:15:16/91:92">'('</N>
                <B phrase="variable">
                    <N coord="3:16:17/92:93">'c'</N>
                    <N coord="3:17:17/93:93">''</N>
                </B>
                <T coord="3:17:18/93:94">' '</T>
                <B phrase="operation">
                    <N coord="3:18:19/94:95">'+'</N>
                    <N coord="3:19:19/95:95">''</N>
                </B>
                <T coord="3:19:20/95:96">' '</T>
                <B phrase="variable">
                    <N coord="3:20:21/96:97">'a'</N>
                    <N coord="3:21:21/97:97">''</N>
                </B>
                <N coord="3:21:22/97:98">')'</N>
            </B>
            <T coord="3:22:23/98:99">'\n'</T>
            <T coord="4:0:3/99:102">'   '</T>
            <N coord="4:3:4/102:103">')'</N>
        </B>
        <T coord="4:4:5/103:104">' '</T>
        <B phrase="operation">
            <N coord="4:5:6/104:105">'*'</N>
            <N coord="4:6:6/105:105">''</N>
        </B>
        <T coord="4:6:7/105:106">' '</T>
        <B phrase="bracket">
            <N coord="4:7:8/106:107">'('</N>
            <T coord="4:8:9/107:108">'\n'</T>
            <T coord="5:0:7/108:115">'       '</T>
            <B phrase="variable">
                <N coord="5:7:8/115:116">'b'</N>
                <N coord="5:8:8/116:116">''</N>
            </B>
            <T coord="5:8:9/116:117">' '</T>
            <B phrase="operation">
                <N coord="5:9:10/117:118">'/'</N>
                <N coord="5:10:10/118:118">''</N>
            </B>
            <T coord="5:10:11/118:119">' '</T>
            <B phrase="bracket">
                <N coord="5:11:12/119:120">'('</N>
                <B phrase="variable">
                    <N coord="5:12:13/120:121">'c'</N>
                    <N coord="5:13:13/121:121">''</N>
                </B>
                <T coord="5:13:16/121:124">'  '</T>
                <B phrase="variable">
                    <N coord="5:16:17/124:125">'a'</N>
                    <N coord="5:17:17/125:125">''</N>
                </B>
                <N coord="5:17:18/125:126">')'</N>
            </B>
            <T coord="5:18:19/126:127">' '</T>
            <B phrase="operation">
                <N coord="5:19:20/127:128">'*'</N>
                <N coord="5:20:20/128:128">''</N>
            </B>
            <T coord="5:20:21/128:129">' '</T>
            <B phrase="variable">
                <N coord="5:21:22/129:130">'b'</N>
                <N coord="5:22:22/130:130">''</N>
            </B>
            <T coord="5:22:23/130:131">'\n'</T>
            <T coord="6:0:3/131:134">'   '</T>
            <N coord="6:3:4/134:135">')'</N>
        </B>
        <T coord="6:4:5/135:136">' '</T>
        <B phrase="operation">
            <N coord="6:5:6/136:137">'/'</N>
            <N coord="6:6:6/137:137">''</N>
        </B>
        <T coord="6:6:7/137:138">' '</T>
        <B phrase="variable">
            <N coord="6:7:8/138:139">'c'</N>
            <N coord="6:8:8/139:139">''</N>
        </B>
        <T coord="6:8:10/139:141">' \n'</T>
        <N coord="7:0:1/141:142">')'</N>
    </B>
    <RT coord="7:1:2/142:143">' '</RT>
    <B phrase="operation">
        <N coord="7:2:3/143:144">'+'</N>
        <N coord="7:3:3/144:144">''</N>
    </B>
    <RT coord="7:3:4/144:145">' '</RT>
    <B phrase="variable">
        <N coord="7:4:5/145:146">'a'</N>
    </B>
    <RN coord="7:5:5/146:146">''</RN>
</RB>

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

syntax_parser_skeleton-1.1.0.tar.gz (90.3 kB view details)

Uploaded Source

Built Distribution

syntax_parser_skeleton-1.1.0-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file syntax_parser_skeleton-1.1.0.tar.gz.

File metadata

File hashes

Hashes for syntax_parser_skeleton-1.1.0.tar.gz
Algorithm Hash digest
SHA256 e9073a995f8528787d265c37cbb46e2c34ec924954886e1f6068ef675ae2beb0
MD5 0abfbcb182434750d481af20b0dd5434
BLAKE2b-256 646fc07411eb43b026bc6492b1c84739212f9030ec6457d57d2b066ec1ebf55f

See more details on using hashes here.

File details

Details for the file syntax_parser_skeleton-1.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for syntax_parser_skeleton-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ad08bcad627815fb3d5a895b93b8f41fb9011fc151999416aaf3a0c7cc8f87c2
MD5 28582f1e51ab9661d2f22950fe23b777
BLAKE2b-256 263865b4323108872af8e5868b6d6ce99d7e24d5b5fa95c77ac34f3fb3825573

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page