Skip to main content

Fast structural analysis of any programming language in Python

Project description

Code AST

Fast structural analysis of any programming language in Python

Programming Language Processing (PLP) brings the capabilities of modern NLP systems to the world of programming languages. To achieve high performance PLP systems, existing methods often take advantage of the fully defined nature of programming languages. Especially the syntactical structure can be exploited to gain knowledge about programs.

code.ast provides easy access to the syntactic structure of a program. By relying on tree-sitter as the back end, the parser supports fast parsing of variety of programming languages.

The goal of code.ast is to combine the efficiency and variety of languages supported by tree-sitter with the convenience of more native parsers (like libcst).

To achieve this, code.ast adds the features:

  1. Auto-loading: Compile of source code parsers for any language supported by tree-sitter with a single keyword,
  2. Visitors: Search the concrete syntax tree produced by tree-sitter quickly,
  3. Transformers: Transform source code easily by transforming the syntax structure

Note that tree-sitter produces a concrete syntax tree and we currently parse the CST as is. Future versions of code.ast might include options to simplify the CST to an AST.

Installation

The package is tested under Python 3. It can be installed via:

pip install code-ast

Note: You need to install the different tree-sitter languages as Python packages. For example, for Python, you need to install tree-sitter-python via PIP:

pip install tree-sitter-python

Note (since tree-sitter v0.22.0): Autoloading of languages or installation with tree_sitter_language to utilize pre-compiled languages is deprecated. If you fix the tree-sitter version to 0.21.3 than you can make use of pre-compiled languages via:

pip install tree_sitter_languages

Quick start

code.ast can parse nearly any program code in a few lines of code:

import code_ast

# Python
code_ast.ast(
    '''
        def my_func():
            print("Hello World")
    ''',
lang = "python")

# Output:
# PythonCodeAST [0, 0] - [4, 4]
#    module [1, 8] - [3, 4]
#        function_definition [1, 8] - [2, 32]
#            identifier [1, 12] - [1, 19]
#            parameters [1, 19] - [1, 21]
#            block [2, 12] - [2, 32]
#                expression_statement [2, 12] - [2, 32]
#                    call [2, 12] - [2, 32]
#                        identifier [2, 12] - [2, 17]
#                        argument_list [2, 17] - [2, 32]
#                            string [2, 18] - [2, 31]

# Java
code_ast.ast(
    '''
    public class HelloWorld {
        public static void main(String[] args){
          System.out.println("Hello World");
        }
    }
    ''',
lang = "java")

# Output: 
# JavaCodeAST [0, 0] - [7, 4]
#    program [1, 0] - [6, 4]
#        class_declaration [1, 0] - [5, 1]
#            modifiers [1, 0] - [1, 6]
#            identifier [1, 13] - [1, 23]
#            class_body [1, 24] - [5, 1]
#                method_declaration [2, 8] - [4, 9]
#                    ...

Visitors

code.ast implements the visitor pattern to quickly traverse the CST structure:

import code_ast
from code_ast import ASTVisitor

code = '''
    def f(x, y):
        return x + y
'''

# Count the number of identifiers
class IdentifierCounter(ASTVisitor):

    def __init__(self):
        self.count = 0
    
    def visit_identifier(self, node):
        self.count += 1

# Parse the AST and then visit it with our visitor
source_ast = code_ast.ast(code, lang = "python")

count_visitor = IdentifierCounter()
source_ast.visit(count_visitor)

count_visitor.count
# Output: 5

Transformers

Transformers provide an easy way to transform source code. For example, in the following, we want to mirror each binary addition:

import code_ast
from code_ast import ASTTransformer, FormattedUpdate, TreeUpdate

code = '''
    def f(x, y):
        return x + y + 0.5
'''

# Mirror binary operator on leave
class MirrorAddTransformer(ASTTransformer):
    def leave_binary_operator(self, node):
        if node.children[1].type == "+":
            return FormattedUpdate(
                " %s + %s",
                [
                    TreeUpdate(node.children[2]),
                    TreeUpdate(node.children[0])
                ]
            )

# Parse the AST and then visit it with our visitor
source_ast = code_ast.ast(code, lang = "python")

mirror_transformer = MirrorAddTransformer()

# Mirror transformer are initialized by running them as visitors
source_ast.visit(mirror_transformer)

# Transformer provide a minimal AST edit
mirror_transformer.edit()
# Output: 
# module [2, 0] - [5, 0]
#    function_definition [2, 0] - [3, 22]
#        block [3, 4] - [3, 22]
#            return_statement [3, 4] - [3, 22]
#                binary_operator -> FormattedUpdate [3, 11] - [3, 22]
#                    binary_operator -> FormattedUpdate [3, 11] - [3, 16]

# And it can be used to directly transform the code
mirror_transformer.code()
# Output:
# def f(x, y):
#    return 0.5 + y + x

Project Info

The goal of this project is to provide developer in the programming language processing community with easy access to syntax parsing. This is currently developed as a helper library for internal research projects. Therefore, it will only be updated as needed.

Feel free to open an issue if anything unexpected happens.

Distributed under the MIT license. See LICENSE for more information.

We thank the developer of tree-sitter library. Without tree-sitter this project would not be possible.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

code_ast-0.1.2.tar.gz (14.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

code_ast-0.1.2-py3-none-any.whl (12.7 kB view details)

Uploaded Python 3

File details

Details for the file code_ast-0.1.2.tar.gz.

File metadata

  • Download URL: code_ast-0.1.2.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for code_ast-0.1.2.tar.gz
Algorithm Hash digest
SHA256 ed5c12401724d196ef008bce044d7947df35d65480a6bbdf43b2e12fe8793fb9
MD5 ce3a09e72a6132df1c058c990823834b
BLAKE2b-256 75ff70b2aa0a256ae8ae7deae66b64ff6ab839f37b584c608ea8d29b9eaa41ad

See more details on using hashes here.

File details

Details for the file code_ast-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: code_ast-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for code_ast-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 31bde808bff87917a6bfe14c21ddbb1d59e51a09e9829cac84d7afa5848e4bb5
MD5 8b2afe9f11d6e49a798c56fba90b0ce2
BLAKE2b-256 df6d7417a6088adafe5ce11bb8434d9c8916f59b157a685eddf28603b2e55e5c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page