AST edit distance

CoDist

CoDist (Code Distance) is a library that provides functions to calculate the edit distance of abstract syntax trees.

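While this library is primarily concerned with AST edit distances, it can handle any generic tree of the form Tree[T] = tuple[T, tuple[Tree[T], ...]], or any forest of the form Forest[T] = tuple[Tree[T], ...]. In other words, a tree is a pair of a node label and a tuple of child trees, and a forest is a tuple of trees.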

To compute the distance between two trees, use codist.tree_dist; for two forests, use codist.forest_dist.
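As a rough illustration of the tree and forest shapes described above, the sketch below builds both by hand from nested tuples. The `Tree` and `Forest` aliases and the `leaf`, `node`, and `size` helpers are hypothetical names made up for this example; they are not part of CoDist.

```python
from typing import Any

# Hypothetical aliases mirroring the shapes described above:
# a tree is (label, children) and a forest is a tuple of trees.
Tree = tuple[Any, tuple["Tree", ...]]
Forest = tuple[Tree, ...]


def leaf(label):
    """Build a tree with no children."""
    return (label, ())


def node(label, *children):
    """Build a tree whose children are the given trees, in order."""
    return (label, tuple(children))


def size(tree):
    """Count the nodes in a tree."""
    _label, children = tree
    return 1 + sum(size(child) for child in children)


page = node(
    "html",
    node("head", leaf("title")),
    node("body", leaf("p"), leaf("p")),
)
forest = (page, leaf("comment"))

print(page)
print(size(page))
```

Going by the description above, a value shaped like `page` is the kind of input codist.tree_dist expects, and a value shaped like `forest` is the kind codist.forest_dist expects.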

Install

pip install codist

Usage

Currently, only AST node type information is compared, so identifiers and literal values do not affect the distance. A silhouette of an AST (an AST containing only type information) is constructed with the parse_ast_silhouette function. The distance between two silhouettes is then calculated with the tree_dist function.

from pprint import pprint

from codist import tree_dist
from codist.ast import parse_ast_silhouette

code1 = """
def process(data):
    result = []
    for x in data:
        if x > 5:
            result.append(x)
    return result
"""

code2 = """
def process(data):
    result = []
    for x in data:
        if x >= 6:
            result += [x]
    return result
"""

ast1 = parse_ast_silhouette(code1)
ast2 = parse_ast_silhouette(code2)

dist = tree_dist(ast1, ast2)

pprint(ast1)
pprint(ast2)
print("The above trees have a distance of:", dist)

This prints:

('Module',
 (('FunctionDef',
   (('arguments', (('arg', ()),)),
    ('Assign', (('Name', (('Store', ()),)), ('List', (('Load', ()),)))),
    ('For',
     (('Name', (('Store', ()),)),
      ('Name', (('Load', ()),)),
      ('If',
       (('Compare', (('Name', (('Load', ()),)), ('Gt', ()), ('Constant', ()))),
        ('Expr',
         (('Call',
           (('Attribute', (('Name', (('Load', ()),)), ('Load', ()))),
            ('Name', (('Load', ()),)))),)))))),
    ('Return', (('Name', (('Load', ()),)),)))),))
('Module',
 (('FunctionDef',
   (('arguments', (('arg', ()),)),
    ('Assign', (('Name', (('Store', ()),)), ('List', (('Load', ()),)))),
    ('For',
     (('Name', (('Store', ()),)),
      ('Name', (('Load', ()),)),
      ('If',
       (('Compare', (('Name', (('Load', ()),)), ('GtE', ()), ('Constant', ()))),
        ('AugAssign',
         (('Name', (('Store', ()),)),
          ('Add', ()),
          ('List', (('Name', (('Load', ()),)), ('Load', ()))))))))),
    ('Return', (('Name', (('Load', ()),)),)))),))
The above trees have a distance of: 8
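The silhouettes printed above pair each node's type name with a tuple of its children's silhouettes. A minimal sketch of how such a structure could be built with the standard library ast module follows. The `silhouette` helper is a hypothetical name for this example, and this is not necessarily how parse_ast_silhouette is implemented.

```python
import ast


def silhouette(node):
    """Return (type name, child silhouettes) for an ast node, dropping all other data."""
    children = tuple(silhouette(child) for child in ast.iter_child_nodes(node))
    return (type(node).__name__, children)


# Identifiers and literal values are not part of the result, so these two
# statements produce the same silhouette.
a = silhouette(ast.parse("x = 1"))
b = silhouette(ast.parse("y = 2"))

print(a)
print(a == b)
```

Because only type names survive, two programs that differ only in variable names or constants have identical silhouettes under this sketch.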

A custom set of cost functions can be provided to change the weights of insertions, deletions, and relabelings. By default, every operation costs 1, except relabeling a node to an identical label, γ(a -> a), which costs 0. To change the costs, construct a Cost object:

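from codist import Cost, tree_dist

# Deletions and insertions cost 3; relabeling costs 2 unless the labels match.
cost = Cost(
    delete=(lambda n: 3),
    insert=(lambda n: 3),
    relabel=(lambda n1, n2: 0 if n1 == n2 else 2),
)

# ast1 and ast2 are the silhouettes built in the previous example.
dist = tree_dist(ast1, ast2, cost=cost)

print("The above trees have a distance of:", dist)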

This prints:

The above trees have a distance of: 20
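To show how delete, insert, and relabel costs enter a distance computation, here is a small sketch of one standard recursive formulation of edit distance for ordered trees, written for the nested-tuple shape used above. It is an assumption-laden illustration, not CoDist's implementation: `make_tree_distance` and the functions it returns are hypothetical names, and the memoized recursion is only practical for small trees.

```python
from functools import lru_cache


def make_tree_distance(delete, insert, relabel):
    """Build a tree-distance function from three cost functions (hypothetical helper)."""

    @lru_cache(maxsize=None)
    def forest_distance(f, g):
        # f and g are forests: tuples of (label, children) trees.
        if not f and not g:
            return 0
        if not g:
            # Delete the root of the rightmost tree; its children take its place.
            label, children = f[-1]
            return forest_distance(f[:-1] + children, g) + delete(label)
        if not f:
            # Insert the root of the rightmost tree of g.
            label, children = g[-1]
            return forest_distance(f, g[:-1] + children) + insert(label)
        (v, v_children), (w, w_children) = f[-1], g[-1]
        return min(
            # Delete v from f.
            forest_distance(f[:-1] + v_children, g) + delete(v),
            # Insert w into f.
            forest_distance(f, g[:-1] + w_children) + insert(w),
            # Map v to w, then match their children and the remaining trees.
            forest_distance(v_children, w_children)
            + forest_distance(f[:-1], g[:-1])
            + relabel(v, w),
        )

    def tree_distance(t1, t2):
        return forest_distance((t1,), (t2,))

    return tree_distance


# Unit costs, as in the defaults described above.
unit = make_tree_distance(
    lambda n: 1, lambda n: 1, lambda a, b: 0 if a == b else 1
)
# The same weights as the Cost example above.
weighted = make_tree_distance(
    lambda n: 3, lambda n: 3, lambda a, b: 0 if a == b else 2
)

t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()), ("d", ()), ("e", ())))

print(unit(t1, t2))      # 2: one relabel and one insert
print(weighted(t1, t2))  # 5: one relabel (2) and one insert (3)
```

Changing the weights can change which edit script is cheapest, not only the total, so distances under different cost settings are not simple multiples of one another.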

