Fast AST based code differencing in Python

Code Diff


Software projects are constantly evolving to integrate new features or improve existing implementations. To keep track of this progress, it becomes important to track individual code changes. Code differencing provides a way to identify the smallest code change between two implementations.

code.diff provides a fast alternative to standard code differencing techniques with a focus on AST-based code differencing. As part of this library, we include a fast reimplementation of the GumTree algorithm. Because it relies on a best-effort AST parser, it can also generate AST-level changes for individual code snippets, even when they are not complete programs. Many programming languages, including Python, Java and JavaScript, are supported!
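To illustrate why an AST-based diff can be more precise than a line-based one, here is a minimal sketch that uses only Python's standard library (`difflib` and `ast`), not code.diff itself. The helper `ast_changes` is a hypothetical name, and it does a naive position-by-position tree walk rather than GumTree matching, so it only handles trees of the same shape.

```python
import ast
import difflib
import textwrap

old = textwrap.dedent('''
    def my_func():
        print("Hello World")
''')
new = textwrap.dedent('''
    def say_helloworld():
        print("Hello World")
''')

# Line-based view: the whole function header is reported as removed and re-added.
text_diff = [
    line
    for line in difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
]


def ast_changes(a, b, path="module"):
    """Naive positional comparison of two ASTs (illustration only, not GumTree)."""
    if type(a) is not type(b):
        return [(path, type(a).__name__, type(b).__name__)]
    changes = []
    for field in a._fields:
        left, right = getattr(a, field, None), getattr(b, field, None)
        here = f"{path}.{field}"
        if isinstance(left, ast.AST) and isinstance(right, ast.AST):
            changes += ast_changes(left, right, here)
        elif isinstance(left, list) and isinstance(right, list):
            if len(left) != len(right):
                changes.append((here, f"{len(left)} children", f"{len(right)} children"))
                continue
            for i, (x, y) in enumerate(zip(left, right)):
                if isinstance(x, ast.AST) and isinstance(y, ast.AST):
                    changes += ast_changes(x, y, f"{here}[{i}]")
                elif x != y:
                    changes.append((f"{here}[{i}]", x, y))
        elif left != right:
            changes.append((here, left, right))
    return changes


changes = ast_changes(ast.parse(old), ast.parse(new))
print(text_diff)
print(changes)
```

The line-based diff flags the full header line, while the tree walk narrows the change down to the function name alone.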

Installation

The package is tested under Python 3. It can be installed via:

pip install code-diff

Usage

code.diff computes the difference between two code snippets in nearly any supported language with just a few lines:

import code_diff as cd

# Python
output = cd.difference(
    '''
        def my_func():
            print("Hello World")
    ''',
    '''
        def say_helloworld():
            print("Hello World")
    ''',
    lang="python",
)

# Output: my_func -> say_helloworld

output.edit_script()

# Output:
# [
#   Update((identifier:my_func, line 1:12 - 1:19), say_helloworld)
# ]


# Java
output = cd.difference(
    '''
        int x = x + 1;
    ''',
    '''
        int x = x / 2;
    ''',
    lang="java",
)

# Output: x + 1 -> x / 2

output.edit_script()

# Output:
# [
#   Insert(/:/, (binary_operator, line 0:4 - 0:9), 1),
#   Update((integer:1, line 0:8 - 0:9), 2),
#   Delete((+:+, line 0:6 - 0:7))
# ]
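The positions printed in the Python example's `Update` operation can be read as a text span. The sketch below assumes, based only on that example output, that lines are 0-indexed and that the end column is exclusive. The helper `apply_update` is hypothetical and not part of code.diff; it simply replaces the given span in the source string.

```python
def apply_update(source: str, line: int, start_col: int, end_col: int, new_text: str) -> str:
    """Replace source[line][start_col:end_col] with new_text (illustrative helper)."""
    lines = source.split("\n")
    target = lines[line]
    lines[line] = target[:start_col] + new_text + target[end_col:]
    return "\n".join(lines)


source = '''
        def my_func():
            print("Hello World")
    '''

# Update((identifier:my_func, line 1:12 - 1:19), say_helloworld)
patched = apply_update(source, line=1, start_col=12, end_col=19, new_text="say_helloworld")
print(patched)
```

Under these assumptions the span `1:12 - 1:19` covers exactly the identifier `my_func`, so applying the update yields the target snippet.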

Language support

code.diff supports most programming languages for which an AST can be computed. To parse an AST, it relies on the following libraries:

  • code.tokenize: A frontend for tree-sitter to effectively parse and tokenize program code in Python.

  • tree-sitter: A best-effort AST parser supporting many programming languages including Python, Java and JavaScript.

To decide whether your code can be handled by code.diff please review the libraries above.

GumTree: To compute an edit script between a source and a target AST, we employ a Python reimplementation of the GumTree algorithm. Note, however, that the computed scripts depend heavily on the AST representation of the given code. Therefore, an AST edit script computed with code.diff might differ significantly from the one computed by GumTree.
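For readers unfamiliar with GumTree, its first phase greedily matches identical subtrees between the two ASTs, starting with the tallest ones. The following is a simplified sketch of that idea using Python's built-in `ast` module. It is not the code.diff implementation: the function names and the `min_height` default are assumptions, and GumTree's handling of ambiguous candidates, its bottom-up phase, and the edit script generation are all omitted.

```python
import ast
from collections import defaultdict


def height(node: ast.AST) -> int:
    """Height of a subtree: 1 for a leaf, 1 + tallest child otherwise."""
    return 1 + max((height(child) for child in ast.iter_child_nodes(node)), default=0)


def top_down_matches(src_code: str, dst_code: str, min_height: int = 2):
    """Greedily pair identical subtrees, tallest first (simplified sketch)."""
    src_tree, dst_tree = ast.parse(src_code), ast.parse(dst_code)

    # Index target subtrees by a structural fingerprint. ast.dump omits
    # line/column attributes by default, so pure position shifts are ignored.
    candidates = defaultdict(list)
    for node in ast.walk(dst_tree):
        if height(node) >= min_height:
            candidates[ast.dump(node)].append(node)

    matches, used_src, used_dst = [], set(), set()
    for node in sorted(ast.walk(src_tree), key=height, reverse=True):
        if id(node) in used_src or height(node) < min_height:
            continue
        for partner in candidates.get(ast.dump(node), []):
            if id(partner) not in used_dst:
                matches.append((node, partner))
                # A matched subtree also fixes the mapping of all its descendants.
                used_src.update(id(n) for n in ast.walk(node))
                used_dst.update(id(n) for n in ast.walk(partner))
                break
    return matches


src = "def f(a, b):\n    total = a + b\n    return total * 2\n"
dst = "def f(a, b):\n    total = a + b\n    return total / 2\n"

kinds = [type(s).__name__ for s, _ in top_down_matches(src, dst)]
print(kinds)
```

Here the unchanged assignment and argument list are matched as whole subtrees, while the `return` statement stays unmatched because its operator differs. Since the fingerprints come from the AST itself, a parser that structures the same code differently would produce different matches, which is why edit scripts vary across AST representations.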

Release history

  • 0.1.0
    • Initial functionality
    • Documentation
    • SStuB Testing

Project Info

The goal of this project is to provide developers with easy access to AST-based code differencing. It is currently developed as a helper library for internal research projects and will therefore only be updated as needed.

Feel free to open an issue if anything unexpected happens.

Cedric Richter - @cedricrupb - cedric.richter@uni-oldenburg.de

Distributed under the MIT license. See LICENSE for more information.
