Skip to main content

AST-based code similarity detection tool

Project description

Mayat

Mayat is a code similarity detection tool developed by Tian(Maxwell) Yang. It works by comparing the Abstract Syntax Trees of students' code solutions and generate a similarity score for each pair of students' code.

Build & Install

  1. Clone the repo
git clone git@github.com:AnubisLMS/Mayat.git
  1. Install dependencies and install Mayat
cd Mayat
pip install -r requirements_dev.txt
python setup.py install
  1. Install tree-sitter parsers
python -m mayat.install_langs

Usage

Let's say we need to check all students' uniq.c for homework1. The path for each uniq.c has the format homework1/<unique-id>/user/uniq.c. All we need to do is run:

python -m mayat.frontends.TS_C homework1/*/user/uniq.c

If we only want to check the main function, we can do:

python -m mayat.frontends.TS_C homework1/*/user/uniq.c -f main

Additionally, we can pass more optional arguments for C.py:

  • --threshold: Specify the granularity for the matching algorithm. Default to 5. A smaller value will cause it to check trivial details, which increases the similarity score of two code even though they might not be similar. A larger value will cause it to overlook some common cheat tricks such as swapping two function definitions.

Supported Languages

  • C:
    • mayat.TS_C
    • mayat.C(Legacy)
  • Python:
    • mayat.TS_Python
    • mayat.Python(Legacy)
  • Java:
    • mayat.TS_Java

Implement a New PL's frontend

We implement a new programming language's frontend by using classes and functions defined in mayat. They are:

  • mayat.AST.AST: The base class for Abstract Syntax Tree. For a new PL you should inherit this and implement the AST.create(path) class method, which takes the path of a program as a parameter and returns the AST representation of that program. Currently it is preferred to use tree-sitter parsers to implement language frontends, whose corresponding file should be prefixed with TS_.
  • mayat.args.arg_parser: A argparse.ArgumentParser object. We need to use this object to retrieve command arguments. We can add new arguments if needed.
  • mayat.driver.driver: The driver function that takes the inherited AST class and the parsed arguments as parameters and run the plagiarism detection algorithm.

An example of this can be find in mayat/frontends/TS_C.py, which is a C frontend implemented using tree-sitter-c parser.

Testing

cd tests
python test.py -v

Limitations

This tool will never work for assembly code as the code has to be written in a high level programming language that can be converted into an AST. We can potentially figure out a way to automatically reverse engineer assembly code back to C and then convert it to AST. However, there's no guarantee that the reverse-engineered code can be a good representation for its assembly counterpart.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mayat-1.1.1.tar.gz (22.4 kB view hashes)

Uploaded Source

Built Distribution

mayat-1.1.1-py3-none-any.whl (16.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page