AST-based code similarity detection tool

Mayat

Mayat is a code similarity detection tool developed by Tian (Maxwell) Yang. It works by comparing the Abstract Syntax Trees (ASTs) of students' code solutions and generating a similarity score for each pair of submissions.
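
To make the idea of AST-based comparison concrete, here is a minimal sketch that scores each pair of submissions by the fraction of structurally identical subtrees they share. It uses Python's standard ast module rather than tree-sitter, and the names subtree_shapes and similarity are hypothetical. This is not Mayat's actual algorithm, only an illustration of why renaming identifiers does not hide structural similarity.

```python
import ast
from collections import Counter
from itertools import combinations


def subtree_shapes(source: str) -> Counter:
    """Count a structural fingerprint for every subtree, ignoring names and literals."""
    shapes = Counter()

    def visit(node: ast.AST) -> str:
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        shape = f"{type(node).__name__}({','.join(children)})"
        shapes[shape] += 1
        return shape

    visit(ast.parse(source))
    return shapes


def similarity(a: str, b: str) -> float:
    """Fraction of subtree fingerprints shared by both programs (0.0 to 1.0)."""
    sa, sb = subtree_shapes(a), subtree_shapes(b)
    shared = sum((sa & sb).values())  # multiset intersection
    total = max(sum(sa.values()), sum(sb.values()))
    return shared / total if total else 1.0


# Illustrative submissions (made up for this sketch).
submissions = {
    "alice": "def f(x):\n    return x + 1\n",
    "bob": "def g(y):\n    return y + 1\n",  # same structure, renamed identifiers
    "carol": "def h(items):\n    for i in items:\n        print(i)\n",
}

# One score per pair of students.
scores = {
    (p, q): similarity(submissions[p], submissions[q])
    for p, q in combinations(submissions, 2)
}

for pair, score in scores.items():
    print(pair, round(score, 2))
```

Because the fingerprints record only node types, the renamed copy is indistinguishable from the original, while the structurally different submission shares only a few small subtrees.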

Build & Install

  1. Clone the repo
git clone git@github.com:AnubisLMS/Mayat.git
  2. Install dependencies and install Mayat
cd Mayat
pip install -r requirements_dev.txt
python setup.py install
  3. Install tree-sitter parsers
python -m mayat.install_langs

Usage

Let's say we need to check all students' uniq.c for homework1. The path for each uniq.c has the format homework1/<unique-id>/user/uniq.c. All we need to do is run:

python -m mayat.frontends.TS_C homework1/*/user/uniq.c

If we only want to check the main function, we can do:

python -m mayat.frontends.TS_C homework1/*/user/uniq.c -f main

We can also pass optional arguments to the frontend:

  • --threshold: Specifies the granularity of the matching algorithm. Defaults to 5. A smaller value makes it match trivial details, which inflates the similarity score of two programs even when they are not actually similar. A larger value makes it overlook some common cheating tricks, such as swapping two function definitions.
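
The effect of a small threshold can be illustrated with a sketch. The text above does not say how Mayat applies the threshold, so this example assumes, purely for illustration, that it acts as a minimum subtree size (node count) below which subtrees are not compared. The functions fingerprints and overlap and the parameter min_size are hypothetical, and the parsing uses Python's standard ast module.

```python
import ast
from collections import Counter


def fingerprints(source: str, min_size: int) -> Counter:
    """Fingerprint subtrees with at least `min_size` nodes (assumed threshold rule)."""
    found = Counter()

    def visit(node):
        size, parts = 1, []
        for child in ast.iter_child_nodes(node):
            child_shape, child_size = visit(child)
            parts.append(child_shape)
            size += child_size
        shape = f"{type(node).__name__}({','.join(parts)})"
        if size >= min_size:
            found[shape] += 1
        return shape, size

    visit(ast.parse(source))
    return found


def overlap(a: str, b: str, min_size: int) -> float:
    """Shared fraction of the fingerprints that pass the size filter."""
    fa, fb = fingerprints(a, min_size), fingerprints(b, min_size)
    total = max(sum(fa.values()), sum(fb.values()))
    return sum((fa & fb).values()) / total if total else 0.0


# Two unrelated functions that share only trivial pieces such as variable reads.
first = "def area(w, h):\n    return w * h\n"
second = "def greet(name):\n    print('hi', name)\n"

low = overlap(first, second, min_size=1)   # trivial subtrees count as matches
high = overlap(first, second, min_size=5)  # only larger subtrees are compared

print(low > high)
```

Under this assumption, counting every tiny subtree gives unrelated programs a nonzero score, which is the inflation the option description warns about.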

Supported Languages

  • C:
    • mayat.TS_C
    • mayat.C (Legacy)
  • Python:
    • mayat.TS_Python
    • mayat.Python (Legacy)
  • Java:
    • mayat.TS_Java

Implement a New PL's frontend

We implement a new programming language's frontend by using classes and functions defined in mayat. They are:

  • mayat.AST.AST: The base class for Abstract Syntax Trees. For a new PL you should inherit from it and implement the AST.create(path) class method, which takes the path of a program as a parameter and returns the AST representation of that program. Currently, tree-sitter parsers are the preferred way to implement language frontends, and the corresponding file should be prefixed with TS_.
  • mayat.args.arg_parser: An argparse.ArgumentParser object. We need to use this object to retrieve command-line arguments. We can add new arguments if needed.
  • mayat.driver.driver: The driver function, which takes the inherited AST class and the parsed arguments as parameters and runs the plagiarism detection algorithm.

An example of this can be found in mayat/frontends/TS_C.py, which is a C frontend implemented using the tree-sitter-c parser.
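
The wiring described above can be sketched as follows. To keep the example runnable without Mayat installed, AST, arg_parser, and driver here are local stand-ins that only mirror the roles described in the list. Their constructors, argument names, and return values are assumptions, and the real mayat objects may differ. The hypothetical PythonAST frontend uses Python's standard ast module instead of a tree-sitter parser.

```python
import argparse
import ast as py_ast  # stdlib parser, used here in place of tree-sitter


class AST:
    """Stand-in for mayat.AST.AST (the real base class may differ)."""

    def __init__(self, path, kinds):
        self.path = path
        self.kinds = kinds  # node type names in traversal order

    @classmethod
    def create(cls, path):
        raise NotImplementedError


class PythonAST(AST):
    """Hypothetical frontend: builds the AST representation of one file."""

    @classmethod
    def create(cls, path):
        with open(path, encoding="utf-8") as handle:
            tree = py_ast.parse(handle.read())
        return cls(path, [type(n).__name__ for n in py_ast.walk(tree)])


# Stand-in for mayat.args.arg_parser; the argument names are assumptions.
arg_parser = argparse.ArgumentParser()
arg_parser.add_argument("files", nargs="+")
arg_parser.add_argument("--threshold", type=int, default=5)


def driver(ast_class, args):
    """Stand-in for mayat.driver.driver: parse every file given on the command line."""
    trees = [ast_class.create(path) for path in args.files]
    return trees  # a real driver would go on to score each pair of trees


def main(argv=None):
    args = arg_parser.parse_args(argv)
    return driver(PythonAST, args)
```

The point of this shape is that a frontend only has to supply create(path); argument parsing and the comparison itself stay in shared code.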

Testing

cd tests
python test.py -v

Limitations

This tool does not work for assembly code, because the code has to be written in a high-level programming language that can be converted into an AST. We could potentially find a way to automatically reverse engineer assembly code back to C and then convert it to an AST. However, there is no guarantee that the reverse-engineered code would be a good representation of its assembly counterpart.
