AST-based code similarity detection tool
Mayat
Mayat is a code similarity detection tool developed by Tian (Maxwell) Yang. It compares the Abstract Syntax Trees (ASTs) of students' code solutions and generates a similarity score for each pair of submissions.
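To illustrate why comparing ASTs is more robust than comparing text, here is a minimal sketch that uses Python's standard `ast` module rather than Mayat itself. The helper `shape` is a hypothetical name introduced for this example, and the comparison is a plain equality check, not Mayat's scoring algorithm. It shows that renaming every identifier leaves the tree structure unchanged, while a genuinely different solution produces a different tree.

```python
import ast


def shape(node):
    """Return a nested tuple of node type names, ignoring identifiers and constants.

    Hypothetical helper for illustration only; not part of Mayat.
    """
    return (
        type(node).__name__,
        tuple(shape(child) for child in ast.iter_child_nodes(node)),
    )


# Two submissions that differ only in the names they use.
a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
b = "def acc(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"
# A submission that solves the same task with a different structure.
c = "def total(xs):\n    return sum(xs)\n"

same_structure = shape(ast.parse(a)) == shape(ast.parse(b))
different_structure = shape(ast.parse(a)) != shape(ast.parse(c))
print(same_structure, different_structure)
```

A real detector needs a graded score rather than a yes/no answer, which is where matching on subtrees comes in.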
Build & Install
- Clone the repo
git clone git@github.com:AnubisLMS/Mayat.git
- Install dependencies and install Mayat
cd Mayat
pip install -r requirements_dev.txt
python setup.py install
- Install tree-sitter parsers
python -m mayat.install_langs
Usage
Let's say we need to check all students' uniq.c for homework1. The path for each uniq.c has the format homework1/<unique-id>/user/uniq.c. All we need to do is run:
python -m mayat.frontends.TS_C homework1/*/user/uniq.c
If we only want to check the main function, we can do:
python -m mayat.frontends.TS_C homework1/*/user/uniq.c -f main
Additionally, the frontend accepts more optional arguments:
- --threshold: Specifies the granularity of the matching algorithm. Defaults to 5. A smaller value makes it check trivial details, which inflates the similarity score of two programs even though they might not be similar. A larger value makes it overlook some common cheating tricks, such as swapping two function definitions.
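The trade-off behind --threshold can be pictured with a toy model. The sketch below is an assumption about how such a threshold could behave, not Mayat's implementation: it treats the threshold as the minimum number of nodes a subtree must have before it is compared. It uses Python's standard `ast` module, and the names `subtree_fingerprints` and `similarity` are hypothetical.

```python
import ast
from collections import Counter


def subtree_fingerprints(source, threshold):
    """Count structural fingerprints of subtrees with at least `threshold` nodes."""
    found = Counter()

    def visit(node):
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        size = 1 + sum(child_size for child_size, _ in children)
        fingerprint = (type(node).__name__, tuple(fp for _, fp in children))
        if size >= threshold:
            found[fingerprint] += 1
        return size, fingerprint

    visit(ast.parse(source))
    return found


def similarity(src_a, src_b, threshold):
    """Fraction of src_a's counted subtrees that also occur in src_b."""
    a = subtree_fingerprints(src_a, threshold)
    b = subtree_fingerprints(src_b, threshold)
    total = sum(a.values())
    return sum((a & b).values()) / total if total else 0.0


original = "def f(x):\n    return x + 1\n\ndef g(y):\n    return y * 2\n"
swapped = "def g(y):\n    return y * 2\n\ndef f(x):\n    return x + 1\n"
unrelated = "for i in range(3):\n    print(i)\n"

# Small threshold: trivial nodes such as variable loads match even in unrelated code.
print(similarity(original, unrelated, 1), similarity(original, unrelated, 5))
# Large threshold: only the whole module is compared, so swapped functions go unnoticed.
print(similarity(original, swapped, 5), similarity(original, swapped, 10))
```

In this model, a threshold of 1 gives the unrelated program a non-zero score purely from shared leaf nodes, while a threshold larger than either function misses the swapped definitions entirely.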
Supported Languages
- C: mayat.TS_C, mayat.C (Legacy)
- Python: mayat.TS_Python, mayat.Python (Legacy)
- Java: mayat.TS_Java
Implement a New PL's frontend
We implement a new programming language's frontend by using classes and functions defined in mayat. They are:
- mayat.AST.AST: The base class for Abstract Syntax Trees. For a new PL, inherit from it and implement the AST.create(path) class method, which takes the path of a program as a parameter and returns the AST representation of that program. Currently, tree-sitter parsers are the preferred way to implement language frontends; the corresponding file should be prefixed with TS_.
- mayat.args.arg_parser: An argparse.ArgumentParser object. We use it to retrieve command-line arguments, and we can add new arguments if needed.
- mayat.driver.driver: The driver function, which takes the inherited AST class and the parsed arguments as parameters and runs the plagiarism detection algorithm.
An example of this can be found in mayat/frontends/TS_C.py, which is a C frontend implemented using the tree-sitter-c parser.
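The overall shape of a frontend can be sketched as follows. This is an illustration only, and it does not import Mayat: `BaseAST`, `arg_parser` and `driver` below are hypothetical local stand-ins for mayat.AST.AST, mayat.args.arg_parser and mayat.driver.driver, whose real signatures and behavior may differ. The sketch uses Python's standard `ast` module as the parser, where a real frontend would more likely use tree-sitter.

```python
import argparse
import ast
import itertools


class BaseAST:
    """Hypothetical stand-in for mayat.AST.AST (not the real class)."""

    def __init__(self, shape):
        self.shape = shape

    @classmethod
    def create(cls, path):
        raise NotImplementedError("each language frontend implements create(path)")


class PythonAST(BaseAST):
    """A frontend for one language: only create(path) is language-specific."""

    @classmethod
    def create(cls, path):
        with open(path, encoding="utf-8") as source_file:
            tree = ast.parse(source_file.read(), filename=path)
        return cls(cls._shape(tree))

    @staticmethod
    def _shape(node):
        # Keep node types only, so identifiers and constants do not matter.
        return (
            type(node).__name__,
            tuple(PythonAST._shape(child) for child in ast.iter_child_nodes(node)),
        )


# Hypothetical stand-in for mayat.args.arg_parser.
arg_parser = argparse.ArgumentParser(description="toy frontend")
arg_parser.add_argument("files", nargs="+", help="source files to compare")


def driver(ast_class, args):
    """Hypothetical stand-in for mayat.driver.driver.

    Builds one AST per file with ast_class.create and compares every pair.
    """
    trees = {path: ast_class.create(path) for path in args.files}
    return [
        (left, right, trees[left].shape == trees[right].shape)
        for left, right in itertools.combinations(args.files, 2)
    ]
```

A frontend script would then end with something like `driver(PythonAST, arg_parser.parse_args())`, so that the language-specific part stays confined to `create`.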
Testing
cd tests
python test.py -v
Limitations
This tool will never work for assembly code, as the code has to be written in a high-level programming language that can be converted into an AST. We could potentially find a way to automatically reverse engineer assembly code back to C and then convert it to an AST. However, there is no guarantee that the reverse-engineered code would be a good representation of its assembly counterpart.