Plagiarism detection tool for programming assignments
CheatHit: plagiarism detection tool for programming assignments
Installation
Install Python 3.9 or higher, then run:
pip install cheathit
Usage
Running
Run CheatHit as follows:
cheathit /submission/directory
Or with parameters:
cheathit /submission/directory --path=group/student/problem/attempt --min-ngram=3 --max-ngram=10 --min-ratio=0.5 --max-clique=4
Or, if you want to save results to a file:
cheathit /submission/directory > /path/to/file
Parameters
--path
Specify the structure of the submission directory with this parameter. Use the student, group, problem, and attempt sections separated with slashes, e.g., group/group/student/problem/problem/attempt. Each subsequent section takes CheatHit one level down the directory tree; the last level must be a file containing the submission.
- student corresponds to the set of programs submitted by an individual student;
- group corresponds to a group of students such that cheating is likely to take place within such a group (e.g., a school class);
- problem corresponds to a separate task shared by the students;
- attempt corresponds to a separate submission of a student.
The student section is required (i.e., there should be at least one of these in --path); the other three sections are optional. If the same section appears in --path multiple times, CheatHit will simply concatenate its values to obtain the “true” representation of the section.
The default value of --path is student, which is suitable for cases when there is a single directory with many files, one file per student.
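To make the mapping between a --path value and a submission file concrete, here is a minimal sketch. It is not CheatHit's own code: the function name label_submission is hypothetical, and joining repeated sections with a slash is an assumption, since the text above only says that the values are concatenated.

```python
from pathlib import PurePosixPath

KNOWN_SECTIONS = {"student", "group", "problem", "attempt"}


def label_submission(relative_path, path_spec="student"):
    """Map each component of a submission path to its --path section.

    Hypothetical helper for illustration; relative_path is assumed to be
    relative to the submission directory.
    """
    spec = path_spec.split("/")
    unknown = set(spec) - KNOWN_SECTIONS
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    if "student" not in spec:
        raise ValueError("the student section is required")

    parts = PurePosixPath(relative_path).parts
    if len(parts) != len(spec):
        raise ValueError("path depth does not match the --path structure")

    collected = {}
    for section, part in zip(spec, parts):
        collected.setdefault(section, []).append(part)
    # Assumption: repeated sections are concatenated with "/" as separator.
    return {section: "/".join(values) for section, values in collected.items()}


labels = label_submission(
    "2023/classA/alice/hw1/task2/final.py",
    "group/group/student/problem/problem/attempt",
)
print(labels)
```

With the default structure, a single file such as alice.py is labeled with the student section only.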
--min-ngram
Minimum ngram size (number of consecutive tokens) to analyze across the submissions.
The default value of --min-ngram is 1.
--max-ngram
Maximum ngram size (number of consecutive tokens) to analyze across the submissions.
The default value of --max-ngram is 20.
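As an illustration of what --min-ngram and --max-ngram bound, the sketch below enumerates every run of consecutive tokens whose length lies between the two limits. The function name extract_ngrams is hypothetical, and CheatHit's internal implementation may differ.

```python
def extract_ngrams(tokens, min_ngram=1, max_ngram=20):
    """Return all runs of consecutive tokens with min_ngram..max_ngram items.

    Illustrative sketch only; not taken from CheatHit's source.
    """
    ngrams = []
    for size in range(min_ngram, max_ngram + 1):
        # When size exceeds len(tokens) the range is empty, so nothing is added.
        for start in range(len(tokens) - size + 1):
            ngrams.append(tuple(tokens[start:start + size]))
    return ngrams


tokens = ["<START>", "x", "=", "1", "<END>"]
for ngram in extract_ngrams(tokens, min_ngram=4, max_ngram=10):
    print(ngram)
```

Raising --min-ngram discards short runs that tend to occur in unrelated programs, while lowering --max-ngram limits how many runs have to be compared.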
--min-ratio
Minimum ratio, for a pair of submissions, of the number of tokens they share to the number of tokens in the longer of the two. Pairs below this ratio are not included in the report.
The default value of --min-ratio is 0.2.
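The arithmetic behind --min-ratio can be sketched as follows. The function name passes_min_ratio is hypothetical, the shared token count is taken as given (how CheatHit counts shared tokens is not described here), and treating a ratio exactly equal to the threshold as passing is an assumption.

```python
def passes_min_ratio(shared_tokens, length_a, length_b, min_ratio=0.2):
    """Check whether a pair of submissions would clear the --min-ratio threshold.

    Illustrative sketch: divides the shared token count by the length of the
    longer submission. Using >= for the comparison is an assumption.
    """
    longer = max(length_a, length_b)
    if longer == 0:
        return False  # two empty submissions share nothing worth reporting
    return shared_tokens / longer >= min_ratio


# 30 shared tokens, submissions of 100 and 120 tokens: 30 / 120 = 0.25
print(passes_min_ratio(30, 100, 120))                 # default threshold 0.2
print(passes_min_ratio(30, 100, 120, min_ratio=0.5))  # stricter threshold
```

Dividing by the longer submission means that a short program copied into a much longer one yields a low ratio.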
--max-clique
If an ngram (a sequence of tokens) occurs in submissions of more than --max-clique students (or in submissions of students from more than --max-clique groups), it is not considered distinctive.
The default value of --max-clique is:
- 2 if students are assigned groups (--path includes group),
- 5 otherwise.
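The rule above can be sketched as below. Both function names are hypothetical, and one reading of the description is assumed: groups are counted when --path includes group, and students are counted otherwise.

```python
def default_max_clique(path_spec):
    """Default --max-clique as documented: 2 with groups, 5 without."""
    return 2 if "group" in path_spec.split("/") else 5


def is_distinctive(owners, max_clique, grouped):
    """Decide whether an ngram is rare enough to count as distinctive.

    owners is an iterable of (group, student) pairs whose submissions contain
    the ngram. Assumption: groups are counted when grouped is True, students
    otherwise.
    """
    if grouped:
        count = len({group for group, _ in owners})
    else:
        count = len({student for _, student in owners})
    return count <= max_clique


owners = [("classA", "alice"), ("classA", "bob"), ("classB", "carol")]
limit = default_max_clique("group/student")
print(limit, is_distinctive(owners, limit, grouped=True))
```

An ngram that appears in many students' work, such as a loop header from the assignment template, exceeds the limit and does not contribute to the similarity of any pair.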
Tokenization
CheatHit tokenizes source code into alphanumeric words (which can also contain underscores) and non-alphanumeric characters. Whitespace, semicolons, and commas are ignored. Two special markers, <START> and <END>, are added to the beginning and end of a token sequence. Hence,
CheatHit supports C++, Python v2 & v3, and _even_ VB.NET; awesome!
would be tokenized as
['<START>', 'CheatHit', 'supports', 'C', '+', '+', 'Python', 'v2', '&', 'v3', 'and', '_even_', 'VB', '.', 'NET', 'awesome', '!', '<END>']
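The tokenization rules above can be approximated with a single regular expression. This is a sketch that reproduces the example, not CheatHit's actual tokenizer; the function name tokenize is hypothetical.

```python
import re

# A word is a run of letters, digits, and underscores; any other character is
# a token of its own, except whitespace, semicolons, and commas, which are skipped.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s;,]")


def tokenize(source):
    """Split source code into tokens and wrap them in <START>/<END> markers."""
    return ["<START>", *TOKEN_PATTERN.findall(source), "<END>"]


print(tokenize("CheatHit supports C++, Python v2 & v3, and _even_ VB.NET; awesome!"))
```

Because whitespace, semicolons, and commas are dropped, reformatting a program or changing its indentation leaves the token sequence unchanged.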
Results
For each pair of students CheatHit will report how much code is shared between the students while adjusting for how distinctive the shared code is. See the Parameters and Tokenization sections above for an insight into what CheatHit considers distinctive.
File details
Details for the file cheathit-1.0.1.tar.gz.
File metadata
- Download URL: cheathit-1.0.1.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 960483aba5ec6aae5ed7c1d23cff084931fb09db75e58cac47483506073fce78
MD5 | 8bc5c1698055566c5ab5993689fccfcc
BLAKE2b-256 | 5b886c6707b89d5b3350e419713d8c2e057308c5537198f1218d5282da49dcf5
File details
Details for the file cheathit-1.0.1-py3-none-any.whl.
File metadata
- Download URL: cheathit-1.0.1-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 223fdbbdaf551ffde864d01abcaa47beb66048c8422b391a5153736651928a20
MD5 | aa07a6df0eef24c2aacf7827fea93452
BLAKE2b-256 | 32b96a69a983c1c343437f9a71531147cd168f1f42fb0b542cf06a9bdd84e721