CheatHit: plagiarism detection tool for programming assignments

Installation

Install Python 3.9 or higher, then run:

pip install cheathit

Usage

Running

Run CheatHit as follows:

cheathit /submission/directory

Or with parameters:

cheathit /submission/directory --path=group/student/problem/attempt --min-ngram=3 --max-ngram=10 --min-ratio=0.5 --max-clique=4

Or, if you want to save results to a file:

cheathit /submission/directory > /path/to/file

Parameters

--path

Describes the structure of the submission directory. Build the value from the sections student, group, problem, and attempt, separated by slashes, e.g., group/group/student/problem/problem/attempt. Each successive section takes CheatHit one level further down the directory tree; the last level must be a file containing the submission.

  • student corresponds to the set of programs submitted by an individual student;
  • group corresponds to a group of students such that cheating is likely to take place within such a group (e.g., a school class);
  • problem corresponds to a separate task shared by the students;
  • attempt corresponds to a separate submission of a student.

The student section is required (i.e., there should be at least one of these in --path); the other three sections are optional. If the same section appears in --path multiple times, CheatHit will simply concatenate its values to obtain the “true” representation of the section.

The default value of --path is student, which is suitable for cases when there is a single directory with many files, one file per student.
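As an illustration of how a --path value lines up with the directory tree, here is a minimal sketch, not CheatHit's actual code. The function name parse_submission_path is hypothetical, and joining repeated sections with "/" is an assumption: the description above only says the values are concatenated.

```python
from pathlib import PurePosixPath

SECTIONS = {"student", "group", "problem", "attempt"}


def parse_submission_path(relative_path, spec="student"):
    """Map each level of relative_path to the section named at that level of spec."""
    names = spec.split("/")
    unknown = set(names) - SECTIONS
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    if "student" not in names:
        raise ValueError("--path must contain at least one student section")

    parts = PurePosixPath(relative_path).parts
    if len(parts) != len(names):
        raise ValueError("path depth does not match the number of sections")

    # Collect the path components of each section in order of appearance.
    values = {}
    for name, part in zip(names, parts):
        values.setdefault(name, []).append(part)

    # Assumption: repeated sections are joined with "/" (separator not documented).
    return {name: "/".join(chunks) for name, chunks in values.items()}


sections = parse_submission_path(
    "2023/classA/alice/hw1/task2/final.py",
    "group/group/student/problem/problem/attempt",
)
# sections["group"] is "2023/classA" and sections["problem"] is "hw1/task2"
```

With the default spec, every file in a flat directory is treated as one student's submission.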

--min-ngram

Minimum ngram size (number of consecutive tokens) to analyze across the submissions.

The default value of --min-ngram is 1.

--max-ngram

Maximum ngram size (number of consecutive tokens) to analyze across the submissions.

The default value of --max-ngram is 20.
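To make the two size limits concrete, the sketch below enumerates every run of consecutive tokens whose length lies between --min-ngram and --max-ngram. The function name ngrams is hypothetical, and this is only an illustration of the definition, not CheatHit's internal implementation.

```python
def ngrams(tokens, min_ngram=1, max_ngram=20):
    """Return all runs of consecutive tokens with min_ngram <= length <= max_ngram."""
    result = []
    for n in range(min_ngram, max_ngram + 1):
        # A sequence of k tokens contains k - n + 1 runs of length n (none if n > k).
        for start in range(len(tokens) - n + 1):
            result.append(tuple(tokens[start:start + n]))
    return result


tokens = ["<START>", "a", "=", "1", "<END>"]
grams = ngrams(tokens, min_ngram=2, max_ngram=3)
# Four runs of length 2 plus three runs of length 3: len(grams) == 7
```

Raising --min-ngram drops short, commonplace runs; lowering --max-ngram limits how long a run is analyzed as a single unit.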

--min-ratio

Minimum ratio required for a pair of submissions to be included in the report. The ratio is the number of tokens shared by the two submissions divided by the number of tokens in the longer of the two.

The default value of --min-ratio is 0.2.
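The arithmetic behind the threshold can be sketched as follows. How CheatHit actually counts shared tokens is not documented here; this sketch assumes a plain multiset intersection and ignores distinctiveness, so treat it as an illustration of the ratio only. The names shared_ratio and passes_min_ratio are hypothetical.

```python
from collections import Counter


def shared_ratio(tokens_a, tokens_b):
    """Shared tokens divided by the token count of the longer submission."""
    # Assumption: "shared" means the multiset intersection of the two token lists.
    shared = sum((Counter(tokens_a) & Counter(tokens_b)).values())
    longer = max(len(tokens_a), len(tokens_b))
    return shared / longer if longer else 0.0


def passes_min_ratio(tokens_a, tokens_b, min_ratio=0.2):
    """True if the pair would clear the --min-ratio threshold."""
    return shared_ratio(tokens_a, tokens_b) >= min_ratio


a = ["x", "=", "1", "+", "2"]   # 5 tokens
b = ["x", "=", "3"]             # 3 tokens, 2 of them shared with a
ratio = shared_ratio(a, b)      # 2 / 5 == 0.4
```

Dividing by the longer submission keeps a short file from scoring highly just because all of its few tokens also appear in a much larger one.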

--max-clique

If an ngram (a sequence of tokens) occurs in submissions of more than --max-clique students (or in submissions of students from more than --max-clique groups), it is not considered distinctive.

The default value of --max-clique is:

  • 2 if students are assigned groups (--path includes group),
  • 5 otherwise.
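The filtering rule and its default can be sketched as below. The function names are hypothetical, only the per-student count is shown (the per-group count would be handled analogously), and nothing here is taken from CheatHit's source.

```python
from collections import defaultdict


def default_max_clique(path_spec):
    """Default --max-clique: 2 when --path includes group, 5 otherwise."""
    return 2 if "group" in path_spec.split("/") else 5


def distinctive_ngrams(ngrams_by_student, max_clique):
    """Keep only the ngrams that occur for at most max_clique students."""
    owners = defaultdict(set)
    for student, grams in ngrams_by_student.items():
        for gram in grams:
            owners[gram].add(student)
    return {gram for gram, students in owners.items() if len(students) <= max_clique}


ngrams_by_student = {
    "alice": {("for", "i"), ("return", "x")},
    "bob": {("for", "i"), ("return", "x")},
    "carol": {("for", "i")},
}
kept = distinctive_ngrams(ngrams_by_student, max_clique=2)
# ("for", "i") occurs for three students, so only ("return", "x") is kept
```

The idea is that code appearing in many unrelated submissions is more likely boilerplate or a standard idiom than evidence of copying.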

Tokenization

CheatHit tokenizes source code into alphanumeric words (which can also contain underscores) and non-alphanumeric characters. Whitespace, semicolons, and commas are ignored. Two special markers, <START> and <END>, are added to the beginning and end of a token sequence. Hence,

CheatHit supports C++, Python v2 & v3, and _even_ VB.NET; awesome!

would be tokenized as

['<START>', 'CheatHit', 'supports', 'C', '+', '+', 'Python', 'v2', '&', 'v3', 'and', '_even_', 'VB', '.', 'NET', 'awesome', '!', '<END>']
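The rules above can be approximated with a single regular expression. This is a minimal sketch that reproduces the example, assuming ASCII letters and digits; the function name tokenize is hypothetical, and CheatHit's own tokenizer may differ in detail.

```python
import re

# A word (letters, digits, underscores), or any single character that is
# not a word character, whitespace, a semicolon, or a comma.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s;,]")


def tokenize(source):
    """Split source text into tokens, wrapped in <START> and <END> markers."""
    return ["<START>", *TOKEN_RE.findall(source), "<END>"]


tokens = tokenize("CheatHit supports C++, Python v2 & v3, and _even_ VB.NET; awesome!")
# tokens[1:8] == ['CheatHit', 'supports', 'C', '+', '+', 'Python', 'v2']
```

Because whitespace, semicolons, and commas never become tokens, reformatting a copied program does not change its token sequence.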

Results

For each pair of students, CheatHit reports how much code the two students share, adjusted for how distinctive the shared code is. See the Parameters and Tokenization sections above for what CheatHit considers distinctive.
