A Source Code Similarity System
scoss
A Source Code Similarity System - SCOSS
There are four supported metrics:
- count_operator: A metric that counts operators in source code to calculate a similarity score.
- set_operator: A metric that checks the presence of operators in source code to calculate a similarity score.
- hash_operator: A metric that uses combinations of adjacent operators to calculate a similarity score.
- SMoss: A wrapper of MOSS (the same as mosspy).
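The README does not give the formulas behind these metrics, so the following is only a rough sketch of the three ideas, not the SCOSS implementation. It assumes a small hypothetical operator list (OPERATOR_RE) and uses Jaccard-style overlap as a stand-in scoring rule: on operator counts, on the set of operators present, and on pairs of adjacent operators.

```python
import re
from collections import Counter

# Hypothetical operator list for illustration only; the token set SCOSS
# actually uses is not described in this README.
OPERATOR_RE = re.compile(
    r"<<=|>>=|==|!=|<=|>=|&&|\|\||\+\+|--|\+=|-=|\*=|/=|<<|>>|[-+*/%=<>!&|^~]"
)


def operators(source):
    """Return the operators found in source, in order of appearance."""
    return OPERATOR_RE.findall(source)


def count_similarity(a, b):
    """Overlap of operator counts (multiset Jaccard, an assumed formula)."""
    ca, cb = Counter(operators(a)), Counter(operators(b))
    total = sum((ca | cb).values())
    return sum((ca & cb).values()) / total if total else 1.0


def set_similarity(a, b):
    """Overlap of the sets of operators present (Jaccard, an assumed formula)."""
    sa, sb = set(operators(a)), set(operators(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def adjacent_similarity(a, b):
    """Overlap of adjacent operator pairs (Jaccard, an assumed formula)."""
    def pairs(source):
        ops = operators(source)
        return set(zip(ops, ops[1:]))

    pa, pb = pairs(a), pairs(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


original = "int s = 0; for (int i = 0; i < n; i++) s += i;"
renamed = "int t = 0; for (int k = 0; k < m; k++) t += k;"
different = "int t = 0; t = t - 1;"

print(count_similarity(original, renamed))   # identifiers differ, operators match
print(set_similarity(original, different))
print(adjacent_similarity(original, different))
```

Renaming variables leaves all three scores unchanged because only operators are compared, which is the property that makes operator-based metrics useful for spotting disguised copies.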
Installation
This package requires Python 3.6 or later.
pip install scoss
Usage
You can use SCOSS as a command line interface, as a library in your project, or through the web-app interface.
Command Line Interface (CLI)
See the documentation by passing the --help argument.
scoss --help
Usage: scoss [OPTIONS]

Options:
  -sd, --submission-dir TEXT      Submission directory.  [required]
  -o, --output-dir TEXT           Output directory.
  -tc, --threshold-combination [AND|OR]
                                  AND: All metrics are greater than threshold.
                                  OR: At least 1 metric is greater than
                                  threshold.
  -mo, --moss FLOAT RANGE         Use moss metric and set up moss threshold.
  -co, --count-operator FLOAT RANGE
                                  Use count operator metric and set up count
                                  operator threshold.
  -so, --set-operator FLOAT RANGE
                                  Use set operator metric and set up set
                                  operator threshold.
  -ho, --hash-operator FLOAT RANGE
                                  Use hash operator metric and set up hash
                                  operator threshold.
  --help                          Show this message and exit.
To get a plagiarism report for a submission directory, add the -sd/--submission-dir option. Add at least one similarity metric from [-mo/--moss, -co/--count-operator, -so/--set-operator, -ho/--hash-operator] together with its threshold (in the range [0, 1]). If you use two or more metrics, define how they should be combined with -tc/--threshold-combination (AND is used by default).
Basic command: scoss -sd tests/data/299721 -tc OR -co 0.1 -ho 0.1 -mo 0.1 -o tests/data
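To make the AND/OR combination concrete, here is a minimal sketch of the rule as the help text describes it. The function name is_flagged and the example scores are hypothetical; this is not code taken from SCOSS.

```python
def is_flagged(scores, thresholds, combination="AND"):
    """Decide whether one pair of files should be reported.

    scores and thresholds map metric names to floats in [0, 1].
    Only metrics that have a threshold are considered.
    """
    if not thresholds:
        raise ValueError("at least one metric threshold is required")
    checks = [scores[name] > limit for name, limit in thresholds.items()]
    if combination == "AND":
        return all(checks)   # every metric must exceed its threshold
    if combination == "OR":
        return any(checks)   # one metric above its threshold is enough
    raise ValueError(f"unknown combination: {combination!r}")


# Thresholds mirror the basic command above; the scores are made up.
thresholds = {"count_operator": 0.1, "hash_operator": 0.1, "moss": 0.1}
pair_scores = {"count_operator": 0.42, "hash_operator": 0.05, "moss": 0.30}

print(is_flagged(pair_scores, thresholds, "OR"))   # True
print(is_flagged(pair_scores, thresholds, "AND"))  # False
```

With OR, a pair is reported as soon as any single metric is suspicious, so it produces more candidates; AND is stricter and is the default.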
Using as a library
- Define a Scoss object and register some metrics:
from scoss import Scoss
sc = Scoss(lang='cpp')
# only show pairs whose similarity score is greater than the threshold
sc.add_metric('count_operator', threshold=0.7)
sc.add_metric('set_operator', threshold=0.5)
- Register source files with the Scoss object defined above:
sc.add_file('./tests/data/a.cpp')
sc.add_file('./tests/data/b.cpp')
sc.add_file('./tests/data/c.cpp')
# or add files by wildcard
sc.add_file_by_wildcard('./tests/data/problem_A_*.cpp')
- Run Scoss and get the results:
sc.run()
# filter results by combining the thresholds of the registered metrics (and_thresholds)
print(sc.get_matches(and_thresholds=True))
The same behaviour is defined in SMoss. You can create an SMoss object to use the MOSS system.
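For the wildcard step in the library walkthrough, a pattern such as problem_A_*.cpp selects a group of submissions at once. How add_file_by_wildcard expands the pattern internally is not documented here; the sketch below only shows the standard-library glob matching that such a pattern implies, using a hypothetical helper expand_wildcard and throwaway files.

```python
import glob
import os
import tempfile


def expand_wildcard(pattern):
    """Return the sorted list of regular files matching a glob pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def demo():
    """Create a few sample files and list the ones the pattern selects."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("problem_A_1.cpp", "problem_A_2.cpp", "problem_B_1.cpp"):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write("int main() { return 0; }\n")
        matched = expand_wildcard(os.path.join(tmp, "problem_A_*.cpp"))
        return [os.path.basename(path) for path in matched]


print(demo())  # ['problem_A_1.cpp', 'problem_A_2.cpp']
```

Each matched path could then be passed to sc.add_file, which is equivalent in effect to registering the files one by one.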
Web-app interface
Please check our web-app interface here.
Issues
This project is in development. If you find any issues, please create an issue here.
Contributors
Acknowledgements
This project is sponsored and led by Prof. Do Phan Thuan, Hanoi University of Science and Technology.
Part of this code adapts the source code at https://github.com/soachishti/moss.py as the baseline for SMoss.