A Source Code Similarity System
scoss
A Source Code Similarity System - SCOSS
There are four supported metrics:
- count_operator: A metric that counts operators in source-code to calculate similarity score.
- set_operator: A metric that checks the presence of operators in source-code to calculate similarity score.
- hash_operator: A metric that uses the combination of adjacent operators to calculate similarity score.
- SMoss: A wrapper of MOSS (the same as mosspy).
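To make the three operator-based ideas concrete, here is a minimal sketch in Python. It is not SCOSS's implementation: the operator pattern, the function names, and the similarity formulas (multiset overlap, Jaccard on sets, Jaccard on adjacent pairs) are all assumptions chosen for illustration, and the real metrics may tokenize and score differently.

```python
import re
from collections import Counter

# Hypothetical operator pattern for C-like code (longest operators first).
# SCOSS's real tokenizer is not shown in this README and may differ.
OPERATOR_PATTERN = re.compile(
    r"<<=|>>=|\+\+|--|&&|\|\||==|!=|<=|>=|\+=|-=|\*=|/=|%=|<<|>>|->|[-+*/%=<>!&|^~?:]"
)


def extract_operators(source):
    """Return the operators found in `source`, in order of appearance."""
    return OPERATOR_PATTERN.findall(source)


def count_similarity(ops_a, ops_b):
    """Counting idea: shared operator occurrences over the larger total."""
    count_a, count_b = Counter(ops_a), Counter(ops_b)
    shared = sum((count_a & count_b).values())
    total = max(sum(count_a.values()), sum(count_b.values()))
    return shared / total if total else 1.0


def set_similarity(ops_a, ops_b):
    """Presence idea: Jaccard index over the sets of distinct operators."""
    set_a, set_b = set(ops_a), set(ops_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def adjacent_similarity(ops_a, ops_b, n=2):
    """Adjacency idea: Jaccard index over runs of n adjacent operators."""
    grams_a = {tuple(ops_a[i:i + n]) for i in range(len(ops_a) - n + 1)}
    grams_b = {tuple(ops_b[i:i + n]) for i in range(len(ops_b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


if __name__ == "__main__":
    a = extract_operators("int s = 0; for (int i = 0; i < n; i++) s += a[i];")
    b = extract_operators("int t = 0; for (int j = 0; j < m; j++) t += b[j];")
    print(count_similarity(a, b), set_similarity(a, b), adjacent_similarity(a, b))
```

Because only operators are compared, renaming variables (as in the two snippets above) does not change any of the three scores in this sketch.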
Installation
This package requires Python 3.6 or later.
pip install scoss
Usage
You can use SCOSS as a command line interface, as a library in your project, or through the web-app interface.
Command Line Interface (CLI)
Pass the --help argument to see the documentation.
scoss --help
Usage: scoss [OPTIONS]

Options:
  -i, --input-dir TEXT            Input directory.  [required]
  -o, --output-dir TEXT           Output directory.
  -tc, --threshold-combination [AND|OR]
                                  AND: All metrics are greater than threshold.
                                  OR: At least 1 metric is greater than
                                  threshold.
  -mo, --moss FLOAT RANGE         Use moss metric and set up moss threshold.
  -co, --count-operator FLOAT RANGE
                                  Use count operator metric and set up count
                                  operator threshold.
  -so, --set-operator FLOAT RANGE
                                  Use set operator metric and set up set
                                  operator threshold.
  -ho, --hash-operator FLOAT RANGE
                                  Use hash operator metric and set up hash
                                  operator threshold.
  --help                          Show this message and exit.
To get a plagiarism report for a directory containing source code files, pass the -i/--input-dir option. Add at least one similarity metric from [-mo/--moss, -co/--count-operator, -so/--set-operator, -ho/--hash-operator] together with its threshold (in the range [0,1]). If you use two or more metrics, define how they should be combined with -tc/--threshold-combination (AND is used by default).
Basic command: scoss -i path/to/source_code_dir/ -tc OR -co 0.1 -ho 0.1 -mo 0.1 -o another_path/to/plagiarism_report/
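If you want to drive the CLI from a script, the command can be assembled programmatically. The following is a sketch only: `build_scoss_command` and `run_scoss` are hypothetical helpers, not part of SCOSS. The flag names are the ones listed in the `--help` output, and the range check mirrors the documented [0,1] threshold range.

```python
import subprocess

# Short flags as listed by `scoss --help`; the dictionary keys are
# illustrative names chosen for this sketch.
METRIC_FLAGS = {
    "moss": "-mo",
    "count_operator": "-co",
    "set_operator": "-so",
    "hash_operator": "-ho",
}


def build_scoss_command(input_dir, output_dir=None, thresholds=None, combination="AND"):
    """Build the argument list for a `scoss` invocation (hypothetical helper)."""
    if combination not in ("AND", "OR"):
        raise ValueError("combination must be 'AND' or 'OR'")
    if not thresholds:
        raise ValueError("at least one metric threshold is required")
    cmd = ["scoss", "-i", input_dir, "-tc", combination]
    for metric, value in thresholds.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"threshold for {metric} must be in [0, 1]")
        cmd += [METRIC_FLAGS[metric], str(value)]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


def run_scoss(cmd):
    """Run the command; requires the `scoss` executable to be installed."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_scoss_command(
        "path/to/source_code_dir/",
        "another_path/to/plagiarism_report/",
        {"count_operator": 0.1, "hash_operator": 0.1, "moss": 0.1},
        combination="OR",
    )
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.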
Using as a library
- Define a Scoss object and register some metrics:
from scoss import Scoss
sc = Scoss(lang='cpp')
# only show pairs that have similarity score > threshold
sc.add_metric('count_operator', threshold=0.7)
sc.add_metric('set_operator', threshold=0.5)
- Register source code files with the defined Scoss object:
sc.add_file('./tests/data/a.cpp')
sc.add_file('./tests/data/b.cpp')
sc.add_file('./tests/data/c.cpp')
# or add by wildcard
sc.add_file_by_wildcard('./tests/data/problem_A_*.cpp')
- Run Scoss and get results:
sc.run()
# filter results by combining the thresholds of the different metrics (and_thresholds)
print(sc.get_matches(and_thresholds=True))
The same behaviour is defined for SMoss. You can create an SMoss object to use the MOSS system.
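The AND/OR threshold combination described for the CLI, and selected in the library through `and_thresholds`, can be illustrated with a small sketch. This is an assumption about the intended semantics based on the help text ("all metrics are greater than threshold" versus "at least 1 metric is greater than threshold"), not SCOSS's code; `passes_thresholds` is a hypothetical name.

```python
def passes_thresholds(scores, thresholds, combination="AND"):
    """Decide whether a pair of files should be reported (hypothetical helper).

    scores:      metric name -> similarity score of the pair
    thresholds:  metric name -> threshold registered for that metric
    combination: "AND" requires every score to exceed its threshold,
                 "OR" requires at least one to do so.
    """
    checks = [scores[metric] > limit for metric, limit in thresholds.items()]
    if combination == "AND":
        return all(checks)
    if combination == "OR":
        return any(checks)
    raise ValueError("combination must be 'AND' or 'OR'")


if __name__ == "__main__":
    thresholds = {"count_operator": 0.7, "set_operator": 0.5}
    pair_scores = {"count_operator": 0.8, "set_operator": 0.4}
    print(passes_thresholds(pair_scores, thresholds, "AND"))  # False
    print(passes_thresholds(pair_scores, thresholds, "OR"))   # True
```

With AND, a pair is reported only when every registered metric agrees, which reduces false positives; OR is more permissive and suits a first screening pass.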
Web-app interface
Please check our web-app interface here.
Issues
This project is in development. If you find any issues, please create an issue here.
Contributors
Acknowledgements
This project is sponsored and led by Prof. Do Phan Thuan, Hanoi University of Science and Technology.
Part of this code adapts the source code at https://github.com/soachishti/moss.py as the baseline for SMoss.
File details
Details for the file scoss-0.0.5-py3-none-any.whl.
File metadata
- Download URL: scoss-0.0.5-py3-none-any.whl
- Upload date:
- Size: 25.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99c09e2185a742c68b5ad7cc5f2218e6199b74ccad846146276b320dc164cfd9 |
| MD5 | b4d585141a64abdce71942c0b5345fdf |
| BLAKE2b-256 | 0b7f35d6bb538ccec22bb2e08d87ccdacef63f981f8eecf5b01d50010dfa4f1c |