A Source Code Similarity System

scoss

A Source Code Similarity System - SCOSS

There are four supported metrics:

  • count_operator: A metric that counts the operators in the source code to calculate a similarity score.
  • set_operator: A metric that checks which operators are present in the source code to calculate a similarity score.
  • hash_operator: A metric that uses combinations of adjacent operators to calculate a similarity score.
  • SMoss: A wrapper around MOSS (the same as mosspy).
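
To make the three operator-based metrics more concrete, here is a minimal Python sketch of the general ideas. It is not SCOSS's implementation: the operator pattern, the helper names (extract_operators, count_similarity, set_similarity, hash_similarity), and the choice of cosine and Jaccard similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical operator pattern; SCOSS's real tokenizer may differ.
# Two-character operators come first so "<=" is not split into "<" and "=".
OPERATOR_RE = re.compile(r"\+\+|--|<=|>=|==|!=|&&|\|\||[-+*/%<>=!&|]")


def extract_operators(source):
    """Return the operators found in `source`, in order of appearance."""
    return OPERATOR_RE.findall(source)


def jaccard(x, y):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def count_similarity(a, b):
    """count_operator idea: cosine similarity of operator counts (assumed)."""
    ca, cb = Counter(extract_operators(a)), Counter(extract_operators(b))
    dot = sum(ca[op] * cb[op] for op in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def set_similarity(a, b):
    """set_operator idea: compare only which operators are present."""
    return jaccard(set(extract_operators(a)), set(extract_operators(b)))


def hash_similarity(a, b):
    """hash_operator idea: compare sets of adjacent operator pairs."""
    def pairs(source):
        ops = extract_operators(source)
        return set(zip(ops, ops[1:]))
    return jaccard(pairs(a), pairs(b))
```

In this sketch, renaming variables does not change any score, because only the operators are compared. That is the property that makes operator-based metrics useful against simple identifier renaming.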

Installation

This package requires Python 3.6 or later.

pip install scoss

Usage

You can use SCOSS as a command-line interface, as a library in your project, or through the web-app interface.

Command Line Interface (CLI)

Pass the --help argument to see the documentation.

scoss --help
Usage: scoss [OPTIONS]

Options:
  -sd, --submission-dir TEXT      Submission directory.  [required]
  -o, --output-dir TEXT           Output directory.
  -tc, --threshold-combination [AND|OR]
                                  AND: All metrics are greater than threshold.
                                  OR: At least 1 metric is greater than
                                  threshold.

  -mo, --moss FLOAT RANGE         Use moss metric and set up moss threshold.
  -co, --count-operator FLOAT RANGE
                                  Use count operator metric and set up count
                                  operator threshold.

  -so, --set-operator FLOAT RANGE
                                  Use set operator metric and set up set
                                  operator threshold.

  -ho, --hash-operator FLOAT RANGE
                                  Use hash operator metric and set up hash
                                  operator threshold.

  --help                          Show this message and exit.

To get a plagiarism report for a submission directory, pass the -sd/--submission-dir option. Add at least one similarity metric from [-mo/--moss, -co/--count-operator, -so/--set-operator, -ho/--hash-operator] together with its threshold (in the range [0, 1]). If you use two or more metrics, define how they are combined with -tc/--threshold-combination (AND is used by default).
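
The AND/OR combination rule described in the help text can be sketched in a few lines of Python. This is only an illustration of the rule as documented, not SCOSS's own code; the function name is_flagged and the dictionary layout are hypothetical.

```python
def is_flagged(scores, thresholds, combination="AND"):
    """Decide whether a pair of files would be reported.

    scores and thresholds are dicts keyed by metric name, with values
    in [0, 1]. Only metrics that appear in `thresholds` are checked.
    """
    if combination not in ("AND", "OR"):
        raise ValueError("combination must be 'AND' or 'OR'")
    if not thresholds:
        raise ValueError("at least one metric is required")
    # A metric passes when its score is greater than its threshold.
    passed = [scores[metric] > limit for metric, limit in thresholds.items()]
    # AND: every metric must pass. OR: one passing metric is enough.
    return all(passed) if combination == "AND" else any(passed)
```

For example, a pair that scores 0.8 on count_operator but 0.05 on hash_operator, with both thresholds at 0.1, is reported under OR but not under AND.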

Basic command: scoss -sd tests/data/299721 -tc OR -co 0.1 -ho 0.1 -mo 0.1 -o tests/data
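
If you want to launch the CLI from a script, one option is to assemble the argument list in Python. The sketch below uses only the long option names listed in the help output above; the helper build_scoss_command and the METRIC_FLAGS table are hypothetical names, and the command is only printed, not executed.

```python
import shlex

# Long option names copied from the `scoss --help` output.
METRIC_FLAGS = {
    "moss": "--moss",
    "count_operator": "--count-operator",
    "set_operator": "--set-operator",
    "hash_operator": "--hash-operator",
}


def build_scoss_command(submission_dir, thresholds, combination="AND", output_dir=None):
    """Build the argument list for a scoss run (hypothetical helper)."""
    if not thresholds:
        raise ValueError("at least one metric is required")
    cmd = ["scoss", "--submission-dir", submission_dir,
           "--threshold-combination", combination]
    for metric, threshold in thresholds.items():
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold for %s must be in [0, 1]" % metric)
        cmd += [METRIC_FLAGS[metric], str(threshold)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


cmd = build_scoss_command(
    "tests/data/299721",
    {"count_operator": 0.1, "hash_operator": 0.1, "moss": 0.1},
    combination="OR",
    output_dir="tests/data",
)
# Print a shell-safe version of the command.
print(" ".join(shlex.quote(part) for part in cmd))
```

With scoss installed, the list can be passed to subprocess.run(cmd, check=True) to execute it.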

Using as a library

  1. Define a Scoss object and register some metrics:
from scoss import Scoss
sc = Scoss(lang='cpp')
# only show pairs that have similarity score > threshold
sc.add_metric('count_operator', threshold=0.7)
sc.add_metric('set_operator', threshold=0.5)
  2. Register source files with the Scoss object:
sc.add_file('./tests/data/a.cpp')
sc.add_file('./tests/data/b.cpp')
sc.add_file('./tests/data/c.cpp')
# or add files by wildcard
sc.add_file_by_wildcard('./tests/data/problem_A_*.cpp')
  3. Run Scoss and get the results:
sc.run()
# filter results by combining the thresholds of the different metrics (and_thresholds)
print(sc.get_matches(and_thresholds=True))

SMoss defines the same behaviour. You can create an SMoss object to use the MOSS system.

Web-app interface

Please check our web-app interface here.

Issues

This project is in development. If you find any issues, please create an issue here.

Contributors

Ngoc Bui, Thai Do, Tran Vien.

Acknowledgements

This project is sponsored and led by Prof. Do Phan Thuan, Hanoi University of Science and Technology.

Part of this code adapts the source code at https://github.com/soachishti/moss.py as the baseline for SMoss.
