A Source Code Similarity System

scoss

A Source Code Similarity System - SCOSS

There are four supported metrics:

  • count_operator: A metric that counts the operators in the source code to calculate a similarity score.
  • set_operator: A metric that checks which operators are present in the source code to calculate a similarity score.
  • hash_operator: A metric that uses combinations of adjacent operators to calculate a similarity score.
  • SMoss: A wrapper around MOSS (the same as mosspy).
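The three operator-based metrics can be illustrated with a rough sketch. This is not the SCOSS implementation: the operator pattern, the function names and the choice of Jaccard-style ratios are all assumptions made for illustration, and the real package may tokenize and score differently.

```python
import re
from collections import Counter

# Hypothetical operator pattern (assumption): longest operators are listed first
# so that "<<=" is not split into "<<" and "=".
OPERATOR_PATTERN = re.compile(
    r"<<=|>>=|\+\+|--|&&|\|\||[-+*/%<>=!&|^]=|<<|>>|[-+*/%<>=!&|^~]"
)


def operators(source):
    """Return the operators found in the source text, in order of appearance."""
    return OPERATOR_PATTERN.findall(source)


def _ratio(numerator, denominator):
    """Divide safely; an empty comparison scores 0.0 in this sketch."""
    return numerator / denominator if denominator else 0.0


def count_similarity(a, b):
    """Counting idea: compare how often each operator occurs (weighted Jaccard)."""
    ca, cb = Counter(operators(a)), Counter(operators(b))
    return _ratio(sum((ca & cb).values()), sum((ca | cb).values()))


def set_similarity(a, b):
    """Presence idea: compare which operators occur at all (Jaccard index)."""
    sa, sb = set(operators(a)), set(operators(b))
    return _ratio(len(sa & sb), len(sa | sb))


def adjacent_similarity(a, b):
    """Adjacency idea: compare the pairs of neighbouring operators."""
    oa, ob = operators(a), operators(b)
    pa, pb = set(zip(oa, oa[1:])), set(zip(ob, ob[1:]))
    return _ratio(len(pa & pb), len(pa | pb))
```

Two files that differ only in variable names yield identical operator sequences, so all three sketch functions return 1.0 for them. This is why operator-based metrics are robust against simple renaming.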

Installation

This package requires Python 3.6 or later.

pip install scoss

Usage

You can use SCOSS as a command line interface, as a library in your project, or through the web-app interface.

Command Line Interface (CLI)

See the documentation by passing the --help argument.

scoss --help
Usage: scoss [OPTIONS]

Options:
  -i, --input-dir TEXT            Input directory.  [required]
  -o, --output-dir TEXT           Output directory.
  -tc, --threshold-combination [AND|OR]
                                  AND: All metrics are greater than threshold.
                                  OR: At least 1 metric is greater than
                                  threshold.

  -mo, --moss FLOAT RANGE         Use moss metric and set up moss threshold.
  -co, --count-operator FLOAT RANGE
                                  Use count operator metric and set up count
                                  operator threshold.

  -so, --set-operator FLOAT RANGE
                                  Use set operator metric and set up set
                                  operator threshold.

  -ho, --hash-operator FLOAT RANGE
                                  Use hash operator metric and set up hash
                                  operator threshold.

  --help                          Show this message and exit.

To get a plagiarism report for a directory of source code files, pass the directory with -i/--input-dir. Add at least one similarity metric from [-mo/--moss, -co/--count-operator, -so/--set-operator, -ho/--hash-operator] together with its threshold (in the range [0,1]). If you use two or more metrics, define how they are combined with -tc/--threshold-combination (AND is used by default).
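The AND/OR combination described in the help text can be sketched as follows. The function name is_flagged and the dictionary layout are hypothetical and are not part of the SCOSS API. The sketch only mirrors the documented rule that AND requires every metric to exceed its threshold, while OR requires at least one.

```python
def is_flagged(scores, thresholds, combination="AND"):
    """Decide whether a pair of files should be reported.

    scores:      metric name -> similarity score for the pair
    thresholds:  metric name -> threshold in the range [0, 1]
    combination: "AND" (every metric exceeds its threshold) or
                 "OR" (at least one metric exceeds its threshold)
    """
    if not thresholds:
        raise ValueError("at least one metric is required")
    # One boolean per registered metric: did the score exceed the threshold?
    checks = [scores[name] > limit for name, limit in thresholds.items()]
    if combination == "AND":
        return all(checks)
    if combination == "OR":
        return any(checks)
    raise ValueError("combination must be 'AND' or 'OR'")
```

With a count_operator score of 0.8, a hash_operator score of 0.05 and both thresholds set to 0.1, this sketch reports the pair under OR but not under AND.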

Basic command: scoss -i path/to/source_code_dir/ -tc OR -co 0.1 -ho 0.1 -mo 0.1 -o another_path/to/plagiarism_report/
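When the CLI is driven from a script, the argument list can be assembled programmatically. The helper below is a hypothetical convenience function, not something shipped with SCOSS. It uses only the long option names listed in the help output above and checks that each threshold lies in [0, 1].

```python
# Map metric names to the long CLI options shown in `scoss --help`.
METRIC_FLAGS = {
    "moss": "--moss",
    "count_operator": "--count-operator",
    "set_operator": "--set-operator",
    "hash_operator": "--hash-operator",
}


def build_scoss_command(input_dir, thresholds, output_dir=None, combination="AND"):
    """Build the argument list for a scoss invocation (hypothetical helper)."""
    if not thresholds:
        raise ValueError("add at least one similarity metric")
    if combination not in ("AND", "OR"):
        raise ValueError("combination must be 'AND' or 'OR'")
    command = ["scoss", "--input-dir", input_dir,
               "--threshold-combination", combination]
    for name, value in thresholds.items():
        if not 0 <= value <= 1:
            raise ValueError(f"threshold for {name} must be in [0, 1]")
        command += [METRIC_FLAGS[name], str(value)]
    if output_dir is not None:
        command += ["--output-dir", output_dir]
    return command
```

The returned list can be passed to subprocess.run(command, check=True), which avoids shell quoting problems with directory names that contain spaces.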

Using as a library

  1. Define a Scoss object and register some metrics:
from scoss import Scoss
sc = Scoss(lang='cpp')
# only report pairs whose similarity score exceeds the threshold
sc.add_metric('count_operator', threshold=0.7)
sc.add_metric('set_operator', threshold=0.5)
  2. Register source files with the Scoss object:
sc.add_file('./tests/data/a.cpp')
sc.add_file('./tests/data/b.cpp')
sc.add_file('./tests/data/c.cpp')
# or add files by wildcard
sc.add_file_by_wildcard('./tests/data/problem_A_*.cpp')
  3. Run Scoss and get the results:
sc.run()
# filter results by combining the thresholds of the different metrics (and_thresholds)
print(sc.get_matches(and_thresholds=True))

SMoss defines the same behaviour. You can create an SMoss object to use the MOSS system.

Web-app interface

Please check our web-app interface here.

Issues

This project is in development. If you find any issues, please create an issue here.

Contributors

Ngoc Bui, Thai Do, Tran Vien.

Acknowledgements

This project is sponsored and led by Prof. Do Phan Thuan, Hanoi University of Science and Technology.

Part of this code adapts https://github.com/soachishti/moss.py as the baseline for SMoss.
