A Source Code Similarity System

scoss

A Source Code Similarity System - SCOSS

There are four supported metrics:

  • count_operator: A metric that counts the operators in the source code to calculate a similarity score.
  • set_operator: A metric that checks which operators are present in the source code to calculate a similarity score.
  • hash_operator: A metric that uses combinations of adjacent operators to calculate a similarity score.
  • SMoss: A wrapper around MOSS (the same as mosspy).
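
To make the difference between the three operator-based metrics concrete, here is a minimal sketch of one way such scores could be computed. It is not SCOSS's actual implementation: the operator regex, the function names, and the Jaccard-style formulas are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: a small subset of C-like operators, longest alternatives first.
OPERATOR_PATTERN = re.compile(
    r"<<=|>>=|==|!=|<=|>=|&&|\|\||\+\+|--|\+=|-=|\*=|/=|<<|>>|[-+*/%=<>!&|^~]"
)


def extract_operators(source):
    """Return the operators found in the source text, in order of appearance."""
    return OPERATOR_PATTERN.findall(source)


def count_similarity(a, b):
    """Compare how often each operator occurs (multiset overlap)."""
    ca, cb = Counter(extract_operators(a)), Counter(extract_operators(b))
    union = sum((ca | cb).values())  # per-operator maximum counts
    return sum((ca & cb).values()) / union if union else 1.0


def set_similarity(a, b):
    """Compare only which operators are present (Jaccard index on sets)."""
    sa, sb = set(extract_operators(a)), set(extract_operators(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def adjacent_similarity(a, b, n=2):
    """Compare runs of n adjacent operators (Jaccard index on n-grams)."""
    def ngrams(ops):
        return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}

    ga, gb = ngrams(extract_operators(a)), ngrams(extract_operators(b))
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0
```

Under these assumptions, renaming variables leaves all three scores unchanged, because identifiers are ignored and only the operators contribute.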

Installation

This package requires Python 3.6 or later.

pip install scoss

Usage

You can use SCOSS as a command-line interface, as a library in your project, or through the web-app interface.

Command Line Interface (CLI)

See the documentation by passing the --help argument.

scoss --help
Usage: scoss [OPTIONS]

Options:
  -i, --input-dir TEXT            Input directory.  [required]
  -o, --output-dir TEXT           Output directory.
  -tc, --threshold-combination [AND|OR]
                                  AND: All metrics are greater than threshold.
                                  OR: At least 1 metric is greater than
                                  threshold.

  -mo, --moss FLOAT RANGE         Use moss metric and set up moss threshold.
  -co, --count-operator FLOAT RANGE
                                  Use count operator metric and set up count
                                  operator threshold.

  -so, --set-operator FLOAT RANGE
                                  Use set operator metric and set up set
                                  operator threshold.

  -ho, --hash-operator FLOAT RANGE
                                  Use hash operator metric and set up hash
                                  operator threshold.

  --help                          Show this message and exit.

To get a plagiarism report for a directory containing source code files, pass the -i/--input-dir option. Add at least one similarity metric from [-mo/--moss, -co/--count-operator, -so/--set-operator, -ho/--hash-operator] together with its threshold (in the range [0, 1]). If you use two or more metrics, define how they should be combined using -tc/--threshold-combination (AND is used by default).
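
The AND/OR combination can be illustrated with a short sketch that follows the wording of the help text ("greater than threshold"). The function name `is_flagged` and the dictionary layout are hypothetical and are not part of the SCOSS API.

```python
def is_flagged(scores, thresholds, combination="AND"):
    """Decide whether a pair of files is reported.

    scores and thresholds are dicts keyed by metric name (assumed layout).
    AND: every metric must exceed its threshold.
    OR: at least one metric must exceed its threshold.
    """
    exceeded = [scores[name] > limit for name, limit in thresholds.items()]
    if combination == "AND":
        return all(exceeded)
    if combination == "OR":
        return any(exceeded)
    raise ValueError(f"unknown combination: {combination!r}")
```

For example, a pair scoring 0.8 on count_operator and 0.05 on hash_operator, with both thresholds at 0.1, would be reported under OR but not under AND.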

Basic command: scoss -i path/to/source_code_dir/ -tc OR -co 0.1 -ho 0.1 -mo 0.1 -o another_path/to/plagiarism_report/
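
When the CLI is driven from a script, it can help to assemble the argument list programmatically. The sketch below uses only the long options listed in the help output above. The helper `build_scoss_command` and the `FLAG_BY_METRIC` mapping are hypothetical names introduced for illustration.

```python
# Map metric names to the long CLI options shown in `scoss --help`.
FLAG_BY_METRIC = {
    "moss": "--moss",
    "count_operator": "--count-operator",
    "set_operator": "--set-operator",
    "hash_operator": "--hash-operator",
}


def build_scoss_command(input_dir, thresholds, output_dir=None, combination="AND"):
    """Build the argument list for a scoss run without executing it."""
    if not thresholds:
        raise ValueError("at least one metric is required")
    cmd = ["scoss", "--input-dir", input_dir,
           "--threshold-combination", combination]
    for metric, threshold in thresholds.items():
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"threshold for {metric} must be in [0, 1]")
        cmd += [FLAG_BY_METRIC[metric], str(threshold)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where scoss is installed.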

Using as a library

  1. Define a Scoss object and register some metrics:
from scoss import Scoss
sc = Scoss(lang='cpp')
# only show pairs whose similarity score is greater than the threshold
sc.add_metric('count_operator', threshold=0.7)
sc.add_metric('set_operator', threshold=0.5)
  2. Register source files with the Scoss object:
sc.add_file('./tests/data/a.cpp')
sc.add_file('./tests/data/b.cpp')
sc.add_file('./tests/data/c.cpp')
# or add files by wildcard
sc.add_file_by_wildcard('./tests/data/problem_A_*.cpp')
  3. Run Scoss and get the results:
sc.run()
# filter results by combining the thresholds of the different metrics (and_thresholds)
print(sc.get_matches(and_thresholds=True))

The same behaviour is defined for SMoss: you can create an SMoss object to use the MOSS system.
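
The three library steps can be wrapped in a single helper. The sketch below calls only the methods shown in the steps above (`add_metric`, `add_file`, `run`, `get_matches`). The helper name `find_similar_pairs` is hypothetical, and the shape of the value returned by `get_matches` is not documented here, so it is passed through unchanged.

```python
def find_similar_pairs(sc, source_files, metric_thresholds, and_thresholds=True):
    """Register metrics and files on a Scoss-like object, run it, return matches.

    sc is expected to be a freshly created object such as Scoss(lang='cpp').
    metric_thresholds maps metric names to thresholds,
    e.g. {'count_operator': 0.7, 'set_operator': 0.5}.
    """
    for name, threshold in metric_thresholds.items():
        sc.add_metric(name, threshold=threshold)
    for path in source_files:
        sc.add_file(path)
    sc.run()
    return sc.get_matches(and_thresholds=and_thresholds)


# Example call, assuming scoss is installed:
#   from scoss import Scoss
#   matches = find_similar_pairs(
#       Scoss(lang='cpp'),
#       ['./tests/data/a.cpp', './tests/data/b.cpp'],
#       {'count_operator': 0.7, 'set_operator': 0.5},
#   )
```

Taking the object as a parameter keeps the helper independent of how it is constructed, so it can be exercised with a stand-in object in tests.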

Web-app interface

Please check our web-app interface here.

Issues

This project is in development. If you find any issues, please create an issue here.

Contributors

Ngoc Bui, Thai Do, Tran Vien.

Acknowledgements

This project is sponsored and led by Prof. Do Phan Thuan, Hanoi University of Science and Technology.

Part of this code adapts https://github.com/soachishti/moss.py as the baseline for SMoss.


