Matheel: A CLI and Python package for source-code similarity detection.

This is the repository for the demonstration paper "Matheel: A Hybrid Source Code Plagiarism Detection Software".

Matheel

Matheel is a Python package for detecting source code similarity. It combines semantic similarity from pre-trained models with traditional edit-distance metrics (Levenshtein and Jaro-Winkler), so that pairs of source code snippets are scored on both meaning and surface form.


Features

  • Semantic Similarity: Uses pre-trained models (for example, buelfhood/unixcoder-base-unimodal-ST).
  • Edit-distance Metrics: Integrates Levenshtein and Jaro-Winkler similarity scores.
  • Combined Weighted Similarity: Adjustable weights for semantic and syntactic similarity.
  • Easy CLI & Python API: Suitable for both interactive and automated workflows.
  • Interactive UI: Gradio-based user interface.
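To make the edit-distance and weighted-combination features concrete, here is a minimal sketch in plain Python. It is not Matheel's implementation: the function names are hypothetical, the Levenshtein score is normalised by the longer string's length, and the combined score is assumed to be a simple linear blend of the three scores using the Ws, Wl and Wj weights.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def combined_similarity(semantic, levenshtein, jaro_winkler, Ws, Wl, Wj):
    """Assumed linear blend of the three scores (an illustration, not the package's code)."""
    return Ws * semantic + Wl * levenshtein + Wj * jaro_winkler
```

With weights that sum to 1 and component scores in [0, 1], the blended score also stays in [0, 1], which keeps it comparable with a fixed threshold.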

Installation

Install via pip:

pip install matheel

Usage

CLI Usage

Compare files within a compressed ZIP archive:

matheel compare codes.zip --model buelfhood/unixcoder-base-unimodal-ST --threshold 0.5 --num 100
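If your submissions are still loose files in a folder, the archive can be built with the standard library first. This is a sketch only: the helper name zip_sources, the list of file suffixes, and the choice to preserve relative paths inside the archive are all assumptions, since the layout Matheel expects inside the ZIP is not documented here.

```python
import zipfile
from pathlib import Path


def zip_sources(source_dir, archive_path, suffixes=(".py", ".java", ".c", ".cpp")):
    """Pack files with matching suffixes from source_dir into a ZIP.

    Returns the names stored in the archive, in sorted order.
    """
    root = Path(source_dir)
    stored = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                # Store paths relative to source_dir (assumed layout).
                name = path.relative_to(root).as_posix()
                archive.write(path, arcname=name)
                stored.append(name)
    return stored
```

For example, zip_sources("submissions", "codes.zip") would produce the codes.zip file passed to the command above, where "submissions" is a placeholder folder name.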

Python API Usage

To calculate similarities programmatically:

from matheel.similarity import get_sim_list

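# Define parameters
zip_file = "sample_codes.zip"  # ZIP archive containing the source files to compare
Ws, Wl, Wj = 0.7, 0.2, 0.1  # weights: semantic, Levenshtein, Jaro-Winkler
model_name = "buelfhood/unixcoder-base-unimodal-ST"  # pre-trained model for semantic similarity
threshold = 0.5  # similarity threshold
number_results = 100  # number of results to return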

# Get similarity results
results = get_sim_list(zip_file, Ws, Wl, Wj, model_name, threshold, number_results)

# Display results
print(results)
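The sketch below illustrates one plausible reading of the threshold and number_results parameters: keep pairs whose score is at least the threshold, sort them from most to least similar, and return at most number_results of them. This is an assumption about the semantics rather than Matheel's documented behaviour, and both the filter_pairs helper and the scores are hypothetical.

```python
def filter_pairs(scored_pairs, threshold, number_results):
    """Keep pairs scoring at least `threshold`, highest first, capped at `number_results`."""
    kept = [pair for pair in scored_pairs if pair[2] >= threshold]
    kept.sort(key=lambda pair: pair[2], reverse=True)
    return kept[:number_results]


# Hypothetical (file_a, file_b, score) tuples for illustration; not real Matheel output.
pairs = [
    ("a.py", "b.py", 0.91),
    ("a.py", "c.py", 0.42),
    ("b.py", "c.py", 0.67),
]
top = filter_pairs(pairs, threshold=0.5, number_results=100)
print(top)
```

Under this reading, a threshold of 0 with a small number_results (as in the Gradio API call) would simply return the top-scoring pairs regardless of how similar they are.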

Gradio GUI:

The gradio_app folder contains a notebook for running the Gradio interface from Jupyter. A hosted demo is also available on Hugging Face Spaces.

Using Gradio API:

The tool can be used through the Gradio API as per the following call:

# pip install gradio_client
from gradio_client import Client, handle_file

client = Client("buelfhood/Matheel")
result = client.predict(
    zipped_file=handle_file('zip file path'),
    Ws=0.7,
    Wl=0.3,
    Wj=0,
    model_name="buelfhood/unixcoder-base-unimodal-ST",
    threshold=0,
    number_results=10,
    api_name="/get_sim_list"
)
print(result)

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

