Matheel: A CLI and Python package for source-code similarity detection.

This is the repository for the demonstration paper "Matheel: A Hybrid Source Code Plagiarism Detection Software".

Matheel

Matheel is a Python package for detecting source code similarity. It combines semantic similarity from pre-trained models with traditional edit-distance metrics (Levenshtein and Jaro-Winkler) into a single weighted score for comparing source code snippets.


Features

  • Semantic Similarity: Uses pre-trained models such as buelfhood/unixcoder-base-unimodal-ST.
  • Edit-distance Metrics: Integrates Levenshtein and Jaro-Winkler similarity scores.
  • Combined Weighted Similarity: Adjustable weights for semantic and syntactic similarity.
  • Easy CLI & Python API: Suitable for both interactive and automated workflows.
  • Interactive UI: Gradio-based user interface.
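
The combined weighted score described above can be illustrated with a minimal sketch. This is not Matheel's own implementation: the function names are hypothetical, the normalization of Levenshtein distance by the longer string's length is an assumption, and the semantic and Jaro-Winkler scores are taken as precomputed inputs in the range 0 to 1.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; the normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


def combined_similarity(semantic: float, levenshtein: float, jaro_winkler: float,
                        ws: float = 0.7, wl: float = 0.2, wj: float = 0.1) -> float:
    """Hypothetical weighted sum of the three scores (not Matheel's API)."""
    return ws * semantic + wl * levenshtein + wj * jaro_winkler
```

For example, `levenshtein_distance("kitten", "sitting")` is 3, so the normalized similarity is 1 - 3/7. Raising `ws` relative to `wl` and `wj` makes the combined score favor the model's judgment over character-level overlap.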

Installation

Install via pip:

pip install matheel

Usage

CLI Usage

Compare files within a compressed ZIP archive:

matheel compare codes.zip --model buelfhood/unixcoder-base-unimodal-ST --threshold 0.5 --num 100
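
If the submissions are in a directory rather than an archive, a ZIP file can be built with the standard library first. The sketch below is one way to do that; the helper name `zip_sources` and the default list of file suffixes are assumptions for illustration, and whether Matheel expects a flat or nested archive layout is not stated here.

```python
import zipfile
from pathlib import Path


def zip_sources(source_dir: str, archive_path: str,
                suffixes=(".py", ".java", ".c", ".cpp")) -> list[str]:
    """Pack source files under source_dir into a ZIP archive.

    Returns the archive member names that were written. The suffix
    filter is a hypothetical default; adjust it to the languages in use.
    """
    root = Path(source_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                # Store paths relative to the root so the archive is portable.
                arcname = path.relative_to(root).as_posix()
                archive.write(path, arcname=arcname)
                added.append(arcname)
    return added
```

Calling `zip_sources("submissions", "codes.zip")` would produce an archive that can then be passed to `matheel compare`.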

Python API Usage

To calculate similarities programmatically:

from matheel.similarity import get_sim_list

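# Define parameters
zip_file = "sample_codes.zip"
# Weights for the semantic, Levenshtein, and Jaro-Winkler scores
# (mapping inferred from the parameter names)
Ws, Wl, Wj = 0.7, 0.2, 0.1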
model_name = "buelfhood/unixcoder-base-unimodal-ST"
threshold = 0.5
number_results = 100

# Get similarity results
results = get_sim_list(zip_file, Ws, Wl, Wj, model_name, threshold, number_results)

# Display results
print(results)

Gradio GUI:

The gradio_app folder contains a notebook for running the Gradio interface from Jupyter. A hosted demo is also available on Hugging Face Spaces.

Using Gradio API:

The hosted tool can also be called through the Gradio API:

# pip install gradio_client
from gradio_client import Client, handle_file

client = Client("buelfhood/Matheel")
result = client.predict(
    zipped_file=handle_file('zip file path'),
    Ws=0.7,
    Wl=0.3,
    Wj=0,
    model_name="buelfhood/unixcoder-base-unimodal-ST",
    threshold=0,
    number_results=10,
    api_name="/get_sim_list"
)
print(result)

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

