
Matheel: A CLI and Python package for source-code similarity detection.

Project description

This is the repository for the demonstration paper "Matheel: A Hybrid Source Code Plagiarism Detection Software".

Matheel

Matheel is a Python package for detecting source code similarity. It combines semantic similarity scores from pre-trained models with traditional edit-distance metrics (Levenshtein and Jaro-Winkler), so pairs of code snippets are compared on both meaning and surface text.
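As a rough illustration of the edit-distance side, the sketch below computes a Levenshtein distance and normalizes it to a similarity in [0, 1] as 1 - distance / max(length). This is not Matheel's implementation: the function names are hypothetical, and the normalization Matheel actually applies is an assumption here.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance into [0, 1], where 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


print(levenshtein_distance("kitten", "sitting"))  # 3
print(round(levenshtein_similarity("kitten", "sitting"), 3))  # 0.571
```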


Features

  • Semantic Similarity: Scores code pairs with pre-trained models (for example, buelfhood/unixcoder-base-unimodal-ST).
  • Edit-distance Metrics: Integrates Levenshtein and Jaro-Winkler similarity scores.
  • Combined Weighted Similarity: Merges the semantic and edit-distance scores using adjustable weights (Ws, Wl, Wj).
  • Easy CLI & Python API: Suitable for both interactive and automated workflows.
  • Interactive UI: Gradio-based user interface.
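The Jaro-Winkler metric and the weighted combination can be sketched as follows. Assumptions: the standard Jaro-Winkler definition (prefix scale 0.1, common prefix capped at 4 characters) and a plain weighted sum Ws * semantic + Wl * Levenshtein + Wj * Jaro-Winkler, which stays in [0, 1] when each component is in [0, 1] and the weights sum to 1, as they do in the examples in this README. How Matheel combines the scores internally is not documented here, and jaro_similarity, jaro_winkler_similarity and combined_score are hypothetical names.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1], based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix (up to max_prefix characters)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)


def combined_score(semantic: float, levenshtein: float, jaro_winkler: float,
                   Ws: float = 0.7, Wl: float = 0.2, Wj: float = 0.1) -> float:
    """Hypothetical weighted sum of the three component similarities."""
    return Ws * semantic + Wl * levenshtein + Wj * jaro_winkler


print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
print(round(combined_score(0.9, 0.6, 0.8), 2))  # 0.83
```

Setting a weight to 0 (as the Gradio API example does with Wj=0) simply drops that metric from the sum.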

Installation

Install via pip:

pip install matheel

Usage

CLI Usage

Compare files within a compressed ZIP archive:

matheel compare codes.zip --model buelfhood/unixcoder-base-unimodal-ST --threshold 0.5 --num 100
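The CLI reads its input from a ZIP archive. One way to build that archive from a folder of submissions is the standard-library sketch below. The zip_sources helper is hypothetical, and which file types and folder layouts Matheel accepts inside the archive is not stated here, so the suffix filter and the relative member names are assumptions to adjust.

```python
import zipfile
from pathlib import Path


def zip_sources(src_dir, zip_path, suffixes=(".py", ".java", ".c", ".cpp")):
    """Hypothetical helper: write every file under src_dir whose suffix is in
    `suffixes` into zip_path, and return the archive member names written."""
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                arcname = path.relative_to(src).as_posix()  # keep paths relative to src_dir
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

For example, zip_sources("submissions", "codes.zip") produces an archive that can then be passed to matheel compare as in the command above.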

Python API Usage

To calculate similarities programmatically:

from matheel.similarity import get_sim_list

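# Define parameters
zip_file = "sample_codes.zip"  # archive containing the source files to compare
Ws, Wl, Wj = 0.7, 0.2, 0.1  # weights: semantic, Levenshtein, Jaro-Winkler
model_name = "buelfhood/unixcoder-base-unimodal-ST"  # pre-trained model for semantic similarity
threshold = 0.5  # minimum similarity score for a pair to be reported
number_results = 100  # maximum number of pairs to return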

# Get similarity results
results = get_sim_list(zip_file, Ws, Wl, Wj, model_name, threshold, number_results)

# Display results
print(results)
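The parameters of get_sim_list suggest a pipeline along these lines: read every file in the archive, score each pair with a weighted sum of metrics, drop pairs below the threshold, and keep the top number_results. The sketch below is a hypothetical illustration of that idea, not Matheel's code: read_zip_sources and rank_pairs are invented names, applying the threshold to the combined score is an assumption, and difflib's SequenceMatcher ratio stands in for the real semantic and edit-distance scorers.

```python
import itertools
import zipfile
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def read_zip_sources(zip_path: str) -> Dict[str, str]:
    """Return {member name: decoded text} for every file entry in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return {
            name: zf.read(name).decode("utf-8", errors="replace")
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def rank_pairs(sources: Dict[str, str], scorers: List[Tuple[float, Scorer]],
               threshold: float, number_results: int) -> List[Tuple[str, str, float]]:
    """Score every unordered pair of files with a weighted sum of scorers,
    keep pairs scoring >= threshold, and return the top number_results."""
    results = []
    for (name_a, code_a), (name_b, code_b) in itertools.combinations(sorted(sources.items()), 2):
        score = sum(weight * scorer(code_a, code_b) for weight, scorer in scorers)
        if score >= threshold:
            results.append((name_a, name_b, score))
    results.sort(key=lambda row: row[2], reverse=True)  # most similar pairs first
    return results[:number_results]


def ratio(a: str, b: str) -> float:
    """Stand-in scorer: difflib's character-level similarity ratio."""
    return SequenceMatcher(None, a, b).ratio()


demo = {"a.py": "x = 1\n", "b.py": "x = 1\n", "c.py": "print('hi')\n"}
for row in rank_pairs(demo, [(1.0, ratio)], threshold=0.5, number_results=100):
    print(row)
```

Passing the scorers as (weight, function) pairs keeps the weighting separate from the metrics, mirroring how Ws, Wl and Wj are passed separately from the model name.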

Gradio GUI:

The gradio_app folder contains a notebook for running the Gradio interface from Jupyter. A hosted demo is also available on Hugging Face Spaces (buelfhood/Matheel).

Using Gradio API:

The hosted demo can also be called programmatically through its Gradio API using gradio_client:

# pip install gradio_client
from gradio_client import Client, handle_file

client = Client("buelfhood/Matheel")
result = client.predict(
    zipped_file=handle_file('zip file path'),
    Ws=0.7,
    Wl=0.3,
    Wj=0,
    model_name="buelfhood/unixcoder-base-unimodal-ST",
    threshold=0,
    number_results=10,
    api_name="/get_sim_list"
)
print(result)

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.


Acknowledgement:



Download files


Source Distribution

matheel-0.1.6.tar.gz (3.9 kB)

Built Distribution


matheel-0.1.6-py3-none-any.whl (4.4 kB)

File details

Details for the file matheel-0.1.6.tar.gz.

File metadata

  • Download URL: matheel-0.1.6.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for matheel-0.1.6.tar.gz
  • SHA256: a74889a350df79f3725ce0838c6fd8c6a13503e9b33c6725d851d9c6c66410e8
  • MD5: b7fd21f27fde5e8c1c8103112963129b
  • BLAKE2b-256: f02c32072f18ecca7efaada213ac06a6f11cdcb9274df3f84781e804b2cf548e


File details

Details for the file matheel-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: matheel-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for matheel-0.1.6-py3-none-any.whl
  • SHA256: 67f1a6116c0ae776c88f8c764011dfc718f949b2f50bfda70dfc1e8c10913c79
  • MD5: 3c915cc7ee322746f782886198dfa3d4
  • BLAKE2b-256: 0d53c37f9f86cc0669c4089cb1f674c7f0654c4056a3f17832a81f3b490af3b2

