Code Similarity (csim) is a tool for detecting similarity between source code files
Code Similarity (csim)
Code Similarity (csim) provides a module designed to detect similarities between source code files, even when obfuscation techniques have been applied. It is particularly useful for programming instructors and students who need to verify code originality.
Key Features
- Source Code Similarity Analysis: Compares source code files to determine their degree of similarity.
- Advanced Analysis: Utilizes parse trees and the tree edit distance algorithm for in-depth analysis.
- Parse Trees: Represents the syntactic structure of source code, enabling detailed comparisons.
- Tree Edit Distance: Counts the node insertions, deletions, and relabelings needed to turn one parse tree into another; fewer edits mean more similar code structures.
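To make the parse tree and tree edit distance idea concrete, here is a minimal sketch. It is not csim's implementation: it assumes Python's built-in ast module in place of ANTLR, and a plain memoized recursion in place of the zss library. The names to_tree, forest_distance, tree_distance, and similarity are hypothetical, and the normalization in similarity is only one possible choice.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert an ast node into a hashable (label, children) tuple.

    Only the node type is used as the label, so identifier names and
    literal values are ignored (an assumption of this sketch).
    """
    children = tuple(to_tree(child) for child in ast.iter_child_nodes(node))
    return (type(node).__name__, children)


def size(forest):
    """Count all nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # delete the root of the rightmost tree in f1
        forest_distance(f1[:-1] + kids1, f2) + 1,
        # insert the root of the rightmost tree in f2
        forest_distance(f1, f2[:-1] + kids2) + 1,
        # match the two roots, relabeling if their labels differ
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2),
    )


def tree_distance(code_a, code_b):
    """Edit distance between the syntax trees of two code strings."""
    tree_a = to_tree(ast.parse(code_a))
    tree_b = to_tree(ast.parse(code_b))
    return forest_distance((tree_a,), (tree_b,))


def similarity(code_a, code_b):
    """Hypothetical normalization: 1.0 means structurally identical."""
    larger = max(size((to_tree(ast.parse(code_a)),)),
                 size((to_tree(ast.parse(code_b)),)))
    return 1.0 - tree_distance(code_a, code_b) / larger
```

Because the labels in this sketch are node types only, renaming a variable or changing a constant leaves the distance at zero, which illustrates why a structural comparison can see through simple obfuscation. The recursion is fine for short snippets but much slower than Zhang-Shasha on large files.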
Technologies Used
- Python: The core programming language for the tool.
- ANTLR: A parser generator for creating parse trees from source code.
- zss: A library for calculating the tree edit distance.
Installation
- Clone the repository:
git clone https://github.com/EdsonEddy/csim.git
- Navigate to the project directory:
cd csim
- Install the package:
pip install .
Usage
csim can be used from the command line. Only Python files are supported for now; more languages will be added in the future. For example, to compare two Python files, run:
csim -f file1.py file2.py
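When several submissions need checking, the command above can be repeated for every pair of files. The following is a rough sketch that assumes csim is installed and on the PATH; build_commands and run_pairs are hypothetical helper names, and because the format of csim's output is not documented here, the output is passed through as raw text.

```python
import subprocess
from itertools import combinations


def build_commands(paths):
    """Return one `csim -f a b` argument list per unordered pair of files."""
    return [["csim", "-f", a, b] for a, b in combinations(sorted(paths), 2)]


def run_pairs(paths):
    """Run csim on every pair and yield (command, raw stdout text)."""
    for cmd in build_commands(paths):
        # check=False: a failing comparison should not abort the whole batch
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        yield cmd, result.stdout.strip()
```

The number of pairs grows quadratically with the number of files, so a large class of submissions may take a while to process.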
Alternatively, you can use csim as a Python module:
from csim import Compare
code_a = "a = 5"
code_b = "c = 50"
similarity = Compare(code_a, code_b)
print(f"Similarity: {similarity}")
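The module example above compares inline strings. To compare files from Python instead, a small wrapper can read both files and hand their contents to a comparison callable. This is only a sketch: compare_files is a hypothetical helper, and it takes the comparison function as a parameter so that it makes no assumption about what Compare returns.

```python
from pathlib import Path
from typing import Callable


def compare_files(path_a, path_b, compare: Callable[[str, str], object]):
    """Read two source files and pass their text to `compare`.

    `compare` is any callable taking two code strings, for example
    the Compare object imported from csim in the example above.
    """
    code_a = Path(path_a).read_text(encoding="utf-8")
    code_b = Path(path_b).read_text(encoding="utf-8")
    return compare(code_a, code_b)


# With csim installed, usage would look like:
#   from csim import Compare
#   print(compare_files("file1.py", "file2.py", Compare))
```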
ANTLR4 Installation and Parser/Lexer Generation
This installation is not required, because the generated parser and lexer files are already included in the project. If you want to generate them yourself, the steps are described in the grammars/README.md file.
Contributing
Contributions are welcome! To contribute, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/new-feature).
- Make your changes and commit them (git commit -am 'Add new feature').
- Push to the branch (git push origin feature/new-feature).
- Open a Pull Request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Third-Party Licenses
This project utilizes the following third-party libraries:
ANTLR (ANother Tool for Language Recognition)
- Purpose: A parser generator used to create parse trees from source code.
- License: BSD 3-Clause
- Website: https://www.antlr.org/
- Repository: https://github.com/antlr/antlr4
ANTLR4-parser-for-Python-3.14 by RobEin
- Purpose: Python 3.14 grammar for ANTLR4
- License: MIT License
- Repository: https://github.com/RobEin/ANTLR4-parser-for-Python-3.14
zss (Zhang-Shasha)
- Purpose: Tree edit distance algorithm implementation for comparing tree structures
- License: MIT License
- Repository: https://github.com/timtadh/zhang-shasha
File details
Details for the file csim-1.3.2.tar.gz.
File metadata
- Download URL: csim-1.3.2.tar.gz
- Upload date:
- Size: 456.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03fb4fce5ebbb9c454f53a56192a9c0c9a97fca171655a0d5c3617c4c4339561 |
| MD5 | 037d2343ceda39ba74ebaebd8c0fba04 |
| BLAKE2b-256 | 2199718ad35eb13080d352b170720f12c7563d41a13c277be41cb978c1eb9eb9 |
File details
Details for the file csim-1.3.2-py3-none-any.whl.
File metadata
- Download URL: csim-1.3.2-py3-none-any.whl
- Upload date:
- Size: 93.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b3e9758db31245dec485431df1813c72f17cab28d737604d57b5a5d7cfca88f |
| MD5 | e6ae7ca0c22bdcb9f2e403743be3a059 |
| BLAKE2b-256 | b1d461d91cf4ca601d48b0ffa63092d9908f308d23e1ea8c3a9d84991d5db44c |