
Code Similarity (csim) is a method designed to detect similarity between source code files.


Code Similarity (csim)

Code Similarity (csim) provides a module designed to detect similarities between source code files, even when obfuscation techniques have been applied. It is particularly useful for programming instructors and students who need to verify code originality.
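To illustrate why comparing parse trees rather than raw text can survive simple obfuscation, here is a minimal sketch. It uses Python's standard `ast` module as a stand-in for csim's ANTLR-generated parser (an assumption; csim does not necessarily work this way), and the helper names `_Anonymize` and `structure` are hypothetical. Every identifier is replaced with a placeholder, so two functions that differ only in naming produce the same tree dump.

```python
import ast


class _Anonymize(ast.NodeTransformer):
    """Hypothetical helper: replace identifiers with a fixed placeholder."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = "_"  # variable references
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = "_"  # function parameters
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = "_"  # function names
        self.generic_visit(node)  # keep anonymizing the function body
        return node


def structure(source: str) -> str:
    """Dump the parse tree of `source` with all identifiers anonymized."""
    tree = _Anonymize().visit(ast.parse(source))
    return ast.dump(tree, annotate_fields=False)


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def f(a):\n    acc = 0\n    for item in a:\n        acc += item\n    return acc\n"

print(original == renamed)                        # False: the text differs
print(structure(original) == structure(renamed))  # True: the tree shape is the same
```

Literal values are kept, so `x = 1` and `x = 2` still produce different dumps.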

Key Features

  • Source Code Similarity Analysis: Compares source code files to determine their degree of similarity.
  • Advanced Analysis: Utilizes parse trees and the tree edit distance algorithm for in-depth analysis.
  • Parse Trees: Represents the syntactic structure of source code, enabling detailed comparisons.
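  • Tree Edit Distance: Measures how different two code structures are as the minimum cost of node insertions, deletions, and relabelings needed to turn one parse tree into the other.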
  • Hash-Based Pruning: Optimizes the comparison process by reducing tree size while preserving essential structure.
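The pruning step is not specified in detail here, so the sketch below shows only one possible reading of it, under stated assumptions. Trees are plain `(label, children)` tuples. Every internal subtree gets a SHA-1 digest computed bottom-up from its label and its children's digests, and any subtree whose digest occurs in both trees is collapsed into a single placeholder leaf before the edit distance is computed. All names (`annotate`, `internal_digests`, `prune`, `prune_pair`, `size`) are hypothetical and are not csim's API.

```python
import hashlib
from typing import Set, Tuple

Tree = Tuple[str, tuple]            # (label, children); children is a tuple of Tree
Annotated = Tuple[str, str, tuple]  # (digest, label, annotated children)


def _digest(label: str, child_digests: Tuple[str, ...]) -> str:
    # repr() of a tuple of strings is unambiguous, so (barring SHA-1
    # collisions) equal digests mean structurally identical subtrees
    return hashlib.sha1(repr((label, child_digests)).encode()).hexdigest()


def annotate(tree: Tree) -> Annotated:
    """Attach a structural digest to every node, computed bottom-up."""
    label, children = tree
    kids = tuple(annotate(child) for child in children)
    return (_digest(label, tuple(k[0] for k in kids)), label, kids)


def internal_digests(node: Annotated) -> Set[str]:
    """Digests of all subtrees that have at least one child."""
    digest, _, kids = node
    found = {digest} if kids else set()
    for kid in kids:
        found |= internal_digests(kid)
    return found


def prune(node: Annotated, shared: Set[str]) -> Tree:
    """Collapse the outermost shared subtrees into placeholder leaves."""
    digest, label, kids = node
    if kids and digest in shared:
        return ("#" + digest[:8], ())
    return (label, tuple(prune(kid, shared) for kid in kids))


def prune_pair(a: Tree, b: Tree) -> Tuple[Tree, Tree]:
    """Prune both trees using the subtrees they have in common."""
    ann_a, ann_b = annotate(a), annotate(b)
    shared = internal_digests(ann_a) & internal_digests(ann_b)
    return prune(ann_a, shared), prune(ann_b, shared)


def size(tree: Tree) -> int:
    return 1 + sum(size(kid) for kid in tree[1])


a = ("module", (("assign", (("name", ()), ("const", ()))), ("return", (("name", ()),))))
b = ("module", (("assign", (("name", ()), ("const", ()))), ("return", (("call", ()),))))
pa, pb = prune_pair(a, b)
print(size(a), size(pa))  # 6 4
```

The shared `assign` subtree becomes one matching leaf in both trees, which shrinks the input to the edit distance step. In general this only approximates the distance on the full trees, because an optimal edit mapping need not pair the shared subtrees with each other.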

Technologies Used

  • Python: The core programming language for the tool.
  • ANTLR: A parser generator for creating parse trees from source code.
  • zss: A library for calculating the tree edit distance.
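zss implements the Zhang-Shasha tree edit distance algorithm. As a dependency-free illustration of what that distance is, the sketch below evaluates the unit-cost ordered tree edit distance directly from its recursive definition on forests, with memoization. It does not use zss's API and is much slower than Zhang-Shasha on large trees. The tree format and the `similarity` normalization (one minus distance over the combined tree size) are hypothetical choices; csim's actual scoring may differ.

```python
from functools import lru_cache
from typing import Tuple

Tree = Tuple[str, tuple]  # (label, children); children is a tuple of Tree


def size(forest: tuple) -> int:
    """Total number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Unit-cost ordered edit distance between two forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        # delete the rightmost root v; its children take its place
        forest_distance(f[:-1] + v_kids, g) + 1,
        # insert the rightmost root w
        forest_distance(f, g[:-1] + w_kids) + 1,
        # map v to w: match their children, match the remaining forests,
        # and pay 1 for relabeling if the labels differ
        forest_distance(v_kids, w_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),
    )


def tree_distance(a: Tree, b: Tree) -> int:
    return forest_distance((a,), (b,))


def similarity(a: Tree, b: Tree) -> float:
    """Hypothetical normalization: 1.0 means identical trees."""
    return 1.0 - tree_distance(a, b) / (size((a,)) + size((b,)))


a = ("module", (("assign", (("name", ()), ("const", ()))),))
b = ("module", (("assign", (("name", ()), ("call", ()))),))
print(tree_distance(a, b))  # 1
print(similarity(a, b))     # 0.875
```

Dividing by the combined size keeps the score in the range 0 to 1, since deleting every node of one tree and inserting every node of the other is always a valid edit script.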

Installation

  1. Clone the repository:
    git clone https://github.com/EdsonEddy/csim.git
    
  2. Navigate to the project directory:
    cd csim
    
  3. Install the package:
    pip install .
    

Usage

csim can be used from the command line. For now, only Python files are supported; more languages are planned for future releases. For example, to compare two Python files, run:

csim -f file1.py file2.py

Alternatively, you can use csim as a Python module:

from csim import Compare
code_a = "a = 5"
code_b = "c = 50"
similarity = Compare(code_a, code_b)
print(f"Similarity: {similarity}")

ANTLR4 Installation and Parser/Lexer Generation

This installation is not required: the generated parser and lexer files are already included in the project. If you want to generate them yourself, the steps are described in the grammars/README.md file.

Contributing

Contributions are welcome! To contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/new-feature).
  3. Make your changes and commit them (git commit -am 'Add new feature').
  4. Push to the branch (git push origin feature/new-feature).
  5. Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Additional Resources

For more information on the techniques and tools used in this project, refer to the following resources:

Third-Party Licenses

This project utilizes the following third-party libraries:

ANTLR (ANother Tool for Language Recognition)

ANTLR4-parser-for-Python-3.14 by RobEin

zss (Zhang-Shasha)


Download files

Download the file for your platform.

Source Distribution

csim-1.4.1.tar.gz (457.1 kB)

Uploaded Source

Built Distribution


csim-1.4.1-py3-none-any.whl (94.2 kB)

Uploaded Python 3

File details

Details for the file csim-1.4.1.tar.gz.

File metadata

  • Download URL: csim-1.4.1.tar.gz
  • Upload date:
  • Size: 457.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for csim-1.4.1.tar.gz
Algorithm Hash digest
SHA256 82c8eeeacbb2781dcac73a611c4a20ec20598c5906ff8874e1b3ed25ea156241
MD5 659e41f17b5129e7971b277ba42936b8
BLAKE2b-256 b29dfcd3caba1065bd95b7937bb0ab375a5aecc669c031158b9a17ee930bb045
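To check a downloaded file against the digests listed here, you can compute its SHA256 with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the expected digest below is the SHA256 value listed for csim-1.4.1.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for csim-1.4.1.tar.gz
EXPECTED_SHA256 = "82c8eeeacbb2781dcac73a611c4a20ec20598c5906ff8874e1b3ed25ea156241"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Assuming the archive was saved as csim-1.4.1.tar.gz in the current directory, `verify(Path("csim-1.4.1.tar.gz"))` returns True when the file is intact. For the wheel, pass its SHA256 digest as the `expected` argument.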


File details

Details for the file csim-1.4.1-py3-none-any.whl.

File metadata

  • Download URL: csim-1.4.1-py3-none-any.whl
  • Upload date:
  • Size: 94.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for csim-1.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 06e6245d1eb4ce02d3c4e78c85ac0d067c396bfaa8ae9f28d9b9b955f9ccbbf5
MD5 2656f414d2efc248a71a7bb1332f0faa
BLAKE2b-256 0df33683ce003b0b8d3c9f6bfa5bb4964aabbb5d637172378a815d450410e4b3

