
A tool to mine a GitHub repository and obtain a dataset containing a list of bug-fix pairs and related information.


MineCPP

MineCPP, also known as Minecraft++, is an extension of Minecraft.

Introduction

MineCPP mines a GitHub repository to obtain a dataset of bug-fix pairs and related information. Given a repository URL through the -u argument (-u [GitHub URL]), the tool mines the repository and writes its output to project_name.csv. The schema of project_name.csv contains 17 columns, and each row represents a potential bug-fix pair.

Getting Started

MineCPP is a Python-based tool. Make sure Python (3.8 or above) is installed before following the installation guide.

Installation

MineCPP can be installed with a simple pip command.

# Installation command
pip install minecpp

The installation takes care of all dependencies.
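To confirm from Python that the package is installed, the standard-library `importlib.metadata` module (Python 3.8+) can report the installed version. This is a generic check, not a MineCPP feature; the distribution name `minecpp` is taken from the pip command above, and the helper name `installed_minecpp_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_minecpp_version(dist_name="minecpp"):
    """Return the installed version string of `dist_name`, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_minecpp_version()
    print(found if found else "minecpp is not installed; run: pip install minecpp")
```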

Usage

MineCPP comes with three optional arguments:

optional arguments:
  -h, --help  show this help message and exit  
  --version   show version number and exit  
  -u U        Provide the GitHub repo link to analyse the repository

The repository's GitHub URL is all the tool needs to analyse it. For example, to run it on a repository:

minecpp -u https://github.com/SET-IITGN/Minecraft
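The same command can be scripted from Python. The sketch below is a hypothetical wrapper, not part of MineCPP: it checks that the argument looks like a `https://github.com/<owner>/<repo>` URL, builds the `minecpp -u <url>` argument list, and runs it with `subprocess`. It assumes the `minecpp` executable is on `PATH`; the helper names `parse_github_url`, `build_command`, and `run_minecpp` are illustrative.

```python
import subprocess
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a GitHub repository URL into (owner, repo); raise ValueError otherwise."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a GitHub URL: {url!r}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a repository: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def build_command(url):
    """Return the argument list for `minecpp -u <url>` after validating the URL."""
    parse_github_url(url)  # raises ValueError on malformed input
    return ["minecpp", "-u", url]


def run_minecpp(url):
    """Run MineCPP on one repository; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(url), check=True)
```

For example, `run_minecpp("https://github.com/SET-IITGN/Minecraft")` runs the command shown above. Passing a list instead of a shell string avoids quoting problems with unusual URLs.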

Tool's Output

The output of the tool is a project_name.csv file. The schema of the file is:

  • 'Before Bug fix': Represents the code snippet containing a bug.
  • 'After Bug fix': Represents the code snippet after the bug is fixed.
  • 'Location': Represents the line numbers of the change. The 'before' field gives the line number where the bug was found, and the 'after' field gives the line number where the bug is fixed.
  • 'Bug type': Represents the type of bug, obtained from an LLM using the git diff between the fixed commit and the buggy commit.
  • 'Commit Message': Represents the author's description of the commit.
  • 'File Path': Represents the path of the file in which the change is present or the bug is fixed.
  • 'Test File': Indicates whether a test file is present for the bug: 1 means a test file is present, and 0 means it is absent.
  • 'Coding Effort': Represents the effort an author makes before a bug occurs (obtained from the AST of the source code).
  • 'Constructs': Represents the types of constructs in which the bug occurred.
  • 'Lizard Features Buggy': Denotes the cyclomatic complexity of the buggy file.
  • 'Lizard Features Fixed': Denotes the cyclomatic complexity of the fixed file.
  • 'BLEU', 'crystalBLEU_score', 'bert_score': Three different algorithms that estimate the similarity between the buggy and fixed code. Each score lies in the range 0 to 1; values closer to 1 indicate that the buggy and fixed code are more similar, and values closer to 0 indicate that they are less similar.
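Because the output is an ordinary CSV file, it can be explored with Python's standard `csv` module. The sketch below assumes that the column headers match the names listed above exactly (for example `Test File` and `BLEU`) and that score cells hold plain numbers; both are assumptions about the file layout. The function names are hypothetical.

```python
import csv
from statistics import mean


def load_pairs(path):
    """Read every bug-fix pair (one CSV row each) into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def pairs_with_tests(rows):
    """Keep only the pairs whose 'Test File' flag is 1 (a test file is present)."""
    return [row for row in rows if str(row.get("Test File", "")).strip() == "1"]


def mean_score(rows, column="BLEU"):
    """Average a numeric similarity column, skipping missing, empty, or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue
    return mean(values) if values else None
```

For example, `mean_score(pairs_with_tests(load_pairs("project_name.csv")), "bert_score")` averages the BERTScore of the pairs that have a test file.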

Tool's GUI

The tool also provides a GUI to explore and analyse the dataset. It offers two features:

  • Dataset Visualization: An interactive view for browsing the dataset.
  • Quantitative Analysis: Shows a quantitative analysis of Coding Effort vs Bug-Fix pairs and Similarity Score vs Bug-Fix pairs.
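A "Similarity Score vs Bug-Fix pairs" view essentially counts how many pairs fall into each similarity range. The sketch below is not the GUI's code; it is a minimal stand-alone version that buckets scores in [0, 1] into equal-width bins, with the default of 10 bins chosen arbitrarily. The function name `similarity_histogram` is hypothetical.

```python
def similarity_histogram(scores, bins=10):
    """Count bug-fix pairs per equal-width similarity bin over [0, 1].

    Returns a list of (low, high, count) tuples. A score of exactly 1.0 is
    counted in the last bin; scores outside [0, 1] raise ValueError.
    """
    if bins <= 0:
        raise ValueError("bins must be a positive integer")
    counts = [0] * bins
    for score in scores:
        value = float(score)  # accepts numbers or numeric strings from a CSV cell
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"similarity score out of range: {score!r}")
        counts[min(int(value * bins), bins - 1)] += 1
    return [(i / bins, (i + 1) / bins, counts[i]) for i in range(bins)]


if __name__ == "__main__":
    for low, high, count in similarity_histogram([0.1, 0.42, 0.97, 1.0], bins=5):
        print(f"{low:.1f}-{high:.1f}: {count}")
```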

Configuration

  • Python 3.8 or above
  • C++14
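The requirements can be checked before installing. The sketch below assumes that the C++14 requirement means a compiler supporting `-std=c++14` must be available; the compiler names (`g++`, `clang++`, `c++`) and the GCC/Clang-style flags are assumptions, MSVC is not covered, and the helper names are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def python_is_supported(minimum=(3, 8)):
    """True if the running interpreter is at least `minimum` (Python 3.8 by default)."""
    return sys.version_info[:2] >= minimum


def find_cxx_compiler(candidates=("g++", "clang++", "c++")):
    """Return the path of the first C++ compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def compiler_supports_cxx14(compiler):
    """Syntax-check a tiny program using a C++14 generic lambda (GCC/Clang flags assumed)."""
    source = "auto twice = [](auto x) { return x * 2; };\nint main() { return twice(0); }\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "check.cpp"
        path.write_text(source)
        result = subprocess.run(
            [compiler, "-std=c++14", "-fsyntax-only", str(path)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    cxx = find_cxx_compiler()
    print("C++14 OK:", bool(cxx) and compiler_supports_cxx14(cxx))
```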

Contributing

Contributions are welcome, but they will be accepted only if they are suitable for the tool.

License

Apache License



Download files


Source Distribution

Minecpp-0.4.tar.gz (2.6 MB)

Built Distribution

Minecpp-0.4-py3-none-any.whl (2.4 MB)
