A package to facilitate data-wrangling for APR tools

Introduction

Pyrepair Benchmark Wrangling is a Python package that streamlines data wrangling of benchmarks for Automated Program Repair (APR) tools. It provides easy-to-use command-line interfaces to the following components:

Features

  • lmeasures: A command-line tool to compute and report various metrics and measures for the benchmarks.
  • bgp: A command-line interface to interact with the BugsInPy benchmark suite.
  • sample_bip: A utility to sample bugs from the BugsInPy benchmark suite.
  • run_custom_patch: A tool to apply custom patches to the bugs in the BugsInPy dataset.
  • diff_utils: A command-line utility to handle diff files and changes.

Installation

You can install Pyr Benchmark Wrangling either directly via pip or with Docker. After cloning the repository, switch to the pyr_benchmark_wrangling directory:

cd pyr_benchmark_wrangling

Direct Installation

System Requirements:

Before using Pyr Benchmark Wrangling, make sure your system meets the following requirements:

  • Python 3.7 and Python 3.8
  • Development packages for Python 3.7 and Python 3.8
  • libffi7 library

On a Debian-based system, you can install these requirements using apt-get:

sudo apt-get install python3.7 python3.7-dev python3.8 python3.8-dev libffi7
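
A quick way to confirm that both interpreters are present before installing is to look them up on PATH. The sketch below is illustrative and not part of the package: the helper name missing_interpreters is hypothetical, and it assumes the interpreters are exposed under the executable names python3.7 and python3.8. It does not check for libffi7.

```python
import shutil

# Executable names assumed to be provided by the python3.7 / python3.8 packages.
REQUIRED_INTERPRETERS = ("python3.7", "python3.8")


def missing_interpreters(names=REQUIRED_INTERPRETERS, which=shutil.which):
    """Return the interpreter names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


missing = missing_interpreters()
if missing:
    print("Missing interpreters:", ", ".join(missing))
else:
    print("All required interpreters found.")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is sufficient.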

Pip Command:

You can install Pyr Benchmark Wrangling by running the following command:

pip install .

Using Docker

Pyr Benchmark Wrangling's Docker Space Requirements:

  • Lite image: 2.8 GB
  • Full image: 20 GB

In the lite image the virtual environments are constructed lazily; in the full image they are downloaded.

To build the lite image (2.8 GB), use the following command:

docker build --target lite -t pyr:lite .

This executes all instructions in the Dockerfile up to and including the lite stage. The lite image automatically runs update_bug_records and clones all repositories.

To build the full image (20 GB), use the following command:

docker build --target full -t pyr:full .

This executes all instructions in the Dockerfile. The full image automatically runs update_bug_records, clones all repositories, and installs all required environments.

BugsInPy CLI

The BugsInPy CLI is a command-line tool for interacting with and running Python bugs from the BugsInPy dataset. It streamlines setting up bug repositories, cloning specific bugs or repositories, preparing the environment, running tests, and more. This tool requires Python 3.10 or later. Below is an overview of the available commands and their functionality.

Usage

setup Command

The setup command is used to set up the BugsInPy repository. This step is essential before working with any bugs. It clones the BugsInPy repository to your local system.

bgp setup

clone Command

The clone command allows you to clone specific bugs or repositories based on your requirements. You can specify the bugs to clone using the --bug_list flag or repositories using the --repo_list flag.

Example:

bgp clone --bugids repo1:id1,repo2:id2,...,repo3:id3
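
When the list of bugs comes from a script, the comma-separated repo:id value can be assembled programmatically. The following is a minimal sketch, not part of the package: format_bugids and parse_bugids are hypothetical helpers, the repository names and IDs are placeholders, and only the bgp clone --bugids form shown above is taken from this README.

```python
def format_bugids(bugs):
    """Join (repo, bug_id) pairs into the repo1:id1,repo2:id2 form."""
    return ",".join(f"{repo}:{bug_id}" for repo, bug_id in bugs)


def parse_bugids(value):
    """Split a repo1:id1,repo2:id2 string back into (repo, bug_id) string pairs."""
    pairs = []
    for item in value.split(","):
        repo, sep, bug_id = item.strip().partition(":")
        if not (repo and sep and bug_id):
            raise ValueError(f"expected repo:id, got {item!r}")
        pairs.append((repo, bug_id))
    return pairs


# Hypothetical repository names and bug IDs, for illustration only.
bugs = [("repo1", 1), ("repo2", 7)]
clone_command = ["bgp", "clone", "--bugids", format_bugids(bugs)]
print(" ".join(clone_command))
```

Building the command as a list keeps it ready to pass to subprocess.run without shell quoting concerns.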

checkout_buggy and checkout_fixed Commands

These commands check out the buggy or fixed version of a specific bug repository. Provide the bug ID to identify the repository.

  • To check out the buggy version:
bgp checkout_buggy --bugids repo:<bug_id>
  • To check out the fixed version:
bgp checkout_fixed --bugids repo:<bug_id>

extract_features Command

The extract_features command extracts features of a specific bug.

Example:

bgp extract_features --bugids repo:<bug_id>

prep Command

The prep command prepares the environment for a specific bug. It installs the required dependencies and performs sanity checks to ensure the bug can be tested successfully. Run setup and clone before prep.

Example:

bgp prep --bugids repo:<bug_id>

run_test Command

The run_test command runs the tests for a specific list of bugs, executing the test commands associated with each bug. Run setup, clone, and prep before run_test.

bgp run_test --bugids repo:<bug_id>
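
The required ordering (setup, then clone, then prep, then run_test) can be captured in a small driver script. This is a sketch under stated assumptions rather than a feature of the CLI: bgp_pipeline is a hypothetical helper, and it only prints the commands in the forms documented above instead of executing them.

```python
def bgp_pipeline(bugids):
    """Return the bgp invocations for one --bugids value, in the required order."""
    commands = [["bgp", "setup"]]
    for step in ("clone", "prep", "run_test"):
        commands.append(["bgp", step, "--bugids", bugids])
    return commands


# Placeholder value in the repo:<bug_id> form used throughout this section.
for command in bgp_pipeline("repo:<bug_id>"):
    print(" ".join(command))
```

To run the steps for real, each list could be passed to subprocess.run(command, check=True), so that a failing step stops the sequence before the next one starts.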

delete_bug_repo Command

The delete_bug_repo command deletes a specific bug repository from your local system.

Example:

bgp delete_bug_repo <bug_id>

Additional Notes

  • The CLI provides options to control the verbosity of the prep step and set the log level.
  • Mutually exclusive flags such as --bug_list and --repo_list select the bugs or repositories on which a command runs.
  • You can adjust the timeout for various system calls using the --timeout flag.

Unsupported repos

The following repositories are unsupported:

  • Spacy: It requires a Python version < 3.4.

diff_utils

diff_utils is a Python module for analyzing and extracting data from unified diff outputs generated by tools such as Git. The module provides functions to compute localization measures on single-file diffs and across multiple file diffs, and to extract the modified line numbers and file names from diffs.

Features

  • Compute hunk statistics such as count, gaps, and spans from a single file diff.
  • Aggregate hunk information across multiple file diffs to calculate comprehensive statistics.
  • Extract modified files and their respective line changes from a unified diff.
  • Write the extracted data to CSV files for further analysis.
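
To make the hunk statistics concrete, the sketch below computes a count, gaps, and a span from the hunk headers of a single-file unified diff. It is not the module's implementation: hunk_stats is a hypothetical function, and the definitions used here (a gap is the number of old-file lines between consecutive hunks, the span runs from the first hunk's first line to the last hunk's last line) are assumptions that may differ from what diff_utils reports.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +10,4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_stats(diff_text):
    """Compute hunk count, gaps, and span on the old-file side of a diff."""
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(1))
            # An omitted length in a hunk header means a length of 1.
            length = int(match.group(2)) if match.group(2) is not None else 1
            ranges.append((start, start + length - 1))
    # Assumed definition: lines lying between consecutive hunks.
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(ranges, ranges[1:])]
    # Assumed definition: first line of first hunk to last line of last hunk.
    span = ranges[-1][1] - ranges[0][0] + 1 if ranges else 0
    return {"count": len(ranges), "gaps": gaps, "span": span}


sample = """\
--- a/example.py
+++ b/example.py
@@ -1,3 +1,3 @@
 import os
-x = 1
+x = 2
 print(x)
@@ -10,2 +10,3 @@
 def f():
+    pass
     return 1
"""
print(hunk_stats(sample))  # {'count': 2, 'gaps': [6], 'span': 11}
```

The first hunk covers old-file lines 1 to 3 and the second covers lines 10 to 11, which gives one gap of 6 lines and a span of 11 lines.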

Usage

The diff_utils module provides a set of functions that can be used independently or through a command-line interface.

Command-line Interface

The module can be run as a script to perform actions based on the arguments provided:

  • --measure: Accepts a comma-delimited list of diff files for which to measure localization metrics.
  • --locations: Accepts a comma-delimited list of diff files from which to extract location sets.
  • --quiet: Quiet mode, which suppresses the standard output.
  • --output: Specifies the output CSV file name.

Example Usage:

> diff_utils --measure "diff_file1.txt,diff_file2.txt" --output "measures_output.csv"
> diff_utils --locations "diff_file1.txt,diff_file2.txt" --output "locations_output.csv"

Module Functions

You can also use the functions provided by diff_utils in a Python script:

from diffutils import measure_localisation_diff_file, locations_from_diff_file

# Measure localization metrics for a given diff file
metrics = measure_localisation_diff_file("diff_file.txt")

# Extract locations from a diff file
locations = locations_from_diff_file("diff_file.txt")
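
For readers who want to see what extracting locations from a unified diff involves, here is a standalone sketch. It does not reproduce locations_from_diff_file: modified_locations is a hypothetical function, and it assumes a location means the old-file line number of each removed line, keyed by the old-file path with Git's a/ prefix dropped.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +10,4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def modified_locations(diff_text):
    """Map each file in a unified diff to the old-file lines it removes."""
    locations = {}
    path = None
    old_line = old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: classify the line by its first character.
            if line.startswith("-"):
                locations.setdefault(path, []).append(old_line)
                old_line += 1
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both the old and the new file.
                old_line += 1
                old_left -= 1
                new_left -= 1
        elif line.startswith("--- "):
            # File header: keep the old-file path, dropping any "a/" prefix.
            path = line[4:].split("\t")[0]
            if path.startswith("a/"):
                path = path[2:]
            locations.setdefault(path, [])
        else:
            match = HUNK_HEADER.match(line)
            if match:
                old_line = int(match.group(1))
                old_left = int(match.group(2)) if match.group(2) is not None else 1
                new_left = int(match.group(4)) if match.group(4) is not None else 1
    return locations


sample = """\
--- a/pkg/util.py
+++ b/pkg/util.py
@@ -4,3 +4,2 @@
 def add(a, b):
-    # old comment
     return a + b
--- a/pkg/main.py
+++ b/pkg/main.py
@@ -1,2 +1,2 @@
-print("hi")
+print("hello")
 run()
"""
print(modified_locations(sample))  # {'pkg/util.py': [5], 'pkg/main.py': [1]}
```

Tracking the remaining line counts from each hunk header, rather than matching on prefixes alone, keeps a removed line that happens to start with "-- " from being mistaken for a file header.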
