A tool that builds executable code datasets with GitHub Actions.

GitBug-Actions

GitBug-Actions is a tool that builds bug-fix benchmarks by leveraging GitHub Actions. The tool mines GitHub repositories and navigates through their commits, locally executing GitHub Actions using act in each commit considered. Finally, the tool checks if a bug-fix pattern was found by looking at the test results parsed from the GitHub Actions runs. If a bug-fix is found, GitBug-Actions is able to export a Docker image with the reproducible environment for the bug-fix. The reproducible environment will preserve all the dependencies required to run the tests for the bug-fix, avoiding the degradation of the benchmark due to dependencies that become unavailable.
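As a rough illustration of the final check, the sketch below compares test results from the commit before a change and the commit after it. The rule used here (at least one test that failed before now passes, and the whole suite passes afterwards) and the dictionary representation are assumptions made for illustration. They are not taken from the GitBug-Actions source, which may use different or additional patterns.

```python
# Hypothetical sketch: test results are modeled as {test_name: passed}.
def looks_like_bug_fix(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """Return True if some test fails before the commit and every test passes after it."""
    if not after or not all(after.values()):
        return False  # the fixed version must have a fully passing test suite
    # At least one test that still exists afterwards must have been failing before.
    return any(not passed for name, passed in before.items() if name in after)


previous = {"test_parse": True, "test_empty_input": False}
current = {"test_parse": True, "test_empty_input": True}
print(looks_like_bug_fix(previous, current))
```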

If you use GitBug-Actions, please cite:

GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions (doi:10.1145/3639478.3640023)

@inproceedings{gitbugactions,
 title = {GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions},
 year = {2024},
 doi = {10.1145/3639478.3640023},
 author = {Saavedra, Nuno and Silva, Andr{\'e} and Monperrus, Martin},
 booktitle = {Proceedings of the ACM/IEEE 46th International Conference on Software Engineering: Companion Proceedings},
}

Requirements

Act

act must be installed and functional. At the moment, GitBug-Actions works correctly only with the modified version of act available at https://github.com/gitbugactions/act. Other versions may run, but issues may arise.

To install this version:

git clone https://github.com/gitbugactions/act
cd act
make build

The build creates the binary file dist/local/act. Add the directory that contains this binary to the $PATH of the system:

export PATH="<REPLACE_WITH_PATH_TO_ACT>:$PATH"
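To confirm that the binary is actually reachable, a small check along these lines may help. It is only a sketch: `find_act` is a hypothetical helper name, and the check says nothing about whether the binary found is the modified version.

```python
import shutil


def find_act(path=None):
    """Return the full path of the `act` executable, or None if it is not found.

    `path` is a search path string like $PATH; None means "use the current $PATH".
    """
    return shutil.which("act", path=path)


location = find_act()
print(location if location else "act not found on PATH")
```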

Python dependencies

GitBug-Actions runs on Python 3.12 and above.

Ensure Poetry is installed.

Then, to install the Python dependencies run:

poetry shell
poetry install

How to run

Ensure the commands are executed inside the Poetry shell:

poetry shell

Set the environment variable GITHUB_ACCESS_TOKEN with your GitHub access token. The token is used to perform calls to GitHub's API.

export GITHUB_ACCESS_TOKEN="<YOUR_ACCESS_TOKEN>"
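If you wrap the scripts in your own tooling, it can be useful to fail early when the variable is missing. The helper below is a hypothetical sketch and is not part of GitBug-Actions.

```python
import os


def get_github_token(env=os.environ):
    """Return the GitHub access token, or raise if GITHUB_ACCESS_TOKEN is unset or empty."""
    token = env.get("GITHUB_ACCESS_TOKEN", "").strip()
    if not token:
        raise RuntimeError("GITHUB_ACCESS_TOKEN is not set")
    return token


# Example with an explicit mapping instead of the real environment:
print(get_github_token({"GITHUB_ACCESS_TOKEN": "<YOUR_ACCESS_TOKEN>"}))
```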

Use the --help option to list the options each script accepts and requires.

python collect_repos.py --help
python collect_bugs.py --help
python export_bugs.py --help
python filter_bugs.py --help

Overview of GitBug-Actions

The figure above provides an overview of the pipeline of GitBug-Actions.

The scripts above should be executed in the order shown in the figure: collect_repos, collect_bugs, export_bugs, and then filter_bugs. collect_bugs uses the repositories found by collect_repos as input. export_bugs uses the bug-fixes found by collect_bugs as input. Finally, filter_bugs uses the bug-fixes found by collect_bugs and the containers exported by export_bugs as input. The output of filter_bugs is a file listing the non-flaky bug-fixes that can be reproduced in the exported containers.
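The input/output relationships described above can be written down as a small dependency graph. The sketch below only models the ordering, using Python's standard graphlib module; it does not invoke the scripts, and the `PIPELINE` dictionary is an illustrative construct rather than anything defined by GitBug-Actions.

```python
from graphlib import TopologicalSorter

# Each script maps to the scripts whose output it consumes.
PIPELINE = {
    "collect_repos": set(),
    "collect_bugs": {"collect_repos"},
    "export_bugs": {"collect_bugs"},
    "filter_bugs": {"collect_bugs", "export_bugs"},
}


def execution_order(dependencies):
    """Return the scripts in an order where every input is produced before it is used."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(PIPELINE))
# ['collect_repos', 'collect_bugs', 'export_bugs', 'filter_bugs']
```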

Tests

To run the tests:

pytest test -s

Practical Challenges

While developing GitBug-Actions, we encountered several challenges in running CI builds at a large scale. Here we enumerate these challenges and explain how we mitigate them and, where mitigation was not possible, how the user should handle them.

Handling Commits without GitHub Actions

One challenge in collecting bug-fix commit pairs by reproducing GitHub Actions is that GitHub Actions was only released in late 2019. Moreover, although it was the most popular option as of 2023, its adoption was not immediate. As a result, the majority of commits found on GitHub do not have any associated workflows.

To increase the number of commits it supports, GitBug-Actions identifies the oldest locally reproducible GitHub Actions configuration for each project. Then, for commits not associated with GitHub Actions, GitBug-Actions uses that configuration as an approximation of the intended one.
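The fallback can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the `Commit` record, its fields, and `pick_workflows` are hypothetical names, and the real tool works on Git repositories and act runs rather than on an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    sha: str
    timestamp: int               # commit time, seconds since the epoch
    workflows: Optional[str]     # identifier of the commit's own workflows, None if absent
    reproducible: bool = False   # whether those workflows ran successfully locally


def pick_workflows(commit, history):
    """Use the commit's own workflows if present, else the oldest reproducible ones."""
    if commit.workflows is not None:
        return commit.workflows
    candidates = [c for c in history if c.workflows is not None and c.reproducible]
    if not candidates:
        return None  # nothing reproducible to fall back to
    return min(candidates, key=lambda c: c.timestamp).workflows


history = [
    Commit("a1", 100, None),
    Commit("b2", 200, "ci-v1", reproducible=False),
    Commit("c3", 300, "ci-v2", reproducible=True),
    Commit("d4", 400, "ci-v3", reproducible=True),
]
print(pick_workflows(history[0], history))  # ci-v2
```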

Disk Space Management

Build execution has the potential to exhaust available disk space. To mitigate this, we restrict each build's allocation to a maximum of 3GiB. This restriction is handled by our version of act.

However, users are advised to check disk usage frequently and to remove dangling Docker containers and images if they occur. Additionally, users should pay special attention to Docker volumes, which are not automatically removed by act and can accumulate over time.

Example of how to remove dangling containers and volumes created by act:

# Remove containers
docker rm $(docker stop $(docker ps -a -q --filter ancestor=gitbugactions:latest --format="{{.ID}}"))
# Remove volumes
docker volume ls -q | grep '^act-' | xargs docker volume rm
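For the periodic disk check, a small script such as the one below could be run before launching a batch of builds. It is a sketch, not part of GitBug-Actions: the function names are hypothetical, the default of 3 GiB per build mirrors the limit mentioned above, and `path` should point at the filesystem that holds Docker's data directory on your machine.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path="/"):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room_for_builds(parallel_builds, per_build_gib=3, path="/"):
    """Check that every parallel build could use its full allocation at once."""
    return free_gib(path) >= parallel_builds * per_build_gib


print(f"{free_gib():.1f} GiB free; room for 4 parallel builds: {has_room_for_builds(4)}")
```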

Concurrent File Access

CI builds may initiate concurrent file access operations, a situation that can escalate to the point of surpassing the user-level open-file limit set by Linux. This is exacerbated when running multiple builds in parallel. To overcome this, we recommend setting the open-file limit for your user profile to a higher threshold than the default.

To check the current soft limit for your user, run ulimit -Sn.
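The same limit can be inspected from Python with the standard resource module (Unix only). The sketch below also raises the soft limit up to the hard limit, but only for the current process and the processes it starts; it does not change the persistent limit of your user profile, which is configured at the system level. `raise_open_file_limit` is a hypothetical helper name.

```python
import resource


def raise_open_file_limit(desired: int) -> int:
    """Raise the soft open-file limit toward `desired`, capped by the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

Calling `raise_open_file_limit(65536)` before starting builds from the same process would apply the higher limit to them; the value 65536 is an arbitrary example, not a recommendation from the authors.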

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

