Execution-Feature-Driven Debugging
Abstract
Fault localization is a fundamental aspect of debugging, aiming to identify code regions likely responsible for failures. Traditional techniques primarily correlate statement execution with failures, yet program behavior is influenced by diverse execution features—such as variable values, branch conditions, and definition-use pairs—that can provide richer diagnostic insights.
In an empirical study of 310 bugs across 20 projects, we analyzed 17 execution features and assessed their correlation with failure outcomes. Our findings suggest that fault localization benefits from a broader range of execution features: (1) Scalar pairs exhibit the strongest correlation with failures; (2) Beyond line executions, def-use pairs and functions executed are key indicators for fault localization; and (3) Combining multiple features enhances effectiveness compared to relying solely on individual features.
Building on these insights, we introduce a debugging approach to diagnose failure circumstances. The approach extracts fine-grained execution features and trains a decision tree to differentiate passing and failing runs. From this model, we derive a diagnosis that pinpoints faulty locations and explains the underlying causes of the failure.
Our evaluation demonstrates that the generated diagnoses achieve high predictive accuracy, reinforcing their reliability. These interpretable diagnoses empower developers to efficiently debug software by providing deeper insights into failure causes.
Study
Setup
We use SFLKit to collect the event data for the subjects. SFLKit instruments each subject program so that executing it records its event data: the sequence of events that occur during the execution of the subject.
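Def-use pairs, one of the features the study highlights, can be derived from such an event sequence by pairing each use of a variable with its most recent definition. The following is a minimal sketch that assumes a hypothetical trace format of `(kind, variable, line)` tuples, not SFLKit's actual event classes.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event format: (kind, variable, line), where kind is "def" or "use".
Event = Tuple[str, str, int]


def def_use_pairs(trace: Iterable[Event]) -> Set[Tuple[str, int, int]]:
    """Pair every use of a variable with the line of its most recent definition."""
    last_def: Dict[str, int] = {}
    pairs: Set[Tuple[str, int, int]] = set()
    for kind, var, line in trace:
        if kind == "def":
            last_def[var] = line  # a redefinition replaces the reaching definition
        elif kind == "use" and var in last_def:
            pairs.add((var, last_def[var], line))
    return pairs


trace = [
    ("def", "x", 1),
    ("def", "y", 2),
    ("use", "x", 3),
    ("def", "x", 4),  # x is redefined here
    ("use", "x", 5),
    ("use", "y", 5),
]
pairs = def_use_pairs(trace)
```

Here the use of `x` on line 5 pairs with the redefinition on line 4, not with line 1, because only the most recent definition reaches it.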
As subjects of our empirical study, we use the Tests4Py benchmark.
The study is located in the study directory, which also contains the script study.py for running the experiments and analyzing the results.
Installing Requirements
We recommend installing the requirements in a virtual environment. To create one, run the following command inside the study directory:
python -m venv .venv
and activate it with:
. .venv/bin/activate
or
source .venv/bin/activate
Then install the requirements:
python -m pip install -r requirements.txt
Getting the Data Set
To obtain the data set, download it from here and extract it into the study directory.
Alternatively, you can reproduce the data set as described in the next section.
Reproducing the Data Set
Collecting The Event Data
To collect the event data, run the following command:
python study.py event -p <project_name> [-i <bug_id>]
For instance, to collect the event data for bug 1 of the project black, run:
python study.py event -p black -i 1
The collected event data is stored in the sflkit_events directory.
Additionally, the script maps all possible events of each subject and stores this mapping in mappings/<project_name>_<bug_id>.json.
For bug 1 of black, for example, the events are stored in sflkit_events/black/1/bug for the buggy version and sflkit_events/black/1/fix for the fixed version, and the mapping in mappings/black_1.json.
Remove the report_<project_name>.json file if you want to collect the event data from scratch.
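When scripting over many bugs, it can help to compute where these artifacts live. A small sketch, assuming the commands run inside the study directory and following the layout described in this section; `artifact_paths` is a hypothetical helper, not part of study.py.

```python
from pathlib import Path
from typing import Dict


def artifact_paths(project: str, bug_id: int, root: Path = Path(".")) -> Dict[str, Path]:
    """Hypothetical helper: where the study stores the data for one bug."""
    events = root / "sflkit_events" / project / str(bug_id)
    return {
        "buggy_events": events / "bug",  # events of the buggy version
        "fixed_events": events / "fix",  # events of the fixed version
        "mapping": root / "mappings" / f"{project}_{bug_id}.json",
        "results": root / "results" / f"{project}_{bug_id}.json",
    }


paths = artifact_paths("black", 1)
# List the artifacts that have not been produced yet for this bug.
missing = [name for name, path in paths.items() if not path.exists()]
```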
Evaluating the Correlation and Fault Localization
To evaluate the correlation and fault localization, run the following command:
python study.py evaluate -p <project_name> [-i <bug_id>]
This command evaluates how strongly each execution feature correlates with failure and how well it localizes the fault.
As an intermediate step, it generates the features and their values in the analysis directory.
To run this step explicitly, use:
python study.py analyze -p <project_name> [-i <bug_id>]
The results of the correlation and fault localization analysis are stored in the results directory, as one JSON file per subject named <project_name>_<bug_id>.json.
To rerun the evaluation from scratch, remove the corresponding files from the results directory.
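The exact correlation measure that study.py uses is not spelled out here. As an illustrative assumption, the sketch below ranks features by the Pearson correlation between their per-run values and the failure outcome (for 0/1 features this is the phi coefficient); the helper names and feature names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(values: Sequence[float], outcomes: Sequence[int]) -> float:
    """Pearson correlation; with 0/1 inputs this equals the phi coefficient."""
    n = len(values)
    mean_v = sum(values) / n
    mean_o = sum(outcomes) / n
    cov = sum((v - mean_v) * (o - mean_o) for v, o in zip(values, outcomes))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    if var_v == 0 or var_o == 0:
        return 0.0  # a constant feature (or outcome) carries no signal
    return cov / math.sqrt(var_v * var_o)


def rank_features(
    features: Dict[str, Sequence[float]], outcomes: Sequence[int]
) -> List[Tuple[str, float]]:
    """Sort features by the strength (absolute value) of their correlation."""
    scores = [(name, pearson(values, outcomes)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


outcomes = [0, 0, 0, 0, 1]  # 1 marks a failing run
features = {
    "line:1": [1, 1, 1, 1, 1],  # executed by every run
    "line:5": [0, 0, 0, 0, 1],  # executed only by the failing run
    "line:8": [0, 1, 0, 0, 0],  # executed only by one passing run
}
ranking = rank_features(features, outcomes)
```

A line executed by every run gets a score of 0, while a line executed only by failing runs correlates perfectly with failure.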
To summarize the results of all subjects, run the following command:
python study.py summarize
The summarized results will be stored in a file called summary.json.
Execution-Feature-Driven Debugging
Installation
To install Execution-Feature-Driven Debugging (EFDD), run the following command:
python -m pip install .
Usage
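To use EFDD, first instrument your subject:
from efdd.events import instrument
# Instrument middle.py, writing the instrumented program to tmp.py and the event mapping to mapping.json
instrument("middle.py", "tmp.py", "mapping.json")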
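EFDD relies on SFLKit's source instrumentation. To get a feel for what a line-level event trace contains without instrumenting anything, a different technique, runtime tracing with Python's `sys.settrace`, can record which lines of a function execute. This is a minimal sketch, not how EFDD instruments code; `middle` is an assumed stand-in for middle.py (the classic buggy "middle of three" function) and `trace_lines` is a hypothetical helper.

```python
import sys
from types import FrameType
from typing import Any, Callable, List, Tuple


def middle(x, y, z):  # offset 0; assumed stand-in for middle.py
    if y < z:
        if x < y:
            return y
        elif x < z:
            return y  # offset 5: the bug, should return x
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z


def trace_lines(func: Callable[..., Any], *args: Any) -> Tuple[Any, List[int]]:
    """Run func(*args); return its result and executed line offsets from the def line."""
    code = func.__code__
    offsets: List[int] = []

    def local_tracer(frame: FrameType, event: str, arg: Any):
        if event == "line":
            offsets.append(frame.f_lineno - code.co_firstlineno)
        return local_tracer

    def global_tracer(frame: FrameType, event: str, arg: Any):
        # Only trace frames that execute the target function.
        return local_tracer if frame.f_code is code else None

    previous = sys.gettrace()  # keep any active tracer (e.g. a coverage tool)
    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, offsets


failing_result, failing_lines = trace_lines(middle, 2, 1, 3)
passing_result, passing_lines = trace_lines(middle, 1, 2, 3)
```

For these two inputs, only the failing run executes offset 5, which is exactly the kind of "line executed" feature that the study correlates with failure.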
Next, you need tests to execute so that their event traces can be collected.
EFDD provides two collectors: one for unit tests and one for inputs to the program.
You can also implement another collector by inheriting from the base class EventCollector and implementing its collect() method.
To employ a collector, use it like this:
import os
from efdd.events import SystemtestEventCollector
# Run the instrumented tmp.py on each input and collect the resulting event traces
collector = SystemtestEventCollector(os.getcwd(), "middle.py", "tmp.py", mapping_path="mapping.json")
events = collector.get_events((passing, failing))
This example uses the input collector, SystemtestEventCollector; passing and failing are lists of passing and failing inputs.
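To obtain passing and failing, you need an oracle that decides whether a run is correct. The following is a minimal sketch. It assumes middle.py contains the classic buggy "middle of three" function and that inputs are argument tuples. The oracle `expected_middle` and the helper `partition` are hypothetical, and you may need to convert the inputs to whatever format SystemtestEventCollector expects.

```python
from typing import List, Sequence, Tuple

Input = Tuple[int, int, int]


def middle(x: int, y: int, z: int) -> int:
    """Assumed stand-in for middle.py: the classic buggy 'middle of three'."""
    if y < z:
        if x < y:
            return y
        elif x < z:
            return y  # bug: should return x
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z


def expected_middle(x: int, y: int, z: int) -> int:
    """Hypothetical oracle: the median of the three arguments."""
    return sorted((x, y, z))[1]


def partition(inputs: Sequence[Input]) -> Tuple[List[Input], List[Input]]:
    """Split inputs into (passing, failing) by comparing against the oracle."""
    passing: List[Input] = []
    failing: List[Input] = []
    for args in inputs:
        if middle(*args) == expected_middle(*args):
            passing.append(args)
        else:
            failing.append(args)
    return passing, failing


passing, failing = partition([(1, 2, 3), (3, 2, 1), (3, 1, 2), (5, 5, 5), (2, 1, 3)])
```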
Next, use the event handler to extract features from the event traces and build feature vectors:
from sflkit.features.handler import EventHandler
handler = EventHandler()
handler.handle_files(events)
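Conceptually, this step turns each run into a vector over a fixed set of features and labels it as passing or failing. The following is a minimal sketch of that idea. It assumes each run has already been reduced to a set of feature names. Names such as "line:5" and the helper `build_feature_matrix` are hypothetical and are not SFLKit's EventHandler API. Because the columns can mix feature kinds, the same matrix also supports combining multiple features, which the study found more effective than relying on individual features.

```python
from typing import List, Sequence, Set, Tuple


def build_feature_matrix(
    runs: Sequence[Tuple[Set[str], bool]],
) -> Tuple[List[str], List[List[int]], List[int]]:
    """Turn per-run feature sets into a 0/1 matrix plus failure labels."""
    # One column per feature observed in any run, in a stable (sorted) order.
    columns = sorted(set().union(*(features for features, _ in runs)))
    matrix = [[1 if column in features else 0 for column in columns] for features, _ in runs]
    labels = [1 if failing else 0 for _, failing in runs]
    return columns, matrix, labels


runs = [
    ({"line:1", "line:2", "line:3", "function:middle"}, False),  # passing run
    ({"line:1", "line:2", "line:4", "line:5", "function:middle"}, True),  # failing run
]
columns, matrix, labels = build_feature_matrix(runs)
```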
Now, we can train EFDD's learner to infer a failure diagnosis:
from efdd.learning import DecisionTreeDiagnosis
debugger = DecisionTreeDiagnosis()
debugger.fit(
handler.builder.get_all_features(),
handler,
)
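The fit call above trains a decision tree that separates passing from failing runs. To make the idea concrete, here is a minimal, self-contained sketch of a decision tree over binary features that chooses splits by Gini impurity. It is not EFDD's implementation, and the names `Leaf`, `Node`, `fit`, and `predict` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class Leaf:
    failing: bool  # majority label of the training runs reaching this leaf


@dataclass
class Node:
    feature: int  # index of the binary feature tested here
    absent: "Tree"  # subtree for runs where the feature is 0
    present: "Tree"  # subtree for runs where the feature is 1


Tree = Union[Leaf, Node]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels (0.0 means all runs agree)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit(
    rows: Sequence[Sequence[int]],
    labels: Sequence[int],
    candidates: Optional[List[int]] = None,
    max_depth: int = 3,
) -> Tree:
    if candidates is None:
        candidates = list(range(len(rows[0]))) if rows else []
    majority = 2 * sum(labels) > len(labels)  # ties go to "passing"
    # Only consider features that actually separate the current runs.
    splitting = [f for f in candidates if 0 < sum(row[f] for row in rows) < len(rows)]
    if max_depth == 0 or gini(labels) == 0.0 or not splitting:
        return Leaf(majority)

    def weighted_gini(f: int) -> float:
        absent = [y for row, y in zip(rows, labels) if row[f] == 0]
        present = [y for row, y in zip(rows, labels) if row[f] == 1]
        return (len(absent) * gini(absent) + len(present) * gini(present)) / len(labels)

    best = min(splitting, key=weighted_gini)
    rest = [f for f in candidates if f != best]
    absent_idx = [i for i, row in enumerate(rows) if row[best] == 0]
    present_idx = [i for i, row in enumerate(rows) if row[best] == 1]
    return Node(
        best,
        fit([rows[i] for i in absent_idx], [labels[i] for i in absent_idx], rest, max_depth - 1),
        fit([rows[i] for i in present_idx], [labels[i] for i in present_idx], rest, max_depth - 1),
    )


def predict(tree: Tree, row: Sequence[int]) -> bool:
    """Follow the tests from the root to a leaf; True means 'predicted failing'."""
    while isinstance(tree, Node):
        tree = tree.present if row[tree.feature] else tree.absent
    return tree.failing


# Columns: "line:1", "line:5", "line:8" executed (1) or not (0) in each run.
rows = [[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 1, 0]]
labels = [0, 0, 0, 0, 1]  # only the last run fails
tree = fit(rows, labels)
```

On this toy data, the tree splits on "line:5" alone, because that feature separates the failing run from all passing runs with zero impurity.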
The debugger's underlying model then serves as a diagnosis that pinpoints faulty locations and explains the underlying causes of the failure.
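One way to read a diagnosis off such a tree is to collect the conjunction of feature tests along every path that ends in a failing leaf. The following is a minimal sketch that uses a hypothetical dict-based tree encoding and made-up feature names, not the model that DecisionTreeDiagnosis actually builds.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical encoding: a leaf is {"failing": bool}; an inner node tests one
# binary feature and has an "absent" (value 0) and a "present" (value 1) subtree.
Tree = Dict[str, Any]
Condition = Tuple[str, bool]  # (feature name, must be present?)


def failure_conditions(tree: Tree, path: Tuple[Condition, ...] = ()) -> Iterator[List[Condition]]:
    """Yield the feature tests on every root-to-leaf path that predicts failure."""
    if "failing" in tree:
        if tree["failing"]:
            yield list(path)
        return
    feature = tree["feature"]
    yield from failure_conditions(tree["absent"], path + ((feature, False),))
    yield from failure_conditions(tree["present"], path + ((feature, True),))


def describe(conditions: List[Condition]) -> str:
    """Render one path as a readable conjunction."""
    return " and ".join(name if present else f"not {name}" for name, present in conditions)


tree: Tree = {
    "feature": "line 5",
    "absent": {"failing": False},
    "present": {
        "feature": "x < y",
        "absent": {"failing": True},
        "present": {"failing": False},
    },
}
diagnoses = [describe(c) for c in failure_conditions(tree)]
```

Each rendered path reads as a hypothesis about when the program fails, for example "line 5 and not x < y", which points both to a location and to a condition under which it misbehaves.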
We provide an example of this walk-through in evaluation/example.ipynb.
License
This project is licensed under the MIT License - see the LICENSE file for details.