ESEC/FSE 2022
Mon 14 - Fri 18 November 2022 Singapore
Tue 15 Nov 2022 15:15 - 15:30 at SRC GLR - JF Perspectives Chair(s): Julia Lawall

The SZZ approach for identifying fix-inducing changes traces backwards from a commit that fixes a defect to those commits that are implicated in the fix. This approach is at the heart of studies of characteristics of fix-inducing changes, as well as the popular Just-in-Time (JIT) variant of defect prediction. However, some types of commits are invisible to the SZZ approach. We refer to these invisible commits as “Ghost Commits.” In this paper, we set out to define, quantify, characterize, and mitigate ghost commits that impact the SZZ algorithm during its mapping (i.e., linking defect-fixing commits to those commits that are implicated by the fix) and filtering phases (i.e., removing improbable fix-inducing commits from the set of implicated commits). We mine the version control repositories of 14 open source Apache projects for instances of mapping-phase and filtering-phase ghost commits. We find that (1) 5.66%–11.72% of defect-fixing commits only add lines, and thus, cannot be mapped back to implicated commits; (2) 1.05%–4.60% of the studied commits only remove lines, and thus, cannot be implicated in future fixes; and (3) no implicated commits survive the filtering process for 0.35%–14.49% of defect-fixing commits. Qualitative analysis of ghost commits reveals that 46.5% of 142 addition-only defect-fixing commits add checks (e.g., null-ness or emptiness checks), while 39.7% of 307 removal-only commits clean up (unused) code. Our results suggest that the next generation of SZZ improvements should be language-aware to connect ghost commits to implicated and defect-fixing commits. Based on our observations, we discuss promising directions for mitigation strategies to address each type of ghost commit. Moreover, we implement mitigation strategies for addition-only commits and evaluate those strategies with respect to a baseline approach.
The results indicate that our strategies achieve a precision of 0.753, improving the precision of implicated commits by 39.5 percentage points.
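
To make the two mapping-phase categories concrete, here is a minimal Python sketch that labels a commit from its per-file added and removed line counts (the kind of numbers `git log --numstat` reports). The names `FileChange` and `classify_commit` are hypothetical and do not come from the paper, and the paper's actual criteria may be stricter (for example, by ignoring comment or whitespace changes). The intuition is that SZZ blames the lines a fix removes or modifies, so a fix that only adds lines gives it nothing to trace, and a commit that only removes lines leaves behind no lines that a later blame could point to.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FileChange:
    """Hypothetical per-file line counts for one commit."""
    path: str
    added: int
    removed: int


def classify_commit(changes: Iterable[FileChange]) -> str:
    """Label a commit by the direction of its line changes.

    This is an illustrative simplification, not the paper's procedure.
    """
    changes = list(changes)
    total_added = sum(c.added for c in changes)
    total_removed = sum(c.removed for c in changes)

    if total_added == 0 and total_removed == 0:
        return "empty"
    if total_removed == 0:
        # As a fix: no removed lines to blame, so nothing maps backwards.
        return "addition-only"
    if total_added == 0:
        # As a change: no new lines exist for a future blame to land on.
        return "removal-only"
    return "mixed"


if __name__ == "__main__":
    null_check_fix = [FileChange("src/Parser.java", added=3, removed=0)]
    dead_code_cleanup = [FileChange("src/Legacy.java", added=0, removed=40)]
    print(classify_commit(null_check_fix))
    print(classify_commit(dead_code_cleanup))
```

Under these assumptions, the null-check fix is labeled addition-only and the cleanup is labeled removal-only, matching the two dominant patterns the qualitative analysis reports.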

Tue 15 Nov

Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

14:00 - 15:30
JF Perspectives (Journal First) at SRC GLR
Chair(s): Julia Lawall Inria
14:00
15m
Talk
On the Relationship Between the Developer’s Perceptible Race and Ethnicity and the Evaluation of Contributions in OSS
Journal First
Reza Nadri University of Waterloo, Gema Rodríguez-Pérez University of British Columbia (UBC), Mei Nagappan University of Waterloo
14:15
15m
Talk
Understanding Software-2.0: A Study of Machine Learning library usage and evolution
Journal First
Malinda Dilhara University of Colorado Boulder, USA, Ameya Ketkar Oregon State University, USA, Danny Dig University of Colorado Boulder, USA
14:30
15m
Talk
How Do Android Developers Improve Non-Functional Properties of Software?
Journal First
James Callan UCL, Oliver Krauss University of Applied Sciences Upper Austria, Justyna Petke University College London, Federica Sarro University College London
14:45
15m
Talk
Empowering the Human as the Fitness Function in Search-Based Model-Driven Engineering
Journal First
Francisca Pérez SVIT Research Group. Universidad San Jorge, Jaime Font San Jorge University, Spain, Lorena Arcega San Jorge University, Carlos Cetina San Jorge University, Spain
15:00
15m
Talk
An empirical study of developers’ discussions about security challenges of different programming languages
Journal First
Roland Croft The University of Adelaide, Yongzheng Xie University of Adelaide, Mansooreh Zahedi The University of Melbourne, Muhammad Ali Babar University of Adelaide, Christoph Treude University of Melbourne
15:15
15m
Talk
The Ghost Commit Problem When Identifying Fix-Inducing Changes: An Empirical Study of Apache Projects
Journal First
Christophe Rezk McGill University, Yasutaka Kamei Kyushu University, Shane McIntosh University of Waterloo