ESEC/FSE 2022
Mon 14 - Fri 18 November 2022 Singapore

To improve software quality, just-in-time defect prediction (JIT-DP), which identifies defect-inducing commits, and just-in-time defect localization (JIT-DL), which identifies defect-inducing code lines within commits, have been widely studied by learning either semantic features or expert features separately, and both have achieved promising performance. Semantic features represent the intrinsic characteristics of defect-inducing commits, while expert features capture experts’ knowledge about such commits. Unfortunately, the literature has not yet fully explored combining the strengths of the two feature types to improve just-in-time defect prediction and localization. JIT-DP identifies defects at the coarse commit level, while JIT-DL, as the task that follows JIT-DP, cannot accurately localize defect-inducing code lines in a commit without JIT-DP. We hypothesize that the two just-in-time tasks can be combined to improve the accuracy of predicting and localizing defect-inducing commits by integrating semantic features with expert features. Therefore, we propose JIT-Fine, a unified model for just-in-time defect prediction and localization that leverages the best of semantic features and expert features. To assess the feasibility of JIT-Fine, we first build JIT-Defects4J, a large-scale, manually labeled, line-level dataset. We then make a comprehensive comparison with six state-of-the-art baselines under various settings, using ten performance measures grouped into two types: effort-agnostic and effort-aware. The experimental results indicate that JIT-Fine outperforms all state-of-the-art baselines on both the JIT-DP and JIT-DL tasks across the ten performance measures, with substantial improvements (i.e., 10%-629% in terms of effort-agnostic measures on JIT-DP, 5%-54% in terms of effort-aware measures on JIT-DP, and 4%-117% in terms of effort-aware measures on JIT-DL).
The strong results of JIT-Fine also indicate the advantages of combining expert features with semantic features and of building a unified model for both JIT-DP and JIT-DL.