Lighting Up Supervised Learning in User Review-Based Code Localization: Dataset and Benchmark
Since User Reviews (URs) of mobile apps have been shown to provide valuable feedback for maintaining and evolving applications, how to make full use of URs in the release cycle of mobile apps has become a widely studied topic in the Software Engineering (SE) community. To accelerate the coding work related to URs and thereby shorten the release cycle, the task of User Review-based code localization (Review2Code) has been proposed and studied in depth. However, due to the lack of a large-scale ground-truth dataset (i.e., truly related <UR, Code> pairs), existing methods are all based on unsupervised learning. To light up supervised learning approaches for Review2Code, which are driven by large labeled datasets, and to compare their performance with that of unsupervised methods, we first introduce a large-scale human-labeled <UR, Code> ground-truth dataset, describing its annotation process and statistical properties. We then construct a benchmark on this dataset consisting of two state-of-the-art (SOTA) unsupervised and four supervised Review2Code methods. We believe this paper provides a basis for in-depth exploration of supervised Review2Code solutions.