Lighting Up Supervised Learning in User Review-Based Code Localization: Dataset and Benchmark
Since User Reviews (URs) of mobile apps have been shown to provide valuable feedback for maintaining and evolving applications, how to make full use of URs in the release cycle of mobile apps has become a widely studied topic in the Software Engineering (SE) community. To accelerate the coding work related to URs and thereby shorten the release cycle, the task of User Review-based code localization (Review2Code) has been proposed and studied in depth. However, due to the lack of a large-scale ground-truth dataset (i.e., truly related <UR, Code> pairs), existing methods are all based on unsupervised learning. To light up supervised learning approaches for Review2Code, which are driven by large labeled datasets, and to compare their performance with that of unsupervised methods, we first introduce a large-scale human-labeled <UR, Code> ground-truth dataset, describing its annotation process and statistical properties. We then construct a benchmark on this dataset consisting of two state-of-the-art (SOTA) unsupervised and four supervised Review2Code methods. We believe this paper provides a basis for in-depth exploration of supervised Review2Code solutions.