RegMiner: Mining Replicable Regression Dataset from Code Repositories
We introduce a tool, RegMiner, to automate the process of collecting replicable regression bugs from a set of user-specified Git repositories. In the code commit history, RegMiner searches for regressions where a test passes on a regression-fixing commit, fails on a regression-inducing commit, and passes again on an earlier working commit. Technically, RegMiner (1) identifies potential regression-fixing commits from the code evolution history, (2) migrates the test and its code dependencies over the history, and (3) minimizes the compilation overhead during the regression search. Our experiments show that RegMiner can successfully collect 1035 regressions over 147 projects in 8 weeks, creating the largest replicable regression dataset within the shortest period, to the best of our knowledge. In addition, our experiments further show that (1) RegMiner can construct the regression dataset with very high precision and acceptable recall, and (2) the constructed regression dataset is of high authenticity and diversity. The source code of RegMiner is available at https://github.com/SongXueZhi/RegMiner, the mined regression dataset is available at https://regminer.github.io/, and the demonstration video is available at https://youtu.be/yzcM9Y4unok.
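The pass/fail/pass pattern described in the abstract can be illustrated with a minimal sketch. This is not RegMiner's implementation: the function `find_regression`, the `run_test` callback, and the commit labels are hypothetical names introduced for illustration, and the sketch assumes the test already compiles and runs on every commit (the abstract's test migration step) and uses a plain linear scan rather than any overhead-minimizing search.

```python
from typing import Callable, List, Optional, Tuple

PASS, FAIL = "pass", "fail"


def find_regression(
    history: List[str],
    fix_index: int,
    run_test: Callable[[str], str],
) -> Optional[Tuple[str, str]]:
    """Return (working_commit, inducing_commit), or None if no regression is found.

    history is ordered oldest to newest; history[fix_index] is a candidate
    regression-fixing commit. run_test is a hypothetical callback that
    returns PASS or FAIL for a given commit.
    """
    # The test must pass on the candidate fixing commit.
    if run_test(history[fix_index]) != PASS:
        return None

    # The test must fail on the commit just before the fix (the bug is present).
    i = fix_index - 1
    if i < 0 or run_test(history[i]) != FAIL:
        return None

    # Walk back through the failing commits until the outcome changes.
    while i >= 0 and run_test(history[i]) == FAIL:
        i -= 1

    # A regression needs an earlier commit on which the test passes again.
    if i < 0 or run_test(history[i]) != PASS:
        return None

    # history[i] is the working commit; the next one introduced the failure.
    return history[i], history[i + 1]


# Toy history: the test passes on c0 and c1, fails on c2 and c3, passes on c4.
outcomes = {"c0": PASS, "c1": PASS, "c2": FAIL, "c3": FAIL, "c4": PASS}
history = ["c0", "c1", "c2", "c3", "c4"]
result = find_regression(history, 4, outcomes.get)
print(result)  # ('c1', 'c2')
```

If the test fails all the way back to the first commit, the sketch returns `None`: the bug was never a regression because no earlier working commit exists.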
Wed 16 Nov. Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi
11:00 - 12:30 | Mining Software Repositories (Research Papers / Demonstrations) at SRC Auditorium 2 | Chair(s): Timofey Bryksin (JetBrains Research)
11:00 | 15m Talk | An Exploratory Study on the Predominant Programming Paradigms in Python Code | Research Papers
11:15 | 15m Talk | An Empirical Study of Blockchain System Vulnerabilities: Modules, Types, and Patterns | Research Papers | Xiao Yi (Chinese University of Hong Kong), Daoyuan Wu (Chinese University of Hong Kong), Lingxiao Jiang (Singapore Management University), Yuzhou Fang (Chinese University of Hong Kong), Kehuan Zhang (Chinese University of Hong Kong), Wei Zhang (Nanjing University of Posts and Telecommunications)
11:30 | 15m Talk | How to Better Utilize Code Graphs in Semantic Code Search? | Research Papers | Yucen Shi (Northeastern University), Ying Yin (Northeastern University), Zhengkui Wang (Singapore Institute of Technology), David Lo (Singapore Management University), Tao Zhang (Macau University of Science and Technology), Xin Xia (Huawei), Yuhai Zhao (Northeastern University), Bowen Xu (Singapore Management University)
11:45 | 15m Talk | 23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software | Research Papers | David OBrien (Iowa State University), Sumon Biswas (Carnegie Mellon University), Sayem Mohammad Imtiaz (Iowa State University), Rabe Abdalkareem (Carleton University), Emad Shihab (Concordia University), Hridesh Rajan (Iowa State University)
12:00 | 7m Talk | WikiDoMiner: Wikipedia Domain-specific Miner | Demonstrations | Saad Ezzini (University of Luxembourg), Sallam Abualhaija (University of Luxembourg), Mehrdad Sabetzadeh (University of Ottawa)
12:08 | 7m Talk | RegMiner: Mining Replicable Regression Dataset from Code Repositories | Demonstrations | Xuezhi Song (Fudan University), Yun Lin (Shanghai Jiao Tong University; National University of Singapore), Yijian Wu (Fudan University), Yifan Zhang (National University of Singapore), Siang Hwee Ng (National University of Singapore), Xin Peng (Fudan University), Jin Song Dong (National University of Singapore), Hong Mei (Peking University)