Exploring the Under-Explored Terrain of Non-open Source Data for Software Engineering through the Lens of Federated Learning
The availability of open source projects on platforms like GitHub has led to the wide use of artifacts from these projects in software engineering research. These publicly available artifacts have been used to train artificial intelligence (AI) models for various empirical studies and for the development of tools. However, these advancements have missed out on artifacts from non-open source projects because that data is unavailable. A major cause of this unavailability is concern about data privacy. In this paper, we propose using federated learning to address the issue of data privacy and so enable the use of data from non-open source projects to train AI models used in software engineering research. We believe that this can potentially enable industries to collaborate with software engineering researchers without concerns about privacy. To demonstrate its feasibility, we present a preliminary evaluation of federated learning used to train a classifier to label bug-fix commits from an existing study. The federated approach achieved an F1 score of 0.83, compared to 0.84 for the centralized approach. We also present our vision of the potential implications of the use of federated learning in software engineering research.
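To make the idea of federated training concrete, here is a minimal sketch of one common scheme, federated averaging (FedAvg), applied to a toy logistic-regression classifier. This is not the authors' implementation: the abstract does not name the aggregation algorithm, model, or features used, so the choice of FedAvg, the function names, and the two-feature toy data below are all assumptions made for illustration. The point it shows is that each client trains on its own private data and shares only model weights with the server.

```python
import math


def local_update(weights, data, lr=0.5, epochs=5):
    """Run a few epochs of logistic-regression SGD on one client's private data."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(features):
                w[i] -= lr * (pred - label) * xi
    return w


def federated_average(client_weights, client_sizes):
    """Combine client models, weighting each by its local dataset size (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def train_federated(clients, dim, rounds=20):
    """Server loop: broadcast the global model, collect local updates, average them."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Raw examples never leave a client; only the updated weights do.
        updates = [local_update(global_w, data) for data in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w


def predict(weights, features):
    """Return 1 if the linear score is positive, else 0."""
    z = sum(wi * xi for wi, xi in zip(weights, features))
    return 1 if z > 0 else 0


# Hypothetical toy data: each example is ([bias, x], label), with label 1 when x > 0.
# In the paper's setting, each client would be an organization holding private commits.
clients = [
    [([1.0, 2.0], 1), ([1.0, -2.0], 0)],
    [([1.0, 1.5], 1), ([1.0, -1.0], 0)],
]
model = train_federated(clients, dim=2)
```

A centralized baseline would instead pool every client's examples and call `local_update` once on the combined data, which is exactly the data sharing that the federated setup avoids.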
Tue 15 Nov (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
14:00 - 15:30 | Machine Learning III | Research Papers / Ideas, Visions and Reflections | SRC Auditorium 2 | Chair(s): Xi Zheng (Macquarie University)
14:00 | 15m | Talk | AutoPruner: Transformer-Based Call Graph Pruning | Research Papers | Le-Cong Thanh (Singapore Management University), Hong Jin Kang (Singapore Management University), Truong Giang Nguyen (Singapore Management University), Stefanus Agus Haryono (Singapore Management University), David Lo (Singapore Management University), Xuan-Bach D. Le (University of Melbourne), Quyet Thang Huynh (Hanoi University of Science and Technology)
14:15 | 15m | Talk | Exploring the Under-Explored Terrain of Non-open Source Data for Software Engineering through the Lens of Federated Learning | Ideas, Visions and Reflections
14:30 | 15m | Talk | CORMS: A GitHub and Gerrit Based Hybrid Code Reviewer Recommendation Approach for Modern Code Review | Research Papers
14:45 | 15m | Full-paper | Hierarchical Bayesian Multi-kernel Learning for Integrated Classification and Summarization of App Reviews | Research Papers | Moayad Alshangiti (University of Jeddah; Rochester Institute of Technology), Weishi Shi (Rochester Institute of Technology), Eduardo Coelho de Lima (Rochester Institute of Technology), Xumin Liu (Rochester Institute of Technology), Qi Yu (Rochester Institute of Technology)