Pair Programming Conversations with Agents vs. Developers: Challenges and Opportunities for SE Community
Recent research has shown the feasibility of an interactive pair-programming conversational agent, but implementing such an agent poses three challenges: a lack of benchmark datasets, the absence of software-engineering-specific labels, and the need to understand developer conversations. To address these challenges, we conducted a Wizard of Oz study in which 14 participants pair programmed with a simulated agent, and collected 4,443 developer-agent utterances. Based on this dataset, we created 26 software engineering labels using an open coding process to develop a hierarchical classification scheme. To understand labeled developer-agent conversations, we compared the accuracy of three state-of-the-art transformer-based language models, BERT, GPT-2, and XLNet, which performed interchangeably. To begin creating a developer-agent dataset, researchers and practitioners currently need to conduct resource-intensive Wizard of Oz studies, whereas vast amounts of developer-developer conversations already exist on video hosting websites. To investigate the feasibility of using developer-developer conversations, we labeled a publicly available developer-developer dataset (3,436 utterances) with our hierarchical classification scheme and found that a BERT model trained on developer-developer data performed ~10% worse than the BERT model trained on developer-agent data; with transfer learning, however, accuracy improved. Finally, our qualitative analysis revealed that developer-developer conversations are more implicit, neutral, and opinionated than developer-agent conversations. Our results have implications for software engineering researchers and practitioners developing conversational agents.
Mon 14 Nov (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
16:00 - 17:30 | Human/Computer Interaction | Research Papers / Demonstrations at SRC LT 51 | Chair(s): Saikat Chakraborty (Microsoft Research)
16:00 (15m) Talk | How to Formulate Specific How-To Questions in Software Development? | Research Papers | Mingwei Liu (Fudan University), Xin Peng (Fudan University), Andrian Marcus (University of Texas at Dallas), Christoph Treude (University of Melbourne), Jiazhan Xie (Fudan University), Huanjun Xu (Fudan University), Yanjun Yang (Fudan University) | DOI
16:15 (15m) Talk | Asynchronous Technical Interviews: Reducing the Effect of Supervised Think-Aloud on Communication Ability (Distinguished Paper Award) | Research Papers | DOI
16:30 (15m) Talk | Pair Programming Conversations with Agents vs. Developers: Challenges and Opportunities for SE Community | Research Papers | Peter Robe (University of Tulsa), Sandeep Kuttal (University of Tulsa), Jake AuBuchon (University of Tulsa), Jacob Hart (University of Tulsa) | DOI
16:45 (15m) Talk | Toward Interactive Bug Reporting for (Android App) End-Users | Research Papers | Yang Song (College of William and Mary), Junayed Mahmud (George Mason University), Ying Zhou (University of Texas at Dallas), Oscar Chaparro (College of William and Mary), Kevin Moran (George Mason University), Andrian Marcus (University of Texas at Dallas), Denys Poshyvanyk (College of William and Mary) | DOI
17:00 (7m) Talk | MultIPAs: Applying Program Transformations to Introductory Programming Assignments for Data Augmentation | Demonstrations | Pedro Orvalho (INESC-ID, Instituto Superior Técnico, University of Lisbon), Mikoláš Janota (Czech Technical University in Prague), Vasco Manquinho (INESC-ID; Universidade de Lisboa) | Pre-print
17:08 (7m) Talk | PolyFax: A Toolkit for Characterizing Multi-Language Software | Demonstrations | Wen Li (Washington State University), Li Li (Monash University), Haipeng Cai (Washington State University) | Pre-print