23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software (ESEC/FSE 2022 - Research Papers)

Who

David OBrien, Sumon Biswas, Sayem Mohammad Imtiaz, Rabe Abdalkareem, Emad Shihab, Hridesh Rajan

Track

ESEC/FSE 2022 Research Papers

Time Zone

The program is currently displayed in (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 16 Nov 2022 11:45 - 12:00 at SRC Auditorium 2 - Mining Software Repositories Chair(s): Timofey Bryksin

Abstract

In software development, the term “technical debt” (TD) characterizes short-term solutions and workarounds implemented in source code that may incur a long-term cost. Technical debt takes a variety of forms and can thus affect multiple qualities of software, including but not limited to its legibility, performance, and structure. In this paper, we conduct a comprehensive study of technical debt in machine learning (ML) based software. TD can appear differently in ML software by infecting the data that ML models are trained on, thus affecting the functional behavior of ML systems. The growing inclusion of ML components in modern software systems has introduced a new set of TDs. Does ML software have TDs similar to those of traditional software? If not, what are the new types of ML-specific TDs? In which ML pipeline stages do these debts appear? Do these debts differ between ML tools and applications, and when do they get removed? Currently, we do not know the state of ML TDs in the wild. To address these questions, we mined 68,820 self-admitted technical debts (SATD), along with their introduction and removal, from all the revisions of a curated dataset of 2,641 popular ML repositories from GitHub. By applying an open-coding scheme and building on prior work, we provide a comprehensive taxonomy of ML SATDs. Our study analyzes how ML SATD types are organized, their frequencies within stages of ML software, and the differences between ML SATDs in applications and tools, and it quantifies the removal of ML SATDs. Our findings suggest implications for ML developers and researchers seeking to create maintainable ML systems.

DOI

https://doi.org/10.1145/3540250.3549088

David OBrien

Iowa State University

United States

Sumon Biswas

Carnegie Mellon University

United States

Sayem Mohammad Imtiaz

Iowa State University

United States

Rabe Abdalkareem

Carleton University

Canada

Emad Shihab

Concordia University

Canada

Hridesh Rajan

Iowa State University

United States

Session Program

Wed 16 Nov
Displayed time zone: (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi

11:00 - 12:30	Mining Software Repositories (Research Papers / Demonstrations) at SRC Auditorium 2. Chair(s): Timofey Bryksin (JetBrains Research)

11:00 15m Talk		An Exploratory Study on the Predominant Programming Paradigms in Python Code (Research Papers). Robert Dyer (University of Nebraska-Lincoln), Jigyasa Chauhan (University of Nebraska-Lincoln)
11:15 15m Talk		An Empirical Study of Blockchain System Vulnerabilities: Modules, Types, and Patterns (Research Papers). Xiao Yi (Chinese University of Hong Kong), Daoyuan Wu (Chinese University of Hong Kong), Lingxiao Jiang (Singapore Management University), Yuzhou Fang (Chinese University of Hong Kong), Kehuan Zhang (Chinese University of Hong Kong), Wei Zhang (Nanjing University of Posts and Telecommunications)
11:30 15m Talk		How to Better Utilize Code Graphs in Semantic Code Search? (Research Papers). Yucen Shi (Northeastern University), Ying Yin (Northeastern University), Zhengkui Wang (Singapore Institute of Technology), David Lo (Singapore Management University), Tao Zhang (Macau University of Science and Technology), Xin Xia (Huawei), Yuhai Zhao (Northeastern University), Bowen Xu (Singapore Management University)
11:45 15m Talk		23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software (Research Papers). David OBrien (Iowa State University), Sumon Biswas (Carnegie Mellon University), Sayem Mohammad Imtiaz (Iowa State University), Rabe Abdalkareem (Carleton University), Emad Shihab (Concordia University), Hridesh Rajan (Iowa State University)
12:00 7m Talk		WikiDoMiner: Wikipedia Domain-specific Miner (Demonstrations). Saad Ezzini (University of Luxembourg), Sallam Abualhaija (University of Luxembourg), Mehrdad Sabetzadeh (University of Ottawa)
12:08 7m Talk		RegMiner: Mining Replicable Regression Dataset from Code Repositories (Demonstrations). Xuezhi Song (Fudan University), Yun Lin (Shanghai Jiao Tong University; National University of Singapore), Yijian Wu (Fudan University), Yifan Zhang (National University of Singapore), Siang Hwee Ng (National University of Singapore), Xin Peng (Fudan University), Jin Song Dong (National University of Singapore), Hong Mei (Peking University)