ESEC/FSE 2022
Mon 14 - Fri 18 November 2022, Singapore
Tue 15 Nov 2022 11:30 - 11:45 at SRC Auditorium 2 - Machine Learning II. Chair(s): Atif Memon

Machine Learning (ML) has become the cornerstone of information retrieval (IR)
software, as it can drive better user experience by leveraging information-rich
data and complex models.
However, evaluating the emergent behavior of ML-based IR software can be
challenging with traditional software testing approaches: when developers
modify the software, they often cannot extract useful information from
individual test instances; rather, they seek to verify holistically whether,
and where, their modifications caused significant regressions or improvements
at scale.
In this paper, we introduce not only such a holistic approach
to evaluate the system-level behavior of the software, but also the concept of
a defect class, which represents a partition of the input space on which
the ML-based software does measurably worse for an existing feature or on which
the ML task is more challenging for a new feature. We leverage large volumes
of functional test cases, automatically obtained, to derive these defect
classes, and propose new ways to improve the IR software from an end-user's
perspective. Applying our approach to a real production Search-AutoComplete
system that contains a query interpretation ML component, we demonstrate that (1)
our holistic metrics successfully identified two regressions and one
improvement, all three of which were independently verified with retrospective
A/B experiments, (2) the automatically obtained defect classes provided
actionable insights during early-stage ML development, and (3) we also detected
defect classes at the finer sub-component level that showed significant
regressions, which we blocked prior to their respective releases.
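
The abstract does not say how regressions are scored, so the following is only a sketch under our own assumptions, not the authors' method. It assumes each functional test case is run against both the unmodified and the modified software and records pass/fail, and that each test carries a hypothetical partition label (for example, a query category). Because both versions see the same test cases, we substitute McNemar's test, a standard paired test, to decide whether a change is significant. Running it over all cases gives a holistic verdict; running it per partition surfaces candidate defect classes. The names `CaseOutcome`, `mcnemar` and `candidate_defect_classes` are hypothetical, and the data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass

# Critical value of the chi-squared distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRIT_95 = 3.841


@dataclass(frozen=True)
class CaseOutcome:
    partition: str        # hypothetical partition label, e.g. a query category
    baseline_pass: bool   # result of the test against the unmodified software
    candidate_pass: bool  # result of the same test against the modified software


def mcnemar(outcomes, crit=CHI2_CRIT_95):
    """Continuity-corrected McNemar test on paired pass/fail outcomes.

    Returns (verdict, chi2), where verdict is "regression", "improvement",
    "no significant change", or "no change" (no discordant pairs at all).
    """
    # Discordant pairs: tests that flipped between the two versions.
    broke = sum(o.baseline_pass and not o.candidate_pass for o in outcomes)
    fixed = sum(o.candidate_pass and not o.baseline_pass for o in outcomes)
    discordant = broke + fixed
    if discordant == 0:
        return "no change", 0.0
    chi2 = (abs(broke - fixed) - 1) ** 2 / discordant
    if chi2 <= crit:
        return "no significant change", chi2
    return ("regression" if broke > fixed else "improvement"), chi2


def candidate_defect_classes(outcomes, crit=CHI2_CRIT_95):
    """Flag partitions whose pass rate dropped significantly (one paired test per partition)."""
    by_partition = defaultdict(list)
    for o in outcomes:
        by_partition[o.partition].append(o)
    flagged = {}
    for name, group in sorted(by_partition.items()):
        verdict, chi2 = mcnemar(group, crit)
        if verdict == "regression":
            flagged[name] = round(chi2, 2)
    return flagged


# Toy, made-up outcomes for illustration only.
outcomes = (
    [CaseOutcome("navigation", True, False)] * 30
    + [CaseOutcome("navigation", True, True)] * 70
    + [CaseOutcome("sports", True, True)] * 95
    + [CaseOutcome("sports", False, True)] * 3
    + [CaseOutcome("sports", True, False)] * 2
)

print(mcnemar(outcomes))                   # holistic check over all test cases
print(candidate_defect_classes(outcomes))  # per-partition check
```

Testing every partition at the same threshold multiplies the number of comparisons, so with many partitions a correction such as Bonferroni would be prudent; statsmodels also ships a ready-made McNemar test. The sketch covers only the existing-feature case, where a baseline exists; the abstract's other kind of defect class, where the ML task is harder for a new feature, has no baseline and would presumably need an absolute per-partition metric instead.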

Tue 15 Nov

Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

10:45 - 12:15
10:45
15m
Talk
Understanding Performance Problems in Deep Learning Systems
Research Papers
Junming Cao Fudan University, Bihuan Chen Fudan University, Chao Sun Fudan University, Longjie Hu Fudan University, Shuaihong Wu Fudan University, Xin Peng Fudan University
11:00
15m
Talk
API Recommendation for Machine Learning Libraries: How Far Are We?
Research Papers
Moshi Wei York University, Yuchao Huang Institute of Software at Chinese Academy of Sciences, Junjie Wang Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jiho Shin York University, Nima Shiri Harzevili York University, Song Wang York University
11:15
15m
Talk
No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence
Research Papers
Chaozheng Wang Harbin Institute of Technology, Yuanhang Yang Harbin Institute of Technology, Cuiyun Gao Harbin Institute of Technology, Yun Peng Chinese University of Hong Kong, Hongyu Zhang University of Newcastle, Michael Lyu Chinese University of Hong Kong
11:30
15m
Talk
Improving ML-Based Information Retrieval Software with User-Driven Functional Testing and Defect Class Analysis
Industry Paper
Junjie Zhu Apple, Teng Long Apple, Wei Wang Apple, Atif Memon Apple
11:45
15m
Talk
Discrepancies among Pre-trained Deep Neural Networks: A New Threat to Model Zoo Reliability
Ideas, Visions and Reflections
Diego Montes Purdue University, Pongpatapee Peerapatanapokin Purdue University, Jeff Schultz Purdue University, Chengjun Guo Purdue University, Wenxin Jiang Purdue University, James C. Davis Purdue University