On the Effectiveness of Data Balancing Techniques in the Context of ML-based Test Case Prioritization (PROMISE'22)

Write a Blog >>

Mon 14 - Fri 18 November 2022 Singapore

Who

Jediael Mendoza, Jason Mycroft, Lyam Milbury, Nafiseh Kahani, Jason Jaskolka

Track

PROMISE'22

Time Zone

The program is currently displayed in (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi.

Use conference time zone: (GMT+08:00) Beijing, Chongqing, Hong Kong, UrumqiSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Fri 18 Nov 2022 11:00 - 11:20 at Town Plaza GLR - Software Quality

Abstract

Regression testing is the cornerstone of quality assurance of software systems. However, executing regression test cases can impose significant computational and operational costs. In this context, Machine Learning-based Test Case Prioritization (ML-based TCP) techniques rank the execution of regression tests based on their probability of failures and execution time so that the faults can be detected as early as possible during the regression testing. Despite the recent progress of ML-based TCP, even the best reported ML-based TCP techniques can reach 90% or higher effectiveness in terms of Cost-cognizant Average Percentage of Faults Detected (APFDC) only in 20% of studied subjects. We argue that the imbalanced nature of used training datasets caused by the low failure rate of regression tests is one of the main reasons for this shortcoming. This work conducts an empirical study on applying 19 state-of-the-art data balancing techniques for dealing with imbalanced data sets in the TCP context, based on the most comprehensive publicly available datasets. The results demonstrate that data balancing techniques can improve the effectiveness of the best-known ML-based TCP technique for most subjects, with an average of 0.06 in terms of APFDC.

Jediael Mendoza

Jason Mycroft

Lyam Milbury

Nafiseh Kahani

University of Carlton

Jason Jaskolka