Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks?
Recent efforts to test deep learning systems have produced an intuitive and compelling test criterion called neuron coverage (NC), which resembles the notion of traditional code coverage. NC measures the proportion of neurons activated in a neural network, and it is implicitly assumed that increasing NC improves the quality of a test suite. In an attempt to automatically generate a test suite that increases NC, we design a novel diversity-promoting regularizer that can be plugged into existing adversarial attack algorithms. We then assess whether such attempts to increase NC can generate a test suite that (1) detects adversarial attacks successfully, (2) produces natural inputs, and (3) is unbiased toward particular class predictions. Contrary to expectation, our extensive evaluation finds that increasing NC actually makes it harder to generate an effective test suite: higher neuron coverage leads to fewer defects detected, less natural inputs, and more biased prediction preferences. Our results cast doubt on neuron coverage as a meaningful objective for generating tests for deep neural networks and call for a new test generation technique that considers defect detection, naturalness, and output impartiality in tandem.
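The definition of NC in the abstract (the proportion of neurons activated across a test suite) can be made concrete with a small sketch. The helper below is an illustrative, hypothetical implementation, not the paper's actual code: it assumes activations are supplied per layer as rows of per-input activation vectors, and counts a neuron as covered if any input drives it above a threshold.

```python
def neuron_coverage(layer_activations, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one input.

    `layer_activations`: list of layers; each layer is a list of rows,
    where each row holds one input's activation values for that layer's
    neurons (rows = inputs, columns = neurons).
    """
    covered = total = 0
    for layer in layer_activations:
        num_neurons = len(layer[0])
        total += num_neurons
        for j in range(num_neurons):
            # a neuron counts as covered if any input in the suite
            # pushes its activation strictly above the threshold
            if any(row[j] > threshold for row in layer):
                covered += 1
    return covered / total

# toy suite: 3 inputs through a 3-neuron layer and a 2-neuron layer
acts = [
    [[0.1, -0.2, 0.0], [0.5, 0.0, -0.1], [0.2, -0.3, 0.0]],
    [[0.0, 0.7], [0.0, 0.9], [0.0, 0.4]],
]
print(neuron_coverage(acts))  # 2 of 5 neurons covered -> 0.4
```

A test generator aiming to raise NC would then search for inputs that flip the remaining uncovered neurons above the threshold, which is exactly the objective the paper's regularizer encourages and its evaluation questions.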
Tue 15 Nov (time zone: Beijing, Chongqing, Hong Kong, Urumqi)
14:00 - 15:30
Software Testing I (ESEC/FSE 2020) at SRC LT 52
Chair(s): Arie van Deursen (Delft University of Technology)
|Testing Self-Adaptive Software with Probabilistic Guarantees on Performance Metrics|
Claudio Mandrioli (Lund University, Sweden), Martina Maggio (Saarland University, Germany / Lund University, Sweden)
|Search-Based Adversarial Testing and Improvement of Constrained Credit Scoring Systems|
Salah Ghamizi, Maxime Cordy, Martin Gubri, Mike Papadakis, Andrey Boystov, Yves Le Traon (University of Luxembourg, Luxembourg), Anne Goujon (BGL BNP Paribas, Luxembourg)
|Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks?|
Fabrice Harel-Canada, Lingxiao Wang (University of California at Los Angeles, USA), Muhammad Ali Gulzar (Virginia Tech, USA), Quanquan Gu, Miryung Kim (University of California at Los Angeles, USA)
|When Does My Program Do This? Learning Circumstances of Software Behavior|
Alexander Kampmann, Nikolas Havrikov (CISPA, Germany), Ezekiel Soremekun (SnT, University of Luxembourg), Andreas Zeller (CISPA Helmholtz Center for Information Security)
|FrUITeR: A Framework for Evaluating UI Test Reuse|
Yixue Zhao (University of Massachusetts at Amherst), Justin Chen (Columbia University, USA), Adriana Sejfia, Marcelo Schmitt Laser (University of Southern California, USA), Jie M. Zhang (King's College London), Federica Sarro, Mark Harman (University College London), Nenad Medvidović (University of Southern California)