Summary: | With recent breakthroughs in Deep Learning (DL), DL systems are increasingly deployed in safety-critical fields. Hence, some software testing methods are required to ensure the reliability and safety of DL systems. Since the rules of DL systems are inferred from training data, it is difficult to know the implementation rules about each behavior of DL systems. At the same time, Random Testing (RT) is a popular testing method and the knowledge about software implementation is not needed when we use RT. Therefore, RT is very suitable for the testing of DL systems. And the existing mechanisms for testing DL systems also depend heavily on RT by the labeled test data. In order to increase the effectiveness of RT for DL systems, we design, implement and evaluate the Adaptive Random Testing for DL systems (ARTDL), which is the first Adaptive Random Testing (ART) method to improve the effectiveness of RT for DL systems. ARTDL refers to the idea of ART. That is, fewer test cases are needed to detect failures by selecting the test case with the furthest distance from non-failure-causing test cases. Firstly, we propose the Feature-based Euclidean Distance (FED) as the distance metric that can be used to measure the difference between failure-causing inputs and non-failure-causing inputs. Secondly, we verify the availability of FED by presenting the failure pattern of DL models. Finally, we design ARTDL algorithm to generate the test cases that are more likely to cause failures based on the FED. We implement ARTDL to test top performing DL models in the field of image classification and automatic driving. The results show that, on average, the number of test cases used to find the first bug is reduced by 62.74% through ARTDL, compared with RT.
|