On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

State-of-the-art neural network models are actively used in various fields, but it is well known that they are vulnerable to adversarial example attacks. Despite many efforts, making models robust against adversarial example attacks has proven to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the most promising methods. In adversarial training, the training data are augmented with “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can we use adversarial training to train a model that is robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated by multiple attack methods, it is still vulnerable to white-box attacks, in which the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which is a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with one attack method helps defend against that particular attack but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of those methods, the network can still lose accuracy if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to build a robust network using adversarial training, even in black-box settings where the attacker has restricted information about the target network.
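
The abstract describes adversarial training as augmenting the training data with “adversarial” samples produced by an attack algorithm. As a rough illustration only (not the authors' implementation), the sketch below trains a small MNIST classifier on a mix of clean and FGSM-perturbed batches; the use of PyTorch, the network architecture, the perturbation budget eps, and the training schedule are all assumptions chosen for brevity, with FGSM serving merely as an example attack algorithm.

```python
# Illustrative sketch of FGSM-style adversarial training on MNIST (PyTorch).
# Not the authors' code; model, eps, and schedule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SmallCNN(nn.Module):
    """A small convolutional classifier for 28x28 MNIST digits."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.flatten(1)
        return self.fc2(F.relu(self.fc1(x)))

def fgsm_attack(model, x, y, eps):
    """Generate FGSM adversarial examples: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_train(epochs=5, eps=0.3, batch_size=128, device="cpu"):
    """Train on clean batches augmented with FGSM-perturbed copies."""
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = SmallCNN().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Augment the batch with adversarial versions of the same images.
            x_adv = fgsm_attack(model, x, y, eps)
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
        print(f"epoch {epoch + 1} done, last batch loss {loss.item():.4f}")
    return model

if __name__ == "__main__":
    adversarial_train()
```

In the paper's black-box setting, the attacker would then craft adversarial examples with a possibly different attack algorithm or data augmentation strategy, which is what limits the robustness gained from training against a single attack method.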


Bibliographic Details
Main Authors: Sanglee Park, Jungmin So
Affiliation: Department of Computer Science and Engineering, Sogang University, Seoul 04107, Korea
Format: Article
Language: English
Published: MDPI AG, 2020-11-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app10228079
Subjects: image classification; adversarial example attacks; adversarial training; MNIST
Online Access: https://www.mdpi.com/2076-3417/10/22/8079