On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

State-of-the-art neural network models are actively used in various fields, but it is well known that they are vulnerable to adversarial example attacks. Despite many efforts, making models robust against adversarial example attacks has proven to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the most promising methods. In adversarial training, the training data are augmented with “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can we use adversarial training to train a model that is robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated by multiple attack methods, it is still vulnerable to white-box attacks, in which the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which is a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with one attack method helps defend against that particular attack but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of those methods, the network can still lose accuracy if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to build a robust network using adversarial training, even in black-box settings where the attacker has restricted information about the target network.
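
The abstract describes adversarial training as augmenting the training data with “adversarial” samples produced by an attack algorithm. As a rough illustration only (not the authors' implementation), the sketch below trains a small MNIST classifier on a mix of clean and FGSM-perturbed batches; the use of PyTorch, the network architecture, the perturbation budget eps, and the training schedule are all assumptions chosen for brevity, with FGSM serving merely as an example attack algorithm.

```python
# Illustrative sketch of FGSM-style adversarial training on MNIST (PyTorch).
# Not the authors' code; model, eps, and schedule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SmallCNN(nn.Module):
    """A small convolutional classifier for 28x28 MNIST digits."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.flatten(1)
        return self.fc2(F.relu(self.fc1(x)))

def fgsm_attack(model, x, y, eps):
    """Generate FGSM adversarial examples: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_train(epochs=5, eps=0.3, batch_size=128, device="cpu"):
    """Train on clean batches augmented with FGSM-perturbed copies."""
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = SmallCNN().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Augment the batch with adversarial versions of the same images.
            x_adv = fgsm_attack(model, x, y, eps)
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
        print(f"epoch {epoch + 1} done, last batch loss {loss.item():.4f}")
    return model

if __name__ == "__main__":
    adversarial_train()
```

In the paper's black-box setting, the attacker would then craft adversarial examples with a possibly different attack algorithm or data augmentation strategy, which is what limits the robustness gained from training against a single attack method.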


Bibliographic Details
Main Authors: Sanglee Park, Jungmin So
Affiliation: Department of Computer Science and Engineering, Sogang University, Seoul 04107, Korea
Format: Article
Language: English
Published: MDPI AG, 2020-11-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app10228079
Subjects: image classification; adversarial example attacks; adversarial training; MNIST
Online Access: https://www.mdpi.com/2076-3417/10/22/8079