On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification
Main Authors: | Sanglee Park, Jungmin So |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-11-01 |
Series: | Applied Sciences |
Subjects: | image classification, adversarial example attacks, adversarial training, MNIST |
Online Access: | https://www.mdpi.com/2076-3417/10/22/8079 |
id |
doaj-1017460527d34ac89c1bfa30d6062c91 |
record_format |
Article |
spelling |
Sanglee Park; Jungmin So (both: Department of Computer Science and Engineering, Sogang University, Seoul 04107, Korea). "On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification." Applied Sciences (MDPI AG), vol. 10, no. 22, article 8079, published 2020-11-01. ISSN 2076-3417. DOI: 10.3390/app10228079. Record doaj-1017460527d34ac89c1bfa30d6062c91, indexed 2020-11-25T04:07:56Z. https://www.mdpi.com/2076-3417/10/22/8079. Keywords: image classification, adversarial example attacks, adversarial training, MNIST. Abstract: see the description field below. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sanglee Park; Jungmin So |
title |
On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2020-11-01 |
description |
State-of-the-art neural network models are actively used in many fields, but it is well known that they are vulnerable to adversarial example attacks. Efforts to make models robust against adversarial example attacks have shown this to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the more promising methods. In adversarial training, the training data are augmented with “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can we use adversarial training to train a model that is robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated by multiple attack methods, it is still vulnerable to white-box attacks, in which the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which is a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network against one attack method helps defend against that particular method, but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of those methods, the network can still lose accuracy to the attack if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to build a robust network using adversarial training, even in black-box settings where the attacker has only restricted information about the target network. |
topic |
image classification; adversarial example attacks; adversarial training; MNIST |
url |
https://www.mdpi.com/2076-3417/10/22/8079 |
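
The description field above centers on adversarial training: augmenting the training data with adversarial samples generated by an attack algorithm, then measuring how well the resulting MNIST classifier holds up against the same or different attacks. The following is a minimal sketch of that idea using FGSM-style perturbations in PyTorch; it is not the authors' code, and the `SmallCNN` architecture, the perturbation budget `eps`, and all hyperparameters are illustrative assumptions only.

```python
# A minimal sketch of FGSM-based adversarial training on MNIST (PyTorch).
# NOT the paper's implementation; network, eps, and hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class SmallCNN(nn.Module):
    """A small convolutional classifier for 28x28 MNIST digits (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 14x14 -> 7x7
        x = x.flatten(1)
        return self.fc2(F.relu(self.fc1(x)))


def fgsm_examples(model, x, y, eps):
    """FGSM: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()


def adversarial_training(epochs=5, eps=0.3, batch_size=128, lr=1e-3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = SmallCNN().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Augment the clean batch with adversarial samples crafted
            # against the current model state, then train on both.
            x_adv = fgsm_examples(model, x, y, eps)
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
        print(f"epoch {epoch + 1} done")
    return model


if __name__ == "__main__":
    adversarial_training()
```

In the black-box setting the abstract describes, the attacker would not compute gradients on this model directly; instead, they would train a separate substitute network (possibly with a different attack algorithm or data augmentation strategy) and transfer adversarial examples crafted on that substitute to the target.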