Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling
Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to prevent malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting alg...
Main Authors: | Zhanghui Liu, Yudong Zhang, Yuzhong Chen, Xinwen Fan, Chen Dong |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-09-01 |
Series: | Entropy |
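Subjects: | domain generation algorithm; algorithmically generated domain name; SMOTE; recurrent convolutional neural network; spatial pyramid pooling |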
Online Access: | https://www.mdpi.com/1099-4300/22/9/1058 |
id |
doaj-0e32a6e03ba64e6885af1e305c6aed73 |
---|---|
record_format |
Article |
spelling |
doaj-0e32a6e03ba64e6885af1e305c6aed73; timestamp: 2020-11-25T03:41:58Z; language: eng; publisher: MDPI AG; journal: Entropy; ISSN: 1099-4300; published: 2020-09-01; volume 22, article 1058; DOI: 10.3390/e22091058; title: Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling; authors: Zhanghui Liu, Yudong Zhang, Yuzhong Chen, Xinwen Fan, Chen Dong (all: Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China); URL: https://www.mdpi.com/1099-4300/22/9/1058; keywords: domain generation algorithm; algorithmically generated domain name; SMOTE; recurrent convolutional neural network; spatial pyramid pooling |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zhanghui Liu, Yudong Zhang, Yuzhong Chen, Xinwen Fan, Chen Dong |
author_sort |
Zhanghui Liu |
title |
Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling |
publisher |
MDPI AG |
series |
Entropy |
issn |
1099-4300 |
publishDate |
2020-09-01 |
description |
Domain generation algorithms (DGAs) use specific parameters as random seeds to generate large numbers of random domain names in order to evade malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracting statistical characteristics from the domain names or network traffic and then employing classifiers to distinguish the algorithmically generated domain names. These models require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, leading to decreased detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both the semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. The experimental results from different domain name datasets demonstrate that our model can achieve 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and 95.39% AUC in distinguishing DGA domain names from legitimate ones, and 92.45% accuracy, a 90.12% recall rate, a 90.86% F1-score, and 96.59% AUC in multi-class classification. It achieves significant improvement over existing models in terms of accuracy and robustness. |
topic |
domain generation algorithm; algorithmically generated domain name; SMOTE; recurrent convolutional neural network; spatial pyramid pooling |
url |
https://www.mdpi.com/1099-4300/22/9/1058 |
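The description mentions spatial pyramid pooling as the step that captures multi-scale contextual information from domain names. This record does not give the paper's pooling configuration, so the following is only a minimal sketch of the general technique applied to a one-dimensional sequence. The function name `spatial_pyramid_pool_1d`, the pyramid levels `(1, 2, 4)`, and the use of max pooling are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def spatial_pyramid_pool_1d(
    features: Sequence[Sequence[float]],
    levels: Sequence[int] = (1, 2, 4),  # assumed pyramid levels, not from the paper
) -> List[float]:
    """Max-pool a (time steps x channels) sequence into a fixed-size vector.

    For each pyramid level with `bins` bins, the sequence is split into
    `bins` contiguous (possibly overlapping) windows, and each channel is
    max-pooled inside each window. The output length is
    channels * sum(levels), regardless of the input length.
    """
    length = len(features)
    if length == 0:
        raise ValueError("need at least one time step")
    channels = len(features[0])

    pooled: List[float] = []
    for bins in levels:
        for b in range(bins):
            # Floor for the start and ceiling for the end, so that every
            # window is non-empty even when length < bins.
            start = (b * length) // bins
            end = -((-(b + 1) * length) // bins)  # ceiling division
            window = features[start:end]
            for c in range(channels):
                pooled.append(max(row[c] for row in window))
    return pooled
```

Under these assumptions, a 3-step input and a 10-step input with 2 channels each both map to a vector of length 2 * (1 + 2 + 4) = 14, which is what allows domain names of different lengths to feed a fixed-size classifier layer.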