Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling

Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to prevent malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting alg...

Bibliographic Details
Main Authors: Zhanghui Liu, Yudong Zhang, Yuzhong Chen, Xinwen Fan, Chen Dong
Format: Article
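Language: English
Published: MDPI AG 2020-09-01
Series: Entropy
Subjects: domain generation algorithm; algorithmically generated domain name; SMOTE; recurrent convolutional neural network; spatial pyramid pooling
Online Access: https://www.mdpi.com/1099-4300/22/9/1058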
id doaj-0e32a6e03ba64e6885af1e305c6aed73
record_format Article
spelling doaj-0e32a6e03ba64e6885af1e305c6aed73; 2020-11-25T03:41:58Z; eng; MDPI AG; Entropy; 1099-4300; 2020-09-01; 22; 1058; 1058; 10.3390/e22091058; Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling; Zhanghui Liu (0), Yudong Zhang (1), Yuzhong Chen (2), Xinwen Fan (3), Chen Dong (4); affiliation for authors 0-4: Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China; https://www.mdpi.com/1099-4300/22/9/1058; domain generation algorithm; algorithmically generated domain name; SMOTE; recurrent convolutional neural network; spatial pyramid pooling
collection DOAJ
language English
format Article
sources DOAJ
author Zhanghui Liu
Yudong Zhang
Yuzhong Chen
Xinwen Fan
Chen Dong
title Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2020-09-01
description Domain generation algorithms (DGAs) use specific parameters as random seeds to generate large numbers of random domain names in order to evade malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracting statistical characteristics from the domain names or network traffic and then employing classifiers to distinguish the algorithmically generated domain names. These models always require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, which decreases detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both the semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. Experimental results on different domain name datasets demonstrate that our model achieves 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and 95.39% AUC in distinguishing DGA from legitimate domain names, and 92.45% accuracy, a 90.12% recall rate, a 90.86% F1-score, and 96.59% AUC in multi-classification problems. It achieves significant improvement over existing models in terms of accuracy and robustness.
topic domain generation algorithm
algorithmically generated domain name
SMOTE
recurrent convolutional neural network
spatial pyramid pooling
url https://www.mdpi.com/1099-4300/22/9/1058
_version_ 1724527149007765504
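
The description above names spatial pyramid pooling as the step that captures multi-scale context from domain names of varying length. The record does not give the authors' implementation, so the following is only a minimal, generic sketch of one-dimensional spatial pyramid pooling. The function name `spatial_pyramid_pool_1d`, the pyramid levels `(1, 2, 4)`, and the choice of max pooling are all assumptions made for illustration, not details taken from the paper.

```python
def spatial_pyramid_pool_1d(features, levels=(1, 2, 4)):
    """Max-pool a variable-length sequence into a fixed-length vector.

    `features` is a list of T rows, each holding C channel values (for
    example, per-character features of a domain name). For every pyramid
    level with `bins` bins, the sequence is split into `bins` windows and
    each channel is max-pooled per window. The output always has
    sum(levels) * C values, regardless of T.

    Hypothetical helper: levels and max pooling are illustrative choices.
    """
    if not features:
        raise ValueError("expected at least one time step")
    steps = len(features)
    pooled = []
    for bins in levels:
        for i in range(bins):
            # Floor for the start and ceiling for the end, so every
            # window is non-empty even when steps < bins.
            start = (i * steps) // bins
            end = ((i + 1) * steps + bins - 1) // bins
            window = features[start:end]
            # zip(*window) iterates over channels; take the max of each.
            pooled.extend(max(channel) for channel in zip(*window))
    return pooled


# Two sequences of different length map to vectors of the same size.
short_seq = [[0.1, 0.5], [0.9, 0.2], [0.3, 0.4]]
long_seq = [[2 * t, 2 * t + 1] for t in range(6)]
print(len(spatial_pyramid_pool_1d(short_seq)))  # 14
print(len(spatial_pyramid_pool_1d(long_seq)))   # 14
```

The fixed output size is the reason such pooling suits domain names: a downstream classifier layer can take inputs of one size while the names themselves differ in length.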