PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks

Through well-designed counterfeit websites, phishing lures online users into visiting forged web pages in order to obtain their private sensitive information, e.g., account numbers and passwords. Existing antiphishing approaches are mostly based on page-related features, which require crawling the content of web page...

Bibliographic Details
Main Authors: Weiping Wang, Feng Zhang, Xi Luo, Shigeng Zhang
Format: Article
Language: English
Published: Hindawi-Wiley 2019-01-01
Series: Security and Communication Networks
Online Access: http://dx.doi.org/10.1155/2019/2595794
id doaj-4f4f83c3f4ba4709930114ee53982734
record_format Article
spelling doaj-4f4f83c3f4ba4709930114ee53982734
2020-11-24T21:50:32Z
eng
Hindawi-Wiley
Security and Communication Networks
1939-0114
1939-0122
2019-01-01
2019
10.1155/2019/2595794
2595794
PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks
Weiping Wang (0), School of Computer Science and Engineering, Central South University, Changsha, China
Feng Zhang (1), School of Computer Science and Engineering, Central South University, Changsha, China
Xi Luo (2), Hunan Provincial Key Laboratory of Network Investigational Technology and Department of Information Technology, Hunan Police Academy, Changsha, China
Shigeng Zhang (3), School of Computer Science and Engineering, Central South University, Changsha, China
http://dx.doi.org/10.1155/2019/2595794
collection DOAJ
language English
format Article
sources DOAJ
author Weiping Wang
Feng Zhang
Xi Luo
Shigeng Zhang
title PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks
publisher Hindawi-Wiley
series Security and Communication Networks
issn 1939-0114
1939-0122
publishDate 2019-01-01
description Through well-designed counterfeit websites, phishing lures online users into visiting forged web pages in order to obtain their private sensitive information, e.g., account numbers and passwords. Existing antiphishing approaches are mostly based on page-related features, which require crawling the content of web pages as well as accessing third-party search engines or DNS services. This not only makes them inefficient at detecting phishing but also makes them heavily dependent on the network environment and third-party services. In this paper, we propose a fast phishing website detection approach called PDRCNN that relies only on the URL of the website. PDRCNN neither retrieves the content of the target website nor uses any third-party services, as previous approaches do. It encodes the information of a URL into a two-dimensional tensor and feeds the tensor into a newly designed deep neural network to classify the original URL. We first use a bidirectional LSTM network to extract global features of the constructed tensor, so that each character in the URL carries information about the whole string. After that, we use a CNN to automatically judge which characters play key roles in phishing detection, capture the key components of the URL, and compress the extracted features into a fixed-length vector space. By combining the two types of networks, PDRCNN achieves better performance than using either one alone. We built a dataset containing nearly 500,000 URLs obtained through Alexa and PhishTank. Experimental results show that PDRCNN achieves a detection accuracy of 97% and an AUC value of 99%, which is much better than state-of-the-art approaches. Furthermore, the recognition process is very fast: on the trained PDRCNN model, the average detection time per URL is only 0.4 ms.
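The description says a URL is encoded into a two-dimensional tensor and that the extracted features are later compressed into a fixed-length vector. The sketch below illustrates only those two ideas, under assumptions that are not taken from the record: a one-hot character encoding, a hypothetical alphabet (`ALPHABET`), and a hypothetical length limit (`MAX_LEN`). The bidirectional LSTM and CNN layers are omitted, and the column-wise maximum is a simplified stand-in for pooling, not the authors' method.

```python
import string

# Hypothetical choices: the record does not state the vocabulary or length limit.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # column 0 = unknown character
MAX_LEN = 64


def encode_url(url, max_len=MAX_LEN):
    """Return a max_len x (len(ALPHABET) + 1) one-hot matrix for the URL.

    Longer URLs are truncated; shorter ones are padded with all-zero rows.
    """
    width = len(ALPHABET) + 1
    rows = []
    for ch in url.lower()[:max_len]:
        row = [0] * width
        row[CHAR_TO_INDEX.get(ch, 0)] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * width)
    return rows


def max_over_positions(feature_rows):
    """Column-wise maximum: reduces per-character rows to one fixed-length vector."""
    return [max(column) for column in zip(*feature_rows)]


if __name__ == "__main__":
    tensor = encode_url("http://example.com/login")
    print(len(tensor), len(tensor[0]))      # rows x columns of the 2-D encoding
    print(len(max_over_positions(tensor)))  # length of the pooled vector
```

Whatever the URL length, the encoded matrix and the pooled vector keep the same shape, which is the property a downstream classifier relies on.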
url http://dx.doi.org/10.1155/2019/2595794