PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks

Phishing uses well-designed counterfeit websites to lure online users onto forged web pages and steal their private, sensitive information, e.g., account numbers and passwords. Existing antiphishing approaches are mostly based on page-related features, which require crawling the content of web page...

Bibliographic Details
Main Authors:	Weiping Wang, Feng Zhang, Xi Luo, Shigeng Zhang
Format:	Article
Language:	English
Published:	Hindawi-Wiley 2019-01-01
Series:	Security and Communication Networks
Online Access:	http://dx.doi.org/10.1155/2019/2595794

id	doaj-4f4f83c3f4ba4709930114ee53982734
record_format	Article
spelling	doaj-4f4f83c3f4ba4709930114ee53982734; 2020-11-24T21:50:32Z; eng; Hindawi-Wiley; Security and Communication Networks; 1939-0114; 1939-0122; 2019-01-01; 2019; 10.1155/2019/2595794; 2595794; PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks; Weiping Wang (0); Feng Zhang (1); Xi Luo (2); Shigeng Zhang (3); School of Computer Science and Engineering, Central South University, Changsha, China; School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Provincial Key Laboratory of Network Investigational Technology and Department of Information Technology, Hunan Police Academy, Changsha, China; School of Computer Science and Engineering, Central South University, Changsha, China; http://dx.doi.org/10.1155/2019/2595794
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Weiping Wang Feng Zhang Xi Luo Shigeng Zhang
author_sort	Weiping Wang
title	PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks
title_sort	pdrcnn: precise phishing detection with recurrent convolutional neural networks
publisher	Hindawi-Wiley
series	Security and Communication Networks
issn	1939-0114 1939-0122
publishDate	2019-01-01
description	Phishing uses well-designed counterfeit websites to lure online users onto forged web pages and steal their private, sensitive information, e.g., account numbers and passwords. Existing antiphishing approaches are mostly based on page-related features, which require crawling the content of web pages and accessing third-party search engines or DNS services. This not only makes them slow at detecting phishing but also makes them heavily dependent on the network environment and third-party services. In this paper, we propose a fast phishing website detection approach called PDRCNN that relies only on the URL of the website. Unlike previous approaches, PDRCNN neither retrieves the content of the target website nor uses any third-party services. It encodes the information of a URL into a two-dimensional tensor and feeds the tensor into a newly designed deep neural network to classify the original URL. We first use a bidirectional LSTM network to extract global features of the constructed tensor, so that each character in the URL carries information from the entire string. After that, we use a CNN to automatically judge which characters play key roles in phishing detection, capture the key components of the URL, and compress the extracted features into a fixed-length vector space. By combining the two types of networks, PDRCNN achieves better performance than using either one alone. We built a dataset of nearly 500,000 URLs obtained from Alexa and PhishTank. Experimental results show that PDRCNN achieves a detection accuracy of 97% and an AUC value of 99%, which is much better than state-of-the-art approaches. Furthermore, the recognition process is very fast: with the trained PDRCNN model, the average per-URL detection time is only 0.4 ms.
url	http://dx.doi.org/10.1155/2019/2595794
_version_	1725883337686384640
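
The description says that PDRCNN encodes a URL into a two-dimensional tensor before passing it to the network, but this record does not give the encoding itself. The following is a minimal sketch of one common way to do this, character-level one-hot encoding with padding. The alphabet, the length cap of 200, and the names `ALPHABET`, `CHAR_TO_INDEX`, `MAX_LEN`, and `encode_url` are assumptions made for illustration and are not taken from the paper.

```python
import string

# Hypothetical alphabet and length cap; the record does not state the paper's values.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
# Index 0 is reserved for both padding and characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 200


def encode_url(url, max_len=MAX_LEN):
    """Return a max_len x (len(ALPHABET) + 1) one-hot matrix for a URL."""
    width = len(ALPHABET) + 1
    # Lowercase the URL, truncate it to max_len, and map each character to an index.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in url.lower()[:max_len]]
    # Right-pad short URLs so every URL yields a matrix of the same shape.
    indices += [0] * (max_len - len(indices))
    # One row per character position, one column per alphabet entry.
    return [[1 if col == idx else 0 for col in range(width)] for idx in indices]


tensor = encode_url("http://dx.doi.org/10.1155/2019/2595794")
print(len(tensor), len(tensor[0]))  # rows (positions) and columns (alphabet size + 1)
```

Every URL maps to a matrix of the same shape, which is the kind of fixed-size input a recurrent layer followed by a convolutional layer could consume. A learned character embedding could replace the one-hot rows without changing the overall shape of the idea.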

Similar Items