PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks
Through well-designed counterfeit websites, phishing induces online users to visit forged web pages to obtain their private sensitive information, e.g., account number and password. Existing antiphishing approaches are mostly based on page-related features, which require to crawl content of web page...
Main Authors: | Weiping Wang, Feng Zhang, Xi Luo, Shigeng Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi-Wiley 2019-01-01 |
Series: | Security and Communication Networks |
Online Access: | http://dx.doi.org/10.1155/2019/2595794 |
id |
doaj-4f4f83c3f4ba4709930114ee53982734 |
---|---|
record_format |
Article |
spelling |
doaj-4f4f83c3f4ba4709930114ee53982734; 2020-11-24T21:50:32Z; eng; Hindawi-Wiley; Security and Communication Networks; 1939-0114; 1939-0122; 2019-01-01; 2019; 10.1155/2019/2595794; 2595794; PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks; Weiping Wang (School of Computer Science and Engineering, Central South University, Changsha, China); Feng Zhang (School of Computer Science and Engineering, Central South University, Changsha, China); Xi Luo (Hunan Provincial Key Laboratory of Network Investigational Technology and Department of Information Technology, Hunan Police Academy, Changsha, China); Shigeng Zhang (School of Computer Science and Engineering, Central South University, Changsha, China); http://dx.doi.org/10.1155/2019/2595794 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Weiping Wang Feng Zhang Xi Luo Shigeng Zhang |
title |
PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks |
publisher |
Hindawi-Wiley |
series |
Security and Communication Networks |
issn |
1939-0114 1939-0122 |
publishDate |
2019-01-01 |
description |
Through well-designed counterfeit websites, phishing lures online users to forged web pages in order to obtain their private sensitive information, e.g., account numbers and passwords. Existing antiphishing approaches are mostly based on page-related features, which require crawling the content of web pages as well as accessing third-party search engines or DNS services. This not only makes them inefficient at detecting phishing but also makes them rely heavily on the network environment and third-party services. In this paper, we propose a fast phishing website detection approach called PDRCNN that relies only on the URL of the website. PDRCNN neither retrieves the content of the target website nor uses any third-party services, as previous approaches do. It encodes the information of a URL into a two-dimensional tensor and feeds the tensor into a newly designed deep neural network to classify the original URL. We first use a bidirectional LSTM network to extract global features of the constructed tensor, so that each character in the URL carries information about the whole string. After that, we use a CNN to automatically judge which characters play key roles in phishing detection, capture the key components of the URL, and compress the extracted features into a fixed-length vector. By combining the two types of networks, PDRCNN achieves better performance than using either one alone. We built a dataset containing nearly 500,000 URLs obtained from Alexa and PhishTank. Experimental results show that PDRCNN achieves a detection accuracy of 97% and an AUC value of 99%, which is much better than state-of-the-art approaches. Furthermore, the recognition process is very fast: with the trained PDRCNN model, the average detection time per URL is only 0.4 ms. |
url |
http://dx.doi.org/10.1155/2019/2595794 |
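
The description above says that PDRCNN encodes a URL into a two-dimensional tensor before passing it to the network. The record does not give the encoding, so the following is only a minimal sketch of one common way to do this: character-level one-hot encoding with padding and truncation to a fixed length. The alphabet, the maximum length `MAX_LEN`, and the function name `encode_url` are assumptions made for illustration and are not taken from the paper.

```python
import string

# Hypothetical alphabet: lowercase letters, digits, and common URL punctuation.
# This choice is an assumption, not the paper's vocabulary.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"

# Column 0 is reserved for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

# Hypothetical fixed URL length (rows of the tensor).
MAX_LEN = 64


def encode_url(url, max_len=MAX_LEN):
    """Return a max_len x (len(ALPHABET) + 1) one-hot matrix for a URL.

    Longer URLs are truncated; shorter ones are padded with all-zero rows.
    """
    width = len(ALPHABET) + 1
    matrix = []
    for ch in url.lower()[:max_len]:
        row = [0] * width
        row[CHAR_TO_INDEX.get(ch, 0)] = 1  # unknown characters map to column 0
        matrix.append(row)
    # Pad with zero rows so every URL yields the same shape.
    while len(matrix) < max_len:
        matrix.append([0] * width)
    return matrix


if __name__ == "__main__":
    tensor = encode_url("http://example.com/login")
    print(len(tensor), "rows x", len(tensor[0]), "columns")
```

Every URL maps to a matrix of the same shape, which is what allows a recurrent or convolutional layer to consume it in batches. A learned character embedding could replace the one-hot rows without changing the overall shape of the approach.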