Early detection of malicious web content with applied machine learning

This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of...

Bibliographic Details
Main Author:	Likarish, Peter F.
Other Authors:	Jung, Eunjin
Format:	Others
Language:	English
Published:	University of Iowa 2011
Subjects:	applied machine learning; Computer security; Domain Name System; javascript; phishing; web-based attacks; Computer Sciences
Online Access:	https://ir.uiowa.edu/etd/4871 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd

id	ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-4912
record_format	oai_dc
spelling	ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-4912 2019-10-13T05:04:14Z Early detection of malicious web content with applied machine learning Likarish, Peter F. This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may duplicate their attack at multiple, unexpected locations (multiple URLs spread across different domains) with ease. Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the web's threat landscape is unfavorably tilted towards the attacker. To counter these advantages, this thesis describes our novel solutions to web security problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools as well as detect attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing attacks in the wild. Progressing from a specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to its occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity. 2011-07-01T07:00:00Z dissertation application/pdf https://ir.uiowa.edu/etd/4871 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd Copyright 2011 Peter Likarish Theses and Dissertations eng University of Iowa Jung, Eunjin applied machine learning Computer security Domain Name System javascript phishing web-based attacks Computer Sciences
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	applied machine learning; Computer security; Domain Name System; javascript; phishing; web-based attacks; Computer Sciences
spellingShingle	applied machine learning Computer security Domain Name System javascript phishing web-based attacks Computer Sciences Likarish, Peter F. Early detection of malicious web content with applied machine learning
description	This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may duplicate their attack at multiple, unexpected locations (multiple URLs spread across different domains) with ease. Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the web's threat landscape is unfavorably tilted towards the attacker. To counter these advantages, this thesis describes our novel solutions to web security problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools as well as detect attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing attacks in the wild. Progressing from a specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to its occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity.
author2	Jung, Eunjin
author_facet	Jung, Eunjin Likarish, Peter F.
author	Likarish, Peter F.
author_sort	Likarish, Peter F.
title	Early detection of malicious web content with applied machine learning
title_short	Early detection of malicious web content with applied machine learning
title_full	Early detection of malicious web content with applied machine learning
title_fullStr	Early detection of malicious web content with applied machine learning
title_full_unstemmed	Early detection of malicious web content with applied machine learning
title_sort	early detection of malicious web content with applied machine learning
publisher	University of Iowa
publishDate	2011
url	https://ir.uiowa.edu/etd/4871 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd
work_keys_str_mv	AT likarishpeterf earlydetectionofmaliciouswebcontentwithappliedmachinelearning
_version_	1719265638598311936

Similar Items