Early detection of malicious web content with applied machine learning

This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of...

Bibliographic Details
Main Author: Likarish, Peter F.
Other Authors: Jung, Eunjin
Format: Others
Language: English
Published: University of Iowa 2011
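Subjects: applied machine learning; Computer security; Domain Name System; javascript; phishing; web-based attacks; Computer Sciences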
Online Access: https://ir.uiowa.edu/etd/4871
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd
id ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-4912
record_format oai_dc
spelling ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-4912 2019-10-13T05:04:14Z 2011-07-01T07:00:00Z dissertation application/pdf Copyright 2011 Peter Likarish Theses and Dissertations eng University of Iowa
collection NDLTD
language English
format Others
sources NDLTD
topic applied machine learning
Computer security
Domain Name System
javascript
phishing
web-based attacks
Computer Sciences
description This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web: the amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may easily duplicate an attack at multiple, unexpected locations (multiple URLs spread across different domains). Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the web's threat landscape is unfavorably tilted towards the attacker. To counter these advantages, this thesis describes our novel solutions to web security problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools and detect attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing attacks in the wild. Progressing from a specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to its occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity.
author2 Jung, Eunjin
author Likarish, Peter F.
title Early detection of malicious web content with applied machine learning
publisher University of Iowa
publishDate 2011
url https://ir.uiowa.edu/etd/4871
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd