Early detection of malicious web content with applied machine learning
This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of...
Main Author: | Likarish, Peter F. |
---|---|
Other Authors: | Jung, Eunjin |
Format: | Others |
Language: | English |
Published: | University of Iowa, 2011 |
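Subjects: | applied machine learning; Computer security; Domain Name System; javascript; phishing; web-based attacks; Computer Sciences |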
Online Access: | https://ir.uiowa.edu/etd/4871 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd |
id |
ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-4912 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-4912 2019-10-13T05:04:14Z Early detection of malicious web content with applied machine learning Likarish, Peter F. This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may duplicate their attack at multiple, unexpected locations (multiple URLs spread across different domains) with ease. Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the web's threat landscape is unfavorably tilted towards the attacker. To counter these advantages, this thesis describes our novel solutions to web security problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools as well as detect attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing attacks in the wild. Progressing from a specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to its occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity. 2011-07-01T07:00:00Z dissertation application/pdf https://ir.uiowa.edu/etd/4871 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd Copyright 2011 Peter Likarish Theses and Dissertations eng University of Iowa Jung, Eunjin applied machine learning Computer security Domain Name System javascript phishing web-based attacks Computer Sciences |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
applied machine learning Computer security Domain Name System javascript phishing web-based attacks Computer Sciences |
spellingShingle |
applied machine learning Computer security Domain Name System javascript phishing web-based attacks Computer Sciences Likarish, Peter F. Early detection of malicious web content with applied machine learning |
description |
This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may duplicate their attack at multiple, unexpected locations (multiple URLs spread across different domains) with ease. Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the web's threat landscape is unfavorably tilted towards the attacker.
To counter these advantages, this thesis describes our novel solutions to web security problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools as well as detect attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing attacks in the wild. Progressing from a specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to its occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity. |
author2 |
Jung, Eunjin |
author_facet |
Jung, Eunjin Likarish, Peter F. |
author |
Likarish, Peter F. |
author_sort |
Likarish, Peter F. |
title |
Early detection of malicious web content with applied machine learning |
title_short |
Early detection of malicious web content with applied machine learning |
title_full |
Early detection of malicious web content with applied machine learning |
title_fullStr |
Early detection of malicious web content with applied machine learning |
title_full_unstemmed |
Early detection of malicious web content with applied machine learning |
title_sort |
early detection of malicious web content with applied machine learning |
publisher |
University of Iowa |
publishDate |
2011 |
url |
https://ir.uiowa.edu/etd/4871 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=4912&context=etd |
work_keys_str_mv |
AT likarishpeterf earlydetectionofmaliciouswebcontentwithappliedmachinelearning |
_version_ |
1719265638598311936 |