A learning model to detect maliciousness of portable executable using integrated feature set

Malware is one of the foremost obstacles to the expansion and growth of digital acceptance among users. Both enterprises and ordinary users struggle to protect themselves from malware in cyberspace, which underscores the importance of developing efficient malware detection methods. In t...

Bibliographic Details
Main Authors:	Ajit Kumar, K.S. Kuppusamy, G. Aghila
Format:	Article
Language:	English
Published:	Elsevier 2019-04-01
Series:	Journal of King Saud University: Computer and Information Sciences
Online Access:	http://www.sciencedirect.com/science/article/pii/S1319157817300149

id	doaj-0a51981961f54f339a70a5891aeb58c1
record_format	Article
spelling	doaj-0a51981961f54f339a70a5891aeb58c1; 2020-11-24T21:00:33Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; ISSN 1319-1578; 2019-04-01; vol. 31, no. 2, pp. 252-265; A learning model to detect maliciousness of portable executable using integrated feature set; Ajit Kumar (Department of Computer Science, Pondicherry University, Pondicherry 605014, India); K.S. Kuppusamy (Department of Computer Science, Pondicherry University, Pondicherry 605014, India; corresponding author); G. Aghila (Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal 609605, India); http://www.sciencedirect.com/science/article/pii/S1319157817300149
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ajit Kumar, K.S. Kuppusamy, G. Aghila
title	A learning model to detect maliciousness of portable executable using integrated feature set
publisher	Elsevier
series	Journal of King Saud University: Computer and Information Sciences
issn	1319-1578
publishDate	2019-04-01
description	Malware is one of the foremost obstacles to the expansion and growth of digital acceptance among users. Both enterprises and ordinary users struggle to protect themselves from malware in cyberspace, which underscores the importance of developing efficient malware detection methods. In this work, we propose a machine learning based solution that classifies a sample as benign or malware with high accuracy and low computational overhead. An integrated feature set is built by combining the raw values of portable executable header fields with derived values. Several machine learning algorithms, namely Decision Tree, Random Forest, kNN, Logistic Regression, Linear Discriminant Analysis and Naive Bayes, were used to classify malware. We compared the performance of each classifier on the existing raw feature set and on the proposed integrated feature set. The empirical evidence indicates 98.4% classification accuracy in 10-fold cross-validation for the proposed integrated feature set. In experiments on a novel test data set, accuracy was 89.23% for the integrated feature set, a 15% improvement over the accuracy achieved with the raw feature set alone. Classification accuracy with only the top N features (N = 5, 10, 15, 20, 25) was also evaluated; with only the top 15 features, 98% and 97% accuracy can be achieved on the integrated and raw feature sets respectively. Keywords: Malware, Portable executable, Machine learning, Integrated features
url	http://www.sciencedirect.com/science/article/pii/S1319157817300149

Similar Items