Using a Subtractive Center Behavioral Model to Detect Malware
In recent years, malware has evolved by using different obfuscation techniques; due to this evolution, the detection of malware has become problematic. Signature-based and traditional behavior-based malware detectors cannot effectively detect this new generation of malware. This paper proposes a sub...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Hindawi-Wiley
2020-01-01
|
Series: | Security and Communication Networks |
Online Access: | http://dx.doi.org/10.1155/2020/7501894 |
Summary: | In recent years, malware has evolved by using different obfuscation techniques; due to this evolution, the detection of malware has become problematic. Signature-based and traditional behavior-based malware detectors cannot effectively detect this new generation of malware. This paper proposes a subtractive center behavior model (SCBM) to create a malware dataset that captures semantically related behaviors from sample programs. In the proposed model, system paths, where malware behaviors are performed, and malware behaviors themselves are taken into consideration. This way malicious behavior patterns are differentiated from benign behavior patterns. Features that could not exceed the specified score are removed from the dataset. The datasets created using the proposed model contain far fewer features than the datasets created by n-gram and other models that have been used in other studies. The proposed model can handle both known and unknown malware, and the obtained detection rate and accuracy of the proposed model are higher than those of the known models. To show the effectiveness of the proposed model, 2 datasets with score and without score are created by using SCBM. In total, 6700 malware samples and 3000 benign samples are tested. The results are compared with those derived from n-gram and models from other studies in the literature. The test results show that, by combining the proposed model with an appropriate machine learning algorithm, the detection rate, false positive rate, and accuracy are measured as 99.9%, 0.2%, and 99.8%, respectively. |
---|---|
ISSN: | 1939-0114 1939-0122 |