Using Information Retrieval Approach for Malware Classification
Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 102 === In this paper, we propose an Information Retrieval (IR) approach to classify malware samples into known malware families. First, each training sample is submitted to a dynamic analysis tool, Cuckoo Sandbox, to record the API (Application Prog...
Main Authors: | Tzung-Shian Tsai, 蔡宗憲 |
---|---|
Other Authors: | Chu-Sing Yang |
Format: | Others |
Language: | zh-TW |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/22111470442079737846 |
id |
ndltd-TW-102NCKU5652039 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-102NCKU56520392016-03-07T04:10:57Z http://ndltd.ncl.edu.tw/handle/22111470442079737846 Using Information Retrieval Approach for Malware Classification 利用資訊檢索方式於惡意程式分類之研究 Tzung-Shian Tsai 蔡宗憲 Master's National Cheng Kung University Institute of Computer and Communication Engineering 102 In this paper, we propose an Information Retrieval (IR) approach to classify malware samples into known malware families. First, each training sample is submitted to a dynamic analysis tool, Cuckoo Sandbox, to record the API (Application Programming Interface) calls made by the sample. Every such call consists of three parts: function name, parameter name, and parameter value. In the retrieval phase, the same procedure is applied to the test sample. Then, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to represent the test sample and all training samples as vectors based on the API call information. Each vector describes the behavioral characteristics of a malware sample and is used to compare behavioral similarity. Finally, the malware category is determined by retrieving the most similar family, which accomplishes the classification. Chu-Sing Yang 楊竹星 2014 學位論文 ; thesis 39 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 102 === In this paper, we propose an Information Retrieval (IR) approach to classify malware samples into known malware families.
First, each training sample is submitted to a dynamic analysis tool, Cuckoo Sandbox, to record the API (Application Programming Interface) calls made by the sample. Every such call consists of three parts: function name, parameter name, and parameter value. In the retrieval phase, the same procedure is applied to the test sample. Then, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to represent the test sample and all training samples as vectors based on the API call information. Each vector describes the behavioral characteristics of a malware sample and is used to compare behavioral similarity. Finally, the malware category is determined by retrieving the most similar family, which accomplishes the classification.
|
author2 |
Chu-Sing Yang |
author_facet |
Chu-Sing Yang Tzung-Shian Tsai 蔡宗憲 |
author |
Tzung-Shian Tsai 蔡宗憲 |
spellingShingle |
Tzung-Shian Tsai 蔡宗憲 Using Information Retrieval Approach for Malware Classification |
author_sort |
Tzung-ShianTsai |
title |
Using Information Retrieval Approach for Malware Classification |
title_short |
Using Information Retrieval Approach for Malware Classification |
title_full |
Using Information Retrieval Approach for Malware Classification |
title_fullStr |
Using Information Retrieval Approach for Malware Classification |
title_full_unstemmed |
Using Information Retrieval Approach for Malware Classification |
title_sort |
using information retrieval approach for malware classification |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/22111470442079737846 |
work_keys_str_mv |
AT tzungshiantsai usinginformationretrievalapproachformalwareclassification AT càizōngxiàn usinginformationretrievalapproachformalwareclassification AT tzungshiantsai lìyòngzīxùnjiǎnsuǒfāngshìyúèyìchéngshìfēnlèizhīyánjiū AT càizōngxiàn lìyòngzīxùnjiǎnsuǒfāngshìyúèyìchéngshìfēnlèizhīyánjiū |
_version_ |
1718199712248496128 |
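
The retrieval pipeline outlined in the abstract (API calls turned into terms, TF-IDF vectors, then a similarity lookup) can be illustrated with a minimal Python sketch. This is not the thesis implementation. Several details are assumptions made only for illustration: the call record layout (`api` and `args` keys), the way function name, parameter name, and parameter value are combined into terms, the smoothed IDF formula, cosine similarity as the similarity measure, and returning the family of the single closest training sample. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def call_to_terms(call):
    """Assumed term scheme: function name, function:param, function:param=value."""
    terms = [call["api"]]
    for name, value in call.get("args", {}).items():
        terms.append(f"{call['api']}:{name}")
        terms.append(f"{call['api']}:{name}={value}")
    return terms


def sample_to_terms(calls):
    """Flatten all calls of one sample into a single list of terms."""
    return [term for call in calls for term in call_to_terms(call)]


def tf_idf(terms, idf):
    """Weight each known term by relative frequency times IDF; unseen terms are skipped."""
    counts = Counter(t for t in terms if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(test_calls, training):
    """training: list of (family, calls). Returns (family, score) of the closest sample."""
    docs = [sample_to_terms(calls) for _, calls in training]
    n = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (an assumption); computed from the training samples only.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    train_vecs = [tf_idf(d, idf) for d in docs]
    query = tf_idf(sample_to_terms(test_calls), idf)
    scores = [cosine(query, v) for v in train_vecs]
    best = max(range(n), key=scores.__getitem__)
    return training[best][0], scores[best]


# Hypothetical toy data standing in for API calls recorded by a sandbox.
training = [
    ("FamilyA", [
        {"api": "CreateFileW", "args": {"FileName": "C:\\a.exe"}},
        {"api": "RegSetValueExW", "args": {"ValueName": "Run"}},
    ]),
    ("FamilyB", [
        {"api": "connect", "args": {"port": "6667"}},
        {"api": "send", "args": {"buffer": "NICK bot"}},
    ]),
]
sample = [{"api": "connect", "args": {"port": "6667"}}]
family, score = classify(sample, training)
print(family, round(score, 3))
```

In this toy run the test sample shares terms only with the second training sample, so the lookup returns `FamilyB`. A real system would more likely aggregate similarity over several samples per family instead of relying on a single nearest neighbour.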