Using Information Retrieval Approach for Malware Classification

Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 102 === In this paper, we propose an Information Retrieval (IR) approach to classifying malware samples into known malware families. First, each training sample is submitted to a dynamic analysis tool, Cuckoo Sandbox, to obtain information about the API (Application Prog...

Bibliographic Details
Main Authors: Tzung-Shian Tsai, 蔡宗憲
Other Authors: Chu-Sing Yang
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/22111470442079737846
id ndltd-TW-102NCKU5652039
record_format oai_dc
spelling ndltd-TW-102NCKU5652039 2016-03-07T04:10:57Z http://ndltd.ncl.edu.tw/handle/22111470442079737846 Using Information Retrieval Approach for Malware Classification 利用資訊檢索方式於惡意程式分類之研究 Tzung-Shian Tsai 蔡宗憲 Master's National Cheng Kung University Institute of Computer and Communication Engineering 102 Chu-Sing Yang 楊竹星 2014 thesis 39 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 102 === In this paper, we propose an Information Retrieval (IR) approach to classifying malware samples into known malware families. First, each training sample is submitted to a dynamic analysis tool, Cuckoo Sandbox, to record the API (Application Programming Interface) calls that the sample makes. Every recorded call consists of three parts: function name, parameter name, and parameter value. In the retrieval phase, the same procedure is applied to the test sample. The TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is then used to model the test sample and all training samples as vectors based on the API call information. Each vector describes the behavioral characteristics of a malware sample and is used to compare behavioral similarity. Finally, the malware category is found by retrieving the most similar family, which accomplishes the classification.
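The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several choices are assumptions made only for illustration: the way each call is turned into terms (`call_to_terms`), the smoothed IDF formula, cosine similarity as the similarity measure, and assigning the family of the single most similar training sample. The sample data and the family names `FamilyA` and `FamilyB` are hypothetical, and real input would come from parsed Cuckoo Sandbox reports.

```python
import math
from collections import Counter

def call_to_terms(call):
    """Turn one API call into terms: the function name, plus one term
    per (parameter name, parameter value) pair. Assumed tokenization."""
    name, params = call
    terms = [name]
    for key, value in params.items():
        terms.append(f"{name}:{key}={value}")
    return terms

def sample_to_terms(calls):
    """Flatten all API calls of one sample into a single list of terms."""
    terms = []
    for call in calls:
        terms.extend(call_to_terms(call))
    return terms

def fit_idf(documents):
    """Compute a smoothed IDF weight for every term seen in training."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term once per document
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(doc)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(test_calls, training):
    """Return (family, similarity) of the most similar training sample.
    `training` is a list of (family, calls) pairs."""
    docs = [sample_to_terms(calls) for _, calls in training]
    idf = fit_idf(docs)
    query = tfidf_vector(sample_to_terms(test_calls), idf)
    best_family, best_score = None, -1.0
    for (family, _), doc in zip(training, docs):
        score = cosine(query, tfidf_vector(doc, idf))
        if score > best_score:
            best_family, best_score = family, score
    return best_family, best_score

# Hypothetical training data: (family, list of (function, parameters)).
TRAINING = [
    ("FamilyA", [("CreateFileW", {"FileName": "C:\\a.tmp"}),
                 ("WriteFile", {"Length": "512"})]),
    ("FamilyB", [("RegSetValueExW", {"ValueName": "Run"}),
                 ("connect", {"port": "80"})]),
]

# Hypothetical test sample that shares most behavior with FamilyA.
TEST_SAMPLE = [("CreateFileW", {"FileName": "C:\\a.tmp"}),
               ("WriteFile", {"Length": "1024"})]

if __name__ == "__main__":
    print(classify(TEST_SAMPLE, TRAINING))
```

Encoding parameter names and values into the terms lets two samples that call the same function with different arguments differ in their vectors, which is one plausible reading of the three-part call description in the abstract.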
author2 Chu-Sing Yang
author_facet Chu-Sing Yang
Tzung-Shian Tsai
蔡宗憲
author Tzung-Shian Tsai
蔡宗憲
spellingShingle Tzung-Shian Tsai
蔡宗憲
Using Information Retrieval Approach for Malware Classification
author_sort Tzung-Shian Tsai
title Using Information Retrieval Approach for Malware Classification
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/22111470442079737846