MULTI-LAYER MODEL AND TRAINING METHOD FOR MALWARE TRAFFIC DETECTION BASED ON DECISION TREE ENSEMBLE

Bibliographic Details
Main Authors:	В’ячеслав Васильович Москаленко, Микола Олександрович Зарецький, Альона Сергіївна Москаленко, Антон Михайлович Кудрявцев, Віктор Анатолійович Семашко
Format:	Article
Language:	English
Published:	National Aerospace University «Kharkiv Aviation Institute», 2020-04-01
Series:	Радіоелектронні і комп'ютерні системи (Radioelectronic and Computer Systems)
Subjects:	threat detection system; convolutional sparse coding model; growing neural gas; decision tree ensemble; regression random forest; information criterion; knowledge distillation; information-extreme machine learning
Online Access:	http://nti.khai.edu/ojs/index.php/reks/article/view/1124
ISSN:	1814-4225, 2663-2012
DOI:	10.32620/reks.2020.2.08

Abstract

A model and training method are proposed for the multilayer feature extractor and the decision rules of a malware traffic detector. The feature extractor is based on a convolutional sparse coding network whose sparse encoder is approximated by a regression random forest, following the principles of knowledge distillation. A growing sparse coding neural gas algorithm has been developed to train the feature extractor without supervision, automatically determining the required number of features in each layer. During the training phase, sparse coding is implemented with the greedy L1-regularized Orthogonal Matching Pursuit method; during the knowledge distillation phase, the L1-regularized Least Angle Regression (LARS) algorithm is additionally used. Owing to the explaining-away effect, the extracted features are uncorrelated and robust to noise and adversarial attacks. Because the feature extractor is trained without supervision to disentangle the explanatory factors, it makes the most efficient use of unlabeled training data, which are usually available in large volumes.
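
The first paragraph of the abstract names greedy Orthogonal Matching Pursuit (OMP) as the sparse coder used in the training phase. For orientation only, here is a minimal textbook OMP in pure Python, assuming a small dictionary of unit-norm atoms and a fixed sparsity budget `max_atoms`. It is a sketch, not the authors' implementation: it uses the standard greedy atom selection with a least-squares refit rather than their L1-regularized variant, and it omits the convolutional structure, the growing neural gas dictionary learning, LARS, and the random-forest distillation. The names `dot`, `solve`, and `omp` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(A: List[List[float]], b: Vector) -> Vector:
    """Solve the square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def omp(signal: Vector, atoms: List[Vector], max_atoms: int,
        tol: float = 1e-9) -> Tuple[List[int], Vector]:
    """Textbook Orthogonal Matching Pursuit over a dictionary of unit-norm atoms.

    Returns the indices of the selected atoms and their least-squares coefficients.
    """
    support: List[int] = []
    coefs: Vector = []
    residual = list(signal)
    limit = min(max_atoms, len(atoms))
    while len(support) < limit and math.sqrt(dot(residual, residual)) > tol:
        # Greedy step: choose the unused atom most correlated with the residual.
        candidates = [j for j in range(len(atoms)) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(residual, atoms[j])))
        if abs(dot(residual, atoms[best])) <= tol:
            break  # residual is orthogonal to every remaining atom
        support.append(best)
        # Orthogonal step: refit all support coefficients by least squares,
        # i.e. solve the normal equations G c = D_S^T x with G = D_S^T D_S.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coefs = solve(gram, rhs)
        approx = [sum(c * atoms[j][k] for c, j in zip(coefs, support))
                  for k in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return support, coefs
```

With the toy dictionary `[[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.6, 0.8, 0]]`, calling `omp([3.0, 4.0, 0.0], atoms, max_atoms=2)` selects only atom 3 with a coefficient of about 5. Atoms 0 and 1 are also correlated with the signal but are never chosen, because once atom 3 explains it the residual vanishes; this is a small instance of the explaining-away effect mentioned in the abstract.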

As the model of the decision rules, a binary encoder of input observations based on an ensemble of decision trees is proposed, together with information-extreme closed hypersurfaces (containers) for class separation, which are reconstructed in the radial basis of the binary Hamming space. Coding trees are added according to the boosting principle, and the radii of the class containers are optimized by direct search. The information-extreme classifier has low computational complexity and high generalization capacity on small labeled training sets. Verification of the trained model on the open CTU test datasets gave a malware traffic detection accuracy of 96.1%, which confirms that the proposed algorithms are suitable for practical application.
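
The second paragraph describes decision rules that binary-encode observations with a tree ensemble and enclose each class in a container in Hamming space whose radius is tuned by direct search. The sketch below shows only those mechanics, under loudly simplifying assumptions: hypothetical one-split stumps stand in for the boosted coding trees, the class center is taken as a bitwise majority vote, and the criterion maximized over radii is a stand-in (true-positive rate minus false-positive rate), not the paper's information-extreme criterion. All names, thresholds, and toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]
# Hypothetical stand-in for one coding tree: (feature index, threshold).
Stump = Tuple[int, float]


def encode(x: Sequence[float], stumps: List[Stump]) -> Bits:
    """Binary-encode an observation: one bit per stump (1 if x[feature] > threshold)."""
    return tuple(1 if x[f] > t else 0 for f, t in stumps)


def hamming(a: Bits, b: Bits) -> int:
    """Number of positions where two binary codes differ."""
    return sum(u != v for u, v in zip(a, b))


def class_center(codes: List[Bits]) -> Bits:
    """Reference vector of a class: bitwise majority vote over its training codes."""
    n = len(codes)
    return tuple(1 if 2 * sum(c[i] for c in codes) > n else 0
                 for i in range(len(codes[0])))


def tpr_minus_fpr(tpr: float, fpr: float) -> float:
    """Stand-in criterion; NOT the information criterion used in the paper."""
    return tpr - fpr


def fit_radius(center: Bits, own: List[Bits], other: List[Bits],
               criterion: Callable[[float, float], float] = tpr_minus_fpr) -> int:
    """Direct search over every integer container radius 0..len(center)."""
    best_r, best_score = 0, float("-inf")
    for r in range(len(center) + 1):
        tpr = sum(hamming(c, center) <= r for c in own) / len(own)
        fpr = sum(hamming(c, center) <= r for c in other) / len(other)
        score = criterion(tpr, fpr)
        if score > best_score:  # strict ">" keeps the smallest radius among ties
            best_r, best_score = r, score
    return best_r


def in_container(code: Bits, center: Bits, radius: int) -> bool:
    """An observation is assigned to the class if it falls inside the container."""
    return hamming(code, center) <= radius


if __name__ == "__main__":
    stumps = [(0, 0.5), (1, 0.5), (0, 1.5), (1, 1.5)]
    malware = [encode(x, stumps) for x in [(2.0, 2.0), (1.8, 0.9), (2.2, 1.7)]]
    benign = [encode(x, stumps) for x in [(0.1, 0.2), (0.3, 0.7), (0.9, 0.1)]]
    center = class_center(malware)
    radius = fit_radius(center, malware, benign)
    print(center, radius)  # (1, 1, 1, 1) 1
    print(in_container(encode((1.6, 1.2), stumps), center, radius))  # True
```

Exhaustive search is practical here because an n-bit code admits only n + 1 integer radii; trying a different optimization criterion only requires passing another `criterion` function.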