Method for investigating computer incidents based on attribute clustering

Bibliographic Details
Main Authors: Igor S. Pantiukhin, Nikita K. Druzhinin, Lev S. Titov, Alexandr A. Kapitonov, Alisa A. Vorobeva
Format: Article
Language: English
Published: Moscow Engineering Physics Institute 2018-09-01
Series: Bezopasnostʹ Informacionnyh Tehnologij
Online Access: https://bit.mephi.ru/index.php/bit/article/view/1138
Description
Summary: Reducing the amount of stored and processed information is an important task in internal audit. This requires selecting groups of information objects with similar parameters and analyzing them separately, and optimal clustering of the data is a suitable way to do so. This paper presents a method for grouping files on a hard disk, based on the Lance-Williams algorithm for hierarchical clustering. Files associated with the same computer incident are expected to fall into the same cluster. This expectation rests on the assumption that the user of the device under investigation performed a series of actions that are related in time or in another external attribute or group of attributes (for example, scanning a series of images in succession, or composing and then sending an email). As a result of clustering, these data are grouped into one cluster and can then be presented to a computer forensic scientist as a potential computer incident. There is therefore no need to analyze the files themselves, since external file attributes such as creation time, access time, and modification time are used as the meaningful parameters. The method also allows the number of clusters to be specified manually, which makes the investigation of the examined file system more flexible. An experiment was carried out to test the presented method. Its results show that files created and scanned within the same time interval ended up in the same cluster, for both large and small amounts of output data in the cluster.
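The grouping described in the summary can be illustrated with a minimal sketch of agglomerative clustering driven by the Lance-Williams update. The summary does not state which linkage, distance measure, or attribute scaling the authors used, so the group-average coefficients, the Euclidean distance, the function name `lance_williams_clusters`, and the file names and timestamps below are all assumptions made for illustration only.

```python
from itertools import combinations


def lance_williams_clusters(points, n_clusters):
    """Agglomerative clustering via the Lance-Williams update.

    Assumption: group-average linkage and Euclidean distance; the source
    does not say which variant the authors used.
    points: list of equal-length numeric tuples (here, file timestamps).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Each active cluster id maps to the indices of its member points.
    members = {i: [i] for i in range(len(points))}

    # Initial dissimilarities between single-point clusters.
    dist = {}
    for i, j in combinations(range(len(points)), 2):
        dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(points)
    while len(members) > n_clusters:
        # Pick the closest pair of active clusters.
        i, j = min(combinations(sorted(members), 2), key=lambda pair: d(*pair))
        n_i, n_j = len(members[i]), len(members[j])

        # Lance-Williams coefficients for group-average linkage.
        alpha_i = n_i / (n_i + n_j)
        alpha_j = n_j / (n_i + n_j)
        beta, gamma = 0.0, 0.0

        # Distance from every other cluster k to the merged cluster.
        for k in members:
            if k in (i, j):
                continue
            dist[(k, next_id)] = (
                alpha_i * d(k, i)
                + alpha_j * d(k, j)
                + beta * d(i, j)
                + gamma * abs(d(k, i) - d(k, j))
            )

        members[next_id] = sorted(members.pop(i) + members.pop(j))
        next_id += 1

    return sorted(members.values())


# Hypothetical files: (creation, modification, access) times in seconds.
files = {
    "scan_001.png": (1000, 1000, 1005),
    "scan_002.png": (1030, 1030, 1040),
    "scan_003.png": (1065, 1065, 1070),
    "report.docx": (90000, 90500, 90600),
    "mail_draft.eml": (90700, 90800, 90900),
}

names = list(files)
clusters = lance_williams_clusters([files[n] for n in names], n_clusters=2)
grouped = [[names[i] for i in cluster] for cluster in clusters]
print(grouped)
```

With these made-up timestamps, the three scans taken in quick succession land in one cluster and the two later documents in the other. The `n_clusters` argument plays the role of the manually specified number of clusters mentioned in the summary; other linkages (single, complete, Ward) differ only in the four coefficients of the update step.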
ISSN: 2074-7128, 2074-7136