Summary: The increased use of digital media to store legal, as well as illegal, data has created the need
for specialized tools that can monitor, control and even recover this data. An important task
in computer forensics and security is to identify the true file type to which a computer file
or computer file fragment belongs. File type identification is traditionally done by means
of metadata, such as file extensions and file header and footer signatures. Consequently,
traditional metadata-based file object type identification techniques work well only when
the required metadata is available and unaltered; they are not reliable when the integrity
of the metadata cannot be guaranteed or when the metadata is missing. As an alternative,
patterns in the content of a file object can be used to determine the associated file type.
This is called content-based file object type identification.
Supervised learning techniques can be used to infer a file object type classifier by exploiting
a distinctive pattern that underlies the common file structure of a file type. This study builds
on existing literature on the use of supervised learning techniques for content-based
file object type identification, and explores the combined use of multilayer perceptron neural
network classifiers and linear programming-based discriminant classifiers as a solution to the
multiple-class file fragment type identification problem.
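A linear programming-based discriminant classifier separates two classes with a hyperplane whose parameters are found by solving a linear program. As a hedged illustration only, the sketch below uses one well-known formulation, the robust linear programming discriminant of Bennett and Mangasarian, which minimizes the average misclassification slack of each class, and solves it with SciPy's `linprog`. The LP model used in this study may differ, and `fit_lp_discriminant` and `predict_lp` are hypothetical names.

```python
import numpy as np
from scipy.optimize import linprog


def fit_lp_discriminant(class_a, class_b):
    """Fit w, gamma so that w.x - gamma >= 1 for class A and <= -1 for class B,
    minimizing the mean slack (violation) of each class."""
    A = np.asarray(class_a, dtype=float)
    B = np.asarray(class_b, dtype=float)
    m, d = A.shape
    k = B.shape[0]
    n_vars = d + 1 + m + k  # variable layout: [w (d), gamma (1), y (m), z (k)]

    # Objective: average slack per class; w and gamma carry no cost.
    c = np.concatenate([np.zeros(d + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])

    A_ub = np.zeros((m + k, n_vars))
    # Class A rows: -(a_i . w) + gamma - y_i <= -1
    A_ub[:m, :d] = -A
    A_ub[:m, d] = 1.0
    A_ub[:m, d + 1:d + 1 + m] = -np.eye(m)
    # Class B rows: (b_j . w) - gamma - z_j <= -1
    A_ub[m:, :d] = B
    A_ub[m:, d] = -1.0
    A_ub[m:, d + 1 + m:] = -np.eye(k)
    b_ub = -np.ones(m + k)

    # w and gamma are free; slacks are non-negative.
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (m + k)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d], res.x[d]


def predict_lp(w, gamma, x):
    """Return +1 for class A and -1 for class B."""
    return 1 if float(np.dot(w, x)) - gamma >= 0 else -1
```

Because the model is binary, it can only tell two file types apart on its own.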
The purpose of this study was to investigate and compare the use of a single multilayer
perceptron neural network classifier, a single linear programming-based discriminant
classifier and a combined ensemble of these classifiers in the field of file type identification.
The ability of each individual classifier, and of the ensemble, to accurately predict
the file type to which a file fragment belongs was tested empirically.
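Because a linear programming-based discriminant separates only two classes, a multiple-class problem has to be decomposed. The sketch below assumes round-robin (one-vs-one) decomposition, in which every pair of file types gets its own binary classifier and the type with the most pairwise wins is predicted. It also assumes simple majority voting as the rule for combining ensemble members. All names are hypothetical, and the pairwise classifiers are passed in as plain callables.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

# A pairwise classifier takes a feature vector and returns one of its two labels.
PairwiseClassifier = Callable[[Sequence[float]], str]


def round_robin_predict(
    classes: Sequence[str],
    pairwise: Dict[Tuple[str, str], PairwiseClassifier],
    x: Sequence[float],
) -> str:
    """One-vs-one (round robin): each class pair casts one vote; most votes wins."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    # max() returns the first maximal item, so ties go to the class listed first.
    return max(classes, key=lambda c: votes[c])


def ensemble_vote(predictions: Sequence[str]) -> str:
    """Majority vote over ensemble members; ties go to the earliest member."""
    if not predictions:
        raise ValueError("no predictions to combine")
    counts = Counter(predictions)
    best = max(counts.values())
    return next(p for p in predictions if counts[p] == best)
```

With three file types, round robin trains three binary classifiers. With n types it trains n(n-1)/2.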
The study found that both a multilayer perceptron neural network classifier and a linear
programming-based discriminant classifier (used in a round robin) seemed to perform well in
solving the multiple-class file fragment type identification problem. However, combining
multilayer perceptron neural network classifiers and linear programming-based discriminant
classifiers in an ensemble did not yield better results than the single optimized classifiers.

MSc (Computer Science), North-West University, Potchefstroom Campus, 2013