Summary: | Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 104 === An anomaly, or outlier, is an observation that differs markedly from the rest of the data. Such differences may correspond to an object or event of interest whose detection is often of great importance. For example, fraud, spam, and device malfunctions are events that need to be noticed, and we characterize them by their deviation from normality. By automatically ranking observations by how deviant they are, we can save time and reduce the cognitive load on the individuals or groups responsible for responding to such events.
Over the years, many anomaly and outlier metrics and detection methods have been developed for finding data incongruities. In this thesis we review the general strategies and measures used to characterize the "strangeness" of data, as well as how these separate methods may be combined. Under the assumption that "the crowd is wise", we adopt an eclectic approach and propose a clustering-based score ensembling method for outlier detection. Using benchmark datasets, we quantitatively evaluate the robustness and accuracy of different ensemble strategies. We find that ensembling strategies offer only limited value for increasing overall performance, but provide robustness and protection from underperforming models. We also discuss the use of randomization to create ensemble-based methods. Based on our results, we conclude that, given the current state of the art, unsupervised anomaly detection faces significant challenges.
|