Summary: | As the software industry grows larger by the minute, the need for automated solutions within bug report management is on the rise. Although some research has been conducted in the area of bug handling, new, faster or more precise approaches are yet to be developed. A bug report typically contains a free text observations field where the issue can be described by a human. Research on processing this type of field is extensive; however, bug reports are often accompanied by system log files, which have been given less attention so far. In the 4G LTE telecommunications network, many system log files are available, and several are likely to aid the routing of bug reports. In this thesis, one system log file was chosen for evaluation: the alarm log. The alarm logs are time series count data containing alarms raised by the system. The alarm log data have been pre-processed with data mining techniques. The Apriori algorithm has been used to mine for specific alarms and alarming objects that indicate that the bug report should be solved by a particular developer group. We extend the Apriori algorithm to a temporal setting by using a customised time-dependent confidence measure. To further mine for interesting sequences of events in the logs, the sequence mining approach SPADE has been used. The class-associated sequences extracted by both pre-processing approaches are transformed into binary features that can be used as predictors in any prediction model. The results have been evaluated by predicting the correct developer group with two different methods: logistic regression and DO-probit. Logistic regression was regularised with the elastic net penalty to avoid computational issues and to handle the sparse covariate set. DO-probit was used with a horseshoe prior, which is well suited to the sparse covariate regression problem as it is designed to pick up signals in sparse, noisy data.
The results indicate that a data mining approach for processing alarm logs is promising. The rules obtained with the Apriori mining process appear well suited to the alarm logs: most binary representations of the rules, used as covariates in logistic regression, are kept in the equations for the expected classes with strongly positive coefficients. Although the overall improvement in accuracy from using the alarm logs in addition to the learned topics from free text fields is modest, the alarm logs are concluded to be a good complement to the free text information, as some Apriori covariates appear to be better suited to predicting some classes than some topics.
|
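The step from mined rules to binary predictors described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: it uses the plain (non-temporal) confidence rather than the customised time-dependent measure, and the alarm names, group labels, thresholds and function names (`mine_class_rules`, `to_binary_features`) are hypothetical and chosen only for illustration. Each bug report is treated as a set of alarms, rules of the form "alarm set -> developer group" are mined level by level in the Apriori manner, and each retained rule becomes one 0/1 feature.

```python
from itertools import combinations


def mine_class_rules(transactions, labels, min_support=0.5,
                     min_confidence=1.0, max_len=2):
    """Level-wise (Apriori-style) search for rules `alarm set -> group`.

    Returns tuples (itemset, label, antecedent_support, confidence).
    Plain confidence is used here, not a time-dependent variant.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    rules = []
    k = 1
    while candidates and k <= max_len:
        frequent = []
        for itemset in candidates:
            # Labels of all reports whose alarm set contains the itemset.
            covered = [lab for t, lab in zip(transactions, labels)
                       if itemset <= t]
            support = len(covered) / n
            if support < min_support:
                continue  # Apriori pruning: no superset can be frequent
            frequent.append(itemset)
            for label in sorted(set(covered)):
                confidence = covered.count(label) / len(covered)
                if confidence >= min_confidence:
                    rules.append((itemset, label, support, confidence))
        # Join frequent k-itemsets into (k+1)-item candidates.
        candidates = sorted(
            {a | b for a, b in combinations(frequent, 2)
             if len(a | b) == k + 1},
            key=sorted,
        )
        k += 1
    return rules


def to_binary_features(transactions, rules):
    """One 0/1 column per rule: 1 if the report contains the rule's alarms."""
    return [[int(itemset <= t) for itemset, _, _, _ in rules]
            for t in transactions]


# Hypothetical toy data: alarms seen per bug report and the solving group.
transactions = [{"A1", "A2"}, {"A1", "A2"}, {"A1", "A3"}, {"A3"}]
labels = ["radio", "radio", "core", "core"]

rules = mine_class_rules(transactions, labels)
features = to_binary_features(transactions, rules)
```

The resulting `features` matrix is sparse and binary, so it can be passed as covariates to any classifier, for example a logistic regression with an elastic net penalty as in the evaluation described above.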