Data Analysis of Minimally-Structured Heterogeneous Logs : An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes.

Continuous integration and continuous delivery are now widely used to speed up software development and deliver high-quality products to customers quickly. During the process of modern software development, the testing stage has always been with gre...

Bibliographic Details
Main Author: Liu, Chang
Format: Others
Language: English
Published: KTH, Skolan för datavetenskap och kommunikation (CSC) 2016
Subjects:
RNN
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191334
id ndltd-UPSALLA1-oai-DiVA.org-kth-191334
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-kth-191334; 2016-08-31T05:08:23Z; eng; Student thesis; info:eu-repo/semantics/bachelorThesis; text; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Data analysis
Log analysis
RNN
Naive Bayes
description Continuous integration and continuous delivery are now widely used to speed up software development and deliver high-quality products to customers quickly. In modern software development, the testing stage is of great significance: it ensures that the delivered software meets all requirements and achieves high quality, maintainability, sustainability, scalability, and so on. The key task of software testing is to find the bugs in every test and resolve them. The developers and test engineers at Ericsson, who work on a large-scale software architecture, rely mainly on the logs generated during testing to debug the software; these logs contain important information about system behavior and software status. However, the volume of the data is large and its variety is complex and unpredictable, so manually locating and resolving bugs in such a vast amount of log data is very time-consuming and laborious. The objective of this thesis project is to explore a way to conduct log analysis efficiently and effectively by applying relevant machine learning algorithms, in order to help people quickly detect test failures and their possible causes. In this project, a method for preprocessing and clustering the original logs is designed and implemented to obtain useful data that can be fed to machine learning algorithms. A comparative log analysis based on two machine learning algorithms, Recurrent Neural Network and Naive Bayes, is then conducted to locate system failures and anomalies. Finally, relevant experimental results are provided and analyzed.
author Liu, Chang
title Data Analysis of Minimally-Structured Heterogeneous Logs : An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes.
publisher KTH, Skolan för datavetenskap och kommunikation (CSC)
publishDate 2016
url http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191334