Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Replacing the DPF too frequently wastes resources, while waiting until it is fully utilized is risky and very costly. So what is the optimal time or mileage at which to change the DPF?...
Main Author: | Wahab, Nor-Ul |
---|---|
Format: | Others |
Language: | English |
Published: | Högskolan Dalarna, Mikrodataanalys, 2018 |
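Subjects: | Anomaly detection; rule-based; one class support vector machine; k-nearest neighbor; random forest; confusion matrix; accuracy; precision; recall; F1-score; Social Sciences Interdisciplinary; Tvärvetenskapliga studier inom samhällsvetenskap |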
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:du-28962 |
id |
ndltd-UPSALLA1-oai-DiVA.org-du-28962 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-du-28962; 2018-11-30T05:34:05Z; Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data; eng; Wahab, Nor-Ul; Högskolan Dalarna, Mikrodataanalys; 2018; Anomaly detection; rule-based; one class support vector machine; k-nearest neighbor; random forest; confusion matrix; accuracy; precision; recall; F1-score; Social Sciences Interdisciplinary; Tvärvetenskapliga studier inom samhällsvetenskap; A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Replacing the DPF too frequently wastes resources, while waiting until it is fully utilized is risky and very costly. So what is the optimal time or mileage at which to change the DPF? Answering this question is very difficult without knowing when the DPF was changed in a vehicle. We seek the answer by using supervised machine learning algorithms to detect anomalies in vehicles' off-board sensor data (operational data of vehicles). A filter change is considered an anomaly because it is rare compared to normal data. Non-sequential machine learning algorithms for anomaly detection, such as one-class support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF), are applied for the first time to the DPF dataset. The dataset is unbalanced, and accuracy is found to be misleading as a performance measure for the algorithms. Precision, recall, and F1-score are found to be good measures of the performance of the machine learning algorithms when the data is unbalanced. RF gave the highest F1-score, 0.55, compared with K-NN (0.52) and OC-SVM (0.51). This means that RF performs better than K-NN and OC-SVM, but after further investigation it is concluded that the results are not satisfactory. However, a sequential approach, which could yield better results, should have been tried.; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:du-28962; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Anomaly detection; rule-based; one class support vector machine; k-nearest neighbor; random forest; confusion matrix; accuracy; precision; recall; F1-score; Social Sciences Interdisciplinary; Tvärvetenskapliga studier inom samhällsvetenskap |
spellingShingle |
Anomaly detection; rule-based; one class support vector machine; k-nearest neighbor; random forest; confusion matrix; accuracy; precision; recall; F1-score; Social Sciences Interdisciplinary; Tvärvetenskapliga studier inom samhällsvetenskap; Wahab, Nor-Ul; Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data |
description |
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Replacing the DPF too frequently wastes resources, while waiting until it is fully utilized is risky and very costly. So what is the optimal time or mileage at which to change the DPF? Answering this question is very difficult without knowing when the DPF was changed in a vehicle. We seek the answer by using supervised machine learning algorithms to detect anomalies in vehicles' off-board sensor data (operational data of vehicles). A filter change is considered an anomaly because it is rare compared to normal data. Non-sequential machine learning algorithms for anomaly detection, such as one-class support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF), are applied for the first time to the DPF dataset. The dataset is unbalanced, and accuracy is found to be misleading as a performance measure for the algorithms. Precision, recall, and F1-score are found to be good measures of the performance of the machine learning algorithms when the data is unbalanced. RF gave the highest F1-score, 0.55, compared with K-NN (0.52) and OC-SVM (0.51). This means that RF performs better than K-NN and OC-SVM, but after further investigation it is concluded that the results are not satisfactory. However, a sequential approach, which could yield better results, should have been tried. |
author |
Wahab, Nor-Ul |
title |
Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data |
title_sort |
evaluation of supervised machine learning algorithms for detecting anomalies in vehicle’s off-board sensor data |
publisher |
Högskolan Dalarna, Mikrodataanalys |
publishDate |
2018 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:du-28962 |
work_keys_str_mv |
AT wahabnorul evaluationofsupervisedmachinelearningalgorithmsfordetectinganomaliesinvehiclesoffboardsensordata |
_version_ |
1718799197472292864 |
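
The abstract's observation that accuracy is misleading on unbalanced data, while precision, recall, and F1-score are more informative, can be illustrated with a minimal sketch. The labels and predictions below are hypothetical toy data (95 normal records, 5 anomalies), not the thesis's DPF dataset, and the helper name `classification_scores` is an assumption introduced only for this illustration.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical unbalanced labels: 0 = normal record, 1 = anomaly (filter change).
y_true = [0] * 95 + [1] * 5

# A trivial model that never flags an anomaly.
always_normal = [0] * 100
baseline = classification_scores(y_true, always_normal)

# A hypothetical detector: 2 false alarms, 3 of the 5 anomalies found.
detector_pred = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2
detector = classification_scores(y_true, detector_pred)

print("always normal:", baseline)
print("detector:     ", detector)
```

In this toy setup the trivial model reaches high accuracy while its F1-score is zero, because it never finds an anomaly; the detector's accuracy is only slightly higher, yet its F1-score separates it clearly from the baseline.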