Autonomous Sensor Data Cleaning in Stream Mining Setting

Background: Internet of Things (IoT), earth observation and big scientific experiments are sources of extensive amounts of sensor big data today. We are faced with large amounts of data with low measurement costs. A standard approach in such cases is a stream mining approach, implying that we look a...

Bibliographic Details
Main Authors:	Kenda Klemen, Mladenić Dunja
Format:	Article
Language:	English
Published:	Sciendo 2018-07-01
Series:	Business Systems Research
Subjects:	big data; autonomous processing; real-world applications; data cleaning; stream mining; water management; data-centre management; smart-grids
Online Access:	https://doi.org/10.2478/bsrj-2018-0020

id	doaj-55fa51e8573b451cbd1e9d02d601ac11
record_format	Article
spelling	doaj-55fa51e8573b451cbd1e9d02d601ac11 | 2021-09-05T21:00:36Z | eng | Sciendo | Business Systems Research | 1847-9375 | 2018-07-01 | 9 | 2 | 69 | 79 | 10.2478/bsrj-2018-0020 | Autonomous Sensor Data Cleaning in Stream Mining Setting | Kenda Klemen [0] | Mladenić Dunja [1] | [0] Jožef Stefan Institute, Ljubljana, Slovenia, Jozef Stefan International Postgraduate School, Ljubljana, Slovenia | [1] Jožef Stefan Institute, Ljubljana, Slovenia, Jozef Stefan International Postgraduate School, Ljubljana, Slovenia | Background: Internet of Things (IoT), earth observation and big scientific experiments are sources of extensive amounts of sensor big data today. We are faced with large amounts of data with low measurement costs. A standard approach in such cases is a stream mining approach, implying that we look at a particular measurement only once during the real-time processing. This requires the methods to be completely autonomous. In the past, very little attention was given to the most time-consuming part of the data mining process, i.e. data pre-processing. Objectives: In this paper we propose an algorithm for data cleaning, which can be applied to real-world streaming big data. Methods/Approach: We use the short-term prediction method based on the Kalman filter to detect admissible intervals for future measurements. The model can be adapted to the concept drift and is useful for detecting random additive outliers in a sensor data stream. Results: For datasets with low noise, our method has proven to perform better than the method currently commonly used in batch processing scenarios. Our results on higher noise datasets are comparable. Conclusions: We have demonstrated a successful application of the proposed method in real-world scenarios including the groundwater level, server load and smart-grid data. | https://doi.org/10.2478/bsrj-2018-0020 | big data | autonomous processing | real-world applications | data cleaning | stream mining | water management | data-centre management | smart-grids
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Kenda Klemen, Mladenić Dunja
spellingShingle	Kenda Klemen; Mladenić Dunja; Autonomous Sensor Data Cleaning in Stream Mining Setting; Business Systems Research; big data; autonomous processing; real-world applications; data cleaning; stream mining; water management; data-centre management; smart-grids
author_facet	Kenda Klemen, Mladenić Dunja
author_sort	Kenda Klemen
title	Autonomous Sensor Data Cleaning in Stream Mining Setting
title_sort	autonomous sensor data cleaning in stream mining setting
publisher	Sciendo
series	Business Systems Research
issn	1847-9375
publishDate	2018-07-01
description	Background: Internet of Things (IoT), earth observation and big scientific experiments are sources of extensive amounts of sensor big data today. We are faced with large amounts of data with low measurement costs. A standard approach in such cases is a stream mining approach, implying that we look at a particular measurement only once during the real-time processing. This requires the methods to be completely autonomous. In the past, very little attention was given to the most time-consuming part of the data mining process, i.e. data pre-processing. Objectives: In this paper we propose an algorithm for data cleaning, which can be applied to real-world streaming big data. Methods/Approach: We use the short-term prediction method based on the Kalman filter to detect admissible intervals for future measurements. The model can be adapted to the concept drift and is useful for detecting random additive outliers in a sensor data stream. Results: For datasets with low noise, our method has proven to perform better than the method currently commonly used in batch processing scenarios. Our results on higher noise datasets are comparable. Conclusions: We have demonstrated a successful application of the proposed method in real-world scenarios including the groundwater level, server load and smart-grid data.
topic	big data; autonomous processing; real-world applications; data cleaning; stream mining; water management; data-centre management; smart-grids
url	https://doi.org/10.2478/bsrj-2018-0020
work_keys_str_mv	AT kendaklemen autonomoussensordatacleaninginstreamminingsetting AT mladenicdunja autonomoussensordatacleaninginstreamminingsetting
_version_	1717782597882347520
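
The abstract describes using a Kalman-filter-based short-term prediction to derive an admissible interval for the next measurement and flagging values outside it as outliers. The following is only a rough sketch of that general idea, not the authors' algorithm: it assumes a scalar random-walk state model, fixed noise variances (`q`, `r`) and a fixed interval width of `k` standard deviations, and all names (`KalmanCleaner`, `clean_stream`) are hypothetical. It does not attempt the concept-drift adaptation mentioned in the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class KalmanCleaner:
    """Hypothetical 1-D random-walk Kalman filter used as a stream cleaner."""
    x: float            # current state estimate
    p: float = 1.0      # variance of the state estimate
    q: float = 0.01     # process noise variance (assumed value)
    r: float = 0.25     # measurement noise variance (assumed value)
    k: float = 3.0      # interval half-width in standard deviations (assumed)

    def admissible_interval(self):
        """Return (low, high) bounds for the next measurement."""
        s = self.p + self.q + self.r  # variance of the predicted measurement
        half = self.k * math.sqrt(s)
        return self.x - half, self.x + half

    def process(self, z):
        """Consume one measurement once; return (estimate, is_outlier)."""
        low, high = self.admissible_interval()
        p_pred = self.p + self.q  # predict step for a random-walk state
        if not (low <= z <= high):
            # Outlier: skip the update, keep the prediction, grow uncertainty.
            self.p = p_pred
            return self.x, True
        gain = p_pred / (p_pred + self.r)  # Kalman gain
        self.x += gain * (z - self.x)      # correct the estimate
        self.p = (1.0 - gain) * p_pred     # shrink the uncertainty
        return self.x, False


def clean_stream(values, **params):
    """Yield (measurement, is_outlier) pairs, reading each value only once."""
    it = iter(values)
    try:
        first = next(it)  # the first value initialises the filter
    except StopIteration:
        return
    cleaner = KalmanCleaner(x=first, **params)
    yield first, False
    for z in it:
        _, flag = cleaner.process(z)
        yield z, flag
```

For example, `list(clean_stream([10.0, 10.1, 9.9, 10.2, 25.0, 10.0, 10.1]))` marks only the value `25.0` as an outlier under these assumed parameters. Skipping the update on flagged values keeps a single spike from dragging the estimate, at the cost of slower reaction to genuine level shifts.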

Similar Items