Ensemble Stream Model for Data-Cleaning in Sensor Networks

Bibliographic Details
Main Author: Iyer, Vasanth
Format: Others
Published: FIU Digital Commons 2013
Subjects: Sensor Networks; Mobile Sensor Networks; Data-cleaning; Machine Learning; Data Mining; Routing; Power-aware routing; Netcoding; Data Aggregation; Quality of Data; Quality of Service; Feature Extraction; Randomforest; Bagging; Classifiers; Renewable Energy
Online Access: http://digitalcommons.fiu.edu/etd/973
http://digitalcommons.fiu.edu/cgi/viewcontent.cgi?article=2090&context=etd
Identifier: ndltd-fiu.edu-oai-digitalcommons.fiu.edu-etd-2090
Date: 2013-10-16
Type: text (application/pdf)
Collection: FIU Electronic Theses and Dissertations; NDLTD
Description: Ensemble stream modeling and data cleaning are sensor information-processing systems that use different training and testing methods, by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process sensed events so as to eliminate uncorrelated noise and to choose the most likely model without overfitting, thereby obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for events such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Although this is an obvious model choice, the results are disappointing for two reasons: the histogram of fire activity is highly skewed, and the measured sensor parameters are highly correlated. Since non-descriptive features do not yield good results, we resort to temporal features. By doing so, we carefully eliminate averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from the sensor streams. Second, feature induction is performed by cross-validating attributes against single or multiple target variables to minimize training error. We use the F-measure, which combines precision and recall, to evaluate false alarms in fire-event detection. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure, such as the F-test, is applied at each node split to select the best attribute. The ensemble stream model approach proved to perform better when complex features were combined with a simpler tree classifier.

The ensemble framework for data cleaning, together with the enhancements that quantify the sensors' quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction), led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream-quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
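
The description names two evaluation tools, the F-measure and an F-test applied at each node split, but gives no formulas. As an illustration only, the following minimal Python sketch implements the textbook versions of both: an F-beta score computed from alarm counts, and a one-way ANOVA F statistic used to rank binary threshold splits on a numeric target such as burnt area. The function names (f_measure, split_f_statistic, best_split), the threshold-split scheme, and the toy readings are hypothetical assumptions; this is not the thesis's code or the API of the data mining tool it used.

```python
from typing import Optional, Sequence, Tuple


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no correctly detected fires: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)  # share of raised alarms that were real fire events
    recall = tp / (tp + fn)     # share of real fire events that raised an alarm
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def split_f_statistic(left: Sequence[float], right: Sequence[float]) -> float:
    """One-way ANOVA F statistic for splitting a numeric target into two groups."""
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    if n_l == 0 or n_r == 0 or n <= 2:
        return 0.0  # a split needs two non-empty groups and n - 2 > 0 degrees of freedom
    mean_l, mean_r = sum(left) / n_l, sum(right) / n_r
    mean = (sum(left) + sum(right)) / n
    ss_between = n_l * (mean_l - mean) ** 2 + n_r * (mean_r - mean) ** 2
    ss_within = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    # Two groups: between-group df = 1, within-group df = n - 2.
    return (ss_between / 1) / (ss_within / (n - 2))


def best_split(
    rows: Sequence[Sequence[float]], target: Sequence[float]
) -> Optional[Tuple[int, float, float]]:
    """Return (attribute index, threshold, F) for the binary split with the largest F."""
    if not rows:
        return None
    best: Optional[Tuple[int, float, float]] = None
    for j in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest,
        # so the right-hand group (value > threshold) is never empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, target) if r[j] <= t]
            right = [y for r, y in zip(rows, target) if r[j] > t]
            f = split_f_statistic(left, right)
            if best is None or f > best[2]:
                best = (j, t, f)
    return best


# Hypothetical toy readings: (temperature, relative humidity) -> burnt area.
rows = [(20, 0.8), (22, 0.7), (35, 0.2), (38, 0.1)]
burnt_area = [0.0, 0.1, 5.0, 6.0]
print(best_split(rows, burnt_area))  # attribute 0 (temperature) at threshold 22
print(f_measure(tp=8, fp=2, fn=2))   # precision = recall = 0.8, so F1 is about 0.8
```

Because every split yields exactly two groups, the F statistic uses one between-group degree of freedom and n - 2 within-group degrees of freedom. A real tree learner would also need a stopping rule and a minimum leaf size, which this sketch deliberately omits.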