STATISTICAL METHODS IN AI: RARE EVENT LEARNING USING ASSOCIATIVE RULES AND HIGHER-ORDER STATISTICS

Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing fram...

Full description

Bibliographic Details
Main Authors: V. Iyer, S. Shetty, S. S. Iyengar
Format: Article
Language:English
Published: Copernicus Publications 2015-07-01
Series:ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access:http://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/II-4-W2/119/2015/isprsannals-II-4-W2-119-2015.pdf
Description
Summary:Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (<i>t</i>) in our proposed ensemble always yields minimum number (<i>m</i>) of leafs keeping pre-processing computation to <i>n</i> &times; <i>t</i> log <i>m</i> compared to <i>N<sup>2</sup></i> for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
ISSN:2194-9042
2194-9050