Summary: | Machine learning techniques generally require or assume balanced datasets. Skewed data can make machine learning systems never function properly, no matter how carefully the parameter tuning is conducted. Thus, a common solution to the problem of high skewness is to pre-process data (e.g., log transformation) before applying machine learning to deal with real-world problems. Nevertheless, this pre-processing strategy cannot be employed for online machine learning, especially in the context of edge computing, because it is barely possible to foresee and store the continuous data flow on IoT devices on the edge. Thus, it will be crucial and valuable to enable skewness monitoring in real time. Unfortunately, there exists a surprising gap between practitioners’ needs and scientific research in running statistics for monitoring real-time skewness, not to mention the lack of suitable remedies for skewed data at runtime. Inspired by Welford’s algorithm, which is the most efficient approach to calculating running variance, this research developed efficient calculation methods for three versions of running skewness. These methods can conveniently be implemented as skewness monitoring modules that are affordable for IoT devices in different edge learning scenarios. Such an IoT-friendly skewness monitoring eventually acts a cornerstone for developing the research field of skewness-aware online edge learning. By initially validating the usefulness and significance of skewness awareness in edge learning implementations, we also argue that conjoint research efforts from relevant communities are needed to boost this promising research field.
|