Initial Data Selection for Imbalanced Active Learning with Local Density



Bibliographic Details
Main Authors: LIN, SHANG-YU, 林上宇
Other Authors: YEH, YI-REN
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/8eeyfa
Description
Summary: Master's thesis === National Kaohsiung Normal University === Department of Mathematics === 107 === To construct a classification or detection system, large amounts of labeled samples are needed. However, manual labeling is expensive and time-consuming, so researchers have proposed active learning. Alleviating the cost of labelling is the core task in active learning. With a limited query budget, designing selection strategies that discover more informative instances is the key factor for performance. Besides the selection strategy, the choice of the initial training set is also critical for improving performance, especially on imbalanced data. In this work, we address the problem of initial selection in active learning for imbalanced data. We combine local densities with a Min-Max approach and generate virtual points to discover key instances without any labelling information during the initialization of active learning. Our proposed methods can explore not only the majority class but also the minority class. In our experiments, we demonstrate that our proposed methods achieve better results on UCI data sets. Experimental results show the effectiveness of our proposed methods in comparison with other initial selection strategies. Finally, the results show that our proposed methods can establish a sufficiently accurate initial model, which provides a more systematic and efficient starting point for many machine learning applications.
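The abstract does not spell out the exact algorithm, but one plausible reading of "local densities combined with a Min-Max approach" is a density-weighted farthest-first (Min-Max) traversal: estimate each point's local density from its k-nearest-neighbour distances, then repeatedly pick the unlabeled point whose density-weighted distance to the already-selected set is largest. The sketch below is illustrative only; the function names, the k-NN density estimate, and the `density * min-distance` score are assumptions, not the thesis's actual formulation, and the virtual-point generation step is omitted.

```python
import numpy as np

def local_density(X, k=5):
    # Density estimated as the inverse of the mean distance to the
    # k nearest neighbours (larger value = denser neighbourhood).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1]  # column 0 is the self-distance
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def density_minmax_select(X, budget, k=5):
    """Pick `budget` initial points: start from the densest point, then
    greedily add the point maximizing density * distance-to-selected."""
    rho = local_density(X, k)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    selected = [int(np.argmax(rho))]          # seed with the densest point
    while len(selected) < budget:
        mind = d[:, selected].min(axis=1)     # Min step: distance to nearest chosen point
        score = rho * mind                    # weight spread by local density
        score[selected] = -np.inf             # never re-select a point
        selected.append(int(np.argmax(score)))  # Max step
    return selected
```

The density weighting is what makes the Min-Max traversal usable on imbalanced data: a plain farthest-first pass drifts toward outliers, whereas weighting by density keeps the selection on genuine (possibly small) clusters, so a compact minority class can still attract a query.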