A Unified Approach on Active Learning Dual Supervision
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 105 === Active learning is a machine learning framework that addresses the problem of having a huge amount of unlabeled data compared to labeled data. Most studies in active learning focus on how to select the unlabeled data to be labeled by a human oracle in orde...
Main Author: | Adrian Chriswanto |
---|---|
Other Authors: | Hsing-Kuo Pao |
Format: | Others |
Language: | en_US |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/50414305702296531005 |
id |
ndltd-TW-105NTUS5392067 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NTUS5392067 2017-10-31T04:58:57Z http://ndltd.ncl.edu.tw/handle/50414305702296531005 A Unified Approach on Active Learning Dual Supervision Adrian Chriswanto Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 105 Hsing-Kuo Pao 鮑興國 2017 thesis 51 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 105 === Active learning is a machine learning framework that addresses the problem of having a huge amount of unlabeled data compared to labeled data. Most studies in active learning focus on how to select the unlabeled data to be labeled by a human oracle so as to maximize the performance gain of the model with as little labeling effort as possible. In this thesis, however, we focus not only on how to select data instances but also on how to select which features are to be labeled by the oracle, in a unified manner. By unified, we mean that we try to select the best possible combination of features and instances in each iteration. Labeling features is especially helpful for high-dimensional data, since it allows the model to discover the important features earlier.
The method we propose synthesizes new instances that each represent a set of features. With synthesized instances, we can treat a set of features as if it were a regular instance, so features and instances can be compared on equal ground when the model selects what the oracle should label. The features used to build the synthesized instances need to be selected carefully so that the resulting instances improve the model and do not introduce contradictory information. We use hierarchical clustering to group features that have similar context. We first pick clusters whose purity is estimated to be high. We then score each feature based on how common it is in the cluster and how related it is to the estimated majority label. The top-scoring features are then used to synthesize instances. Because we pick clusters that are estimated to have high purity, there is a good chance that the top-scoring features will not contradict each other.
|
author2 |
Hsing-Kuo Pao |
author_facet |
Hsing-Kuo Pao Adrian Chriswanto Adrian Chriswanto |
author |
Adrian Chriswanto Adrian Chriswanto |
spellingShingle |
Adrian Chriswanto Adrian Chriswanto A Unified Approach on Active Learning Dual Supervision |
author_sort |
Adrian Chriswanto |
title |
A Unified Approach on Active Learning Dual Supervision |
title_short |
A Unified Approach on Active Learning Dual Supervision |
title_full |
A Unified Approach on Active Learning Dual Supervision |
title_fullStr |
A Unified Approach on Active Learning Dual Supervision |
title_full_unstemmed |
A Unified Approach on Active Learning Dual Supervision |
title_sort |
unified approach on active learning dual supervision |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/50414305702296531005 |
work_keys_str_mv |
AT adrianchriswanto aunifiedapproachonactivelearningdualsupervision AT adrianchriswanto aunifiedapproachonactivelearningdualsupervision AT adrianchriswanto unifiedapproachonactivelearningdualsupervision AT adrianchriswanto unifiedapproachonactivelearningdualsupervision |
_version_ |
1718559072723140608 |
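
The selection steps described in the abstract (pick high-purity clusters, score features, synthesize instances) could be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes that instances are sets of feature names, that the clusters were already produced by some hierarchical clustering step, that purity is estimated from the labeled members of a cluster, and that the score is the product of a commonness term and a smoothed relatedness term. The function names, the `min_purity` and `top_k` parameters, and the toy data are all hypothetical.

```python
from collections import Counter


def estimate_purity(cluster, labels):
    """Return (majority_label, purity) using only the labeled members of a cluster."""
    counts = Counter(labels[i] for i in cluster if i in labels)
    if not counts:
        return None, 0.0
    label, hits = counts.most_common(1)[0]
    return label, hits / sum(counts.values())


def score_features(cluster, instances, labels, majority):
    """Assumed score: commonness in the cluster times relatedness to the majority label."""
    commonness = Counter(f for i in cluster for f in instances[i])
    scores = {}
    for feature, n in commonness.items():
        # Labels of all labeled instances (in any cluster) that contain this feature.
        seen = [lab for i, lab in labels.items() if feature in instances[i]]
        agree = sum(lab == majority for lab in seen)
        relatedness = (agree + 1) / (len(seen) + 2)  # Laplace smoothing
        scores[feature] = (n / len(cluster)) * relatedness
    return scores


def synthesize(clusters, instances, labels, min_purity=0.8, top_k=2):
    """Build one synthesized instance per cluster whose estimated purity is high enough."""
    synthetic = []
    for cluster in clusters:
        majority, purity = estimate_purity(cluster, labels)
        if majority is None or purity < min_purity:
            continue  # skip clusters with no labels or low estimated purity
        scores = score_features(cluster, instances, labels, majority)
        # Sort by descending score; break ties alphabetically for determinism.
        top = sorted(scores, key=lambda f: (-scores[f], f))[:top_k]
        # The majority label is only an estimate; the oracle would still label the query.
        synthetic.append((frozenset(top), majority))
    return synthetic


# Toy data (hypothetical): six bag-of-words instances, four of them labeled.
instances = [
    {"goal", "match", "team"},
    {"goal", "team", "coach"},
    {"match", "goal", "fans"},
    {"vote", "senate", "bill"},
    {"vote", "bill", "team"},
    {"goal", "vote"},
]
labels = {0: "sports", 1: "sports", 3: "politics", 5: "sports"}
clusters = [[0, 1, 2], [3, 4, 5]]  # assumed output of a hierarchical clustering step

candidates = synthesize(clusters, instances, labels)
print(candidates)
```

In this toy run the second cluster is skipped because its labeled members disagree, so only the first cluster yields a synthesized instance. That instance can then be ranked alongside ordinary unlabeled instances by whatever query strategy the active learner uses.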