Active learning via Transduction in Regression Forests

Context. The amount of training data required to build accurate models is a common problem in machine learning. Active learning is a technique that tries to reduce the amount of required training data by making active choices of which training data holds the greatest value. Objectives. This thesis aims...

Bibliographic Details
Main Authors:	Hansson, Kim; Hörlin, Erik
Format:	Others
Language:	English
Published:	Blekinge Tekniska Högskola, Institutionen för kreativa teknologier 2015
Subjects:	Active learning; Regression; Random Forests; Semi-supervised learning; Transduction
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:bth-10935

id	ndltd-UPSALLA1-oai-DiVA.org-bth-10935
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-bth-10935; 2016-02-23T05:05:58Z; Active learning via Transduction in Regression Forests; eng; Hansson, Kim; Hörlin, Erik; Blekinge Tekniska Högskola, Institutionen för kreativa teknologier; 2015; Active learning; Regression; Random Forests; Semi-supervised learning; Transduction; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:bth-10935; application/pdf; info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Active learning; Regression; Random Forests; Semi-supervised learning; Transduction
spellingShingle	Active learning Regression Random Forests Semi-supervised learning Transduction Hansson, Kim Hörlin, Erik Active learning via Transduction in Regression Forests
description	Context. The amount of training data required to build accurate models is a common problem in machine learning. Active learning is a technique that tries to reduce the amount of required training data by actively choosing which training data holds the greatest value. Objectives. This thesis aims to design, implement and evaluate a combination of the Random Forests algorithm and active learning that is suitable for predictive tasks with real-valued outcomes where the amount of training data is small. Machine learning algorithms traditionally require large amounts of training data to create a general model, and training data is in many cases sparse and expensive or difficult to create. Methods. The research methods used for this thesis are implementation and scientific experiment. An approach to active learning was implemented based on previous work for classification-type problems. The approach uses the Mahalanobis distance to perform active learning via transduction. Evaluation was done using several data sets, where the decrease in prediction error was measured over several iterations. The results of the evaluation were then analyzed using nonparametric statistical testing. Results. The statistical analysis of the evaluation results failed to detect a difference between our approach and a non-active-learning approach, even though the proposed algorithm showed irregular performance. The evaluation of our tree-based traversal method and the evaluation of the Mahalanobis distance for transduction both showed that these methods performed better than Euclidean distance and complete graph traversal. Conclusions. We conclude that the proposed solution did not decrease the amount of required training data to a significant degree. However, the approach has potential, and future work could lead to a working active learning solution. Further work is needed on key areas of the implementation, such as the choice of instances for active learning through transduction uncertainty and the choice of method for going from a transduction model to an induction model.
author	Hansson, Kim; Hörlin, Erik
author_facet	Hansson, Kim; Hörlin, Erik
author_sort	Hansson, Kim
title	Active learning via Transduction in Regression Forests
title_short	Active learning via Transduction in Regression Forests
title_full	Active learning via Transduction in Regression Forests
title_fullStr	Active learning via Transduction in Regression Forests
title_full_unstemmed	Active learning via Transduction in Regression Forests
title_sort	active learning via transduction in regression forests
publisher	Blekinge Tekniska Högskola, Institutionen för kreativa teknologier
publishDate	2015
url	http://urn.kb.se/resolve?urn=urn:nbn:se:bth-10935
work_keys_str_mv	AT hanssonkim activelearningviatransductioninregressionforests AT horlinerik activelearningviatransductioninregressionforests
_version_	1718196063625543680
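
Note on the distance measures named in the description: the abstract compares the Mahalanobis distance with the Euclidean distance for transduction. The following is a minimal sketch of how the two measures differ, not the thesis implementation. The function names, the toy covariance matrix and the sample points are all hypothetical and chosen only for illustration.

```python
import numpy as np


def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y under covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff rather than inverting cov explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))


def euclidean(x, y):
    """Plain Euclidean distance between x and y."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ diff))


# Hypothetical toy data: feature 0 has variance 4, feature 1 has variance 1.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
origin = [0.0, 0.0]
a = [2.0, 0.0]  # offset along the high-variance feature
b = [0.0, 2.0]  # offset along the low-variance feature

print(euclidean(origin, a), euclidean(origin, b))
print(mahalanobis(origin, a, cov), mahalanobis(origin, b, cov))
```

Under these assumed values both points are equally far from the origin in Euclidean terms, while the Mahalanobis distance treats the offset along the high-variance feature as smaller because it scales each direction by the spread of the data.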

Similar Items