Scaling Up Support Vector Machines with Application to Plankton Recognition
Learning a predictive model for a large scale real-world problem presents several challenges: the choice of a good feature set and a scalable machine learning algorithm with small generalization error. A support vector machine (SVM), based on statistical learning theory, obtains good generalization...
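Main Author: | Luo, Tong |
---|---|
Format: | Others |
Published: | Scholar Commons 2005 |
Subjects: | Machine learning; Data mining; Kernel machines; Active learning; Bit reduction; American Studies; Arts and Humanities |
Online Access: | https://scholarcommons.usf.edu/etd/753 https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1752&context=etd |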
id |
ndltd-USF-oai-scholarcommons.usf.edu-etd-1752 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-USF-oai-scholarcommons.usf.edu-etd-1752 2019-10-04T05:21:25Z Scaling Up Support Vector Machines with Application to Plankton Recognition Luo, Tong 2005-02-10T08:00:00Z text application/pdf https://scholarcommons.usf.edu/etd/753 https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1752&context=etd default Graduate Theses and Dissertations Scholar Commons Machine learning Data mining Kernel machines Active learning Bit reduction American Studies Arts and Humanities |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
Machine learning Data mining Kernel machines Active learning Bit reduction American Studies Arts and Humanities |
description |
Learning a predictive model for a large-scale real-world problem presents several challenges: the choice of a good feature set and of a scalable machine learning algorithm with small generalization error. A support vector machine (SVM), based on statistical learning theory, obtains good generalization by restricting the capacity of its hypothesis space. An SVM outperforms classical learning algorithms on many benchmark data sets. Its excellent performance makes it the ideal choice for pattern recognition problems. However, training an SVM involves constrained quadratic programming, which leads to poor scalability. In this dissertation, we propose several methods to improve an SVM's scalability. The evaluation is done mainly in the context of a plankton recognition problem.
One approach is active learning, which selectively asks a domain expert to label a subset of examples drawn from a large pool of unlabeled data. Active learning minimizes the number of labeled examples needed to build an accurate model and reduces the human effort of manually labeling the data. We propose a new active learning method, "Breaking Ties" (BT), for multi-class SVMs. After developing a probability model for multi-class SVMs, BT selects for labeling the examples for which the difference in probability between the most likely predicted class and the second most likely class is smallest. Compared to random sampling in our plankton recognition system, this simple strategy required several times fewer labeled plankton images to reach a given recognition accuracy.
To speed up an SVM's training and prediction, we show how to apply bit reduction to compress the examples into several bins. Each bin is assigned a weight based on the number of examples it contains. Treating each bin as a weighted example, an SVM builds a model using the reduced set of weighted examples. |
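As a rough illustration of the two ideas in the abstract, the following Python sketch shows Breaking Ties selection and bit-reduction binning. It is not the dissertation's implementation. The function names `breaking_ties_select` and `bit_reduce` are hypothetical, and several details are assumptions made only for this example: features are non-negative integers reduced by a right shift, examples with different labels are kept in separate bins, and each bin is represented by the mean of its members. The class probabilities are taken as given; the probability model the abstract mentions is not reproduced here.

```python
def breaking_ties_select(probabilities, batch_size):
    """Pick the unlabeled examples whose two most likely classes are closest.

    probabilities: one list of class probabilities per unlabeled example
    (assumed to come from some multi-class probability model).
    Returns the indices of the batch_size examples with the smallest margin.
    """
    margins = []
    for index, probs in enumerate(probabilities):
        top, second = sorted(probs, reverse=True)[:2]
        margins.append((top - second, index))
    margins.sort()  # smallest margin first
    return [index for _, index in margins[:batch_size]]


def bit_reduce(examples, labels, bits):
    """Compress examples into weighted bins by dropping low-order bits.

    Assumption for this sketch: features are non-negative integers, and two
    examples share a bin when their right-shifted features and labels match.
    Returns a list of (representative, label, weight) triples, where the
    representative is the mean of the bin and the weight is the bin size.
    """
    bins = {}
    for features, label in zip(examples, labels):
        key = (tuple(value >> bits for value in features), label)
        bins.setdefault(key, []).append(features)

    reduced = []
    for (_, label), members in bins.items():
        count = len(members)
        centroid = [sum(column) / count for column in zip(*members)]
        reduced.append((centroid, label, count))
    return reduced


if __name__ == "__main__":
    # Made-up probabilities for three unlabeled examples and three classes.
    probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.3, 0.2]]
    print(breaking_ties_select(probs, batch_size=2))

    # Made-up integer features; dropping 2 bits merges the first two examples.
    data = [[8, 9], [9, 8], [16, 3], [8, 8]]
    print(bit_reduce(data, labels=[0, 0, 1, 1], bits=2))
```

The weighted triples returned by `bit_reduce` could then be handed to any SVM trainer that accepts per-example weights, and the indices returned by `breaking_ties_select` identify which examples to send to the domain expert next.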
author |
Luo, Tong |
title |
Scaling Up Support Vector Machines with Application to Plankton Recognition |
publisher |
Scholar Commons |
publishDate |
2005 |
url |
https://scholarcommons.usf.edu/etd/753 https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1752&context=etd |