Scaling Up Support Vector Machines with Application to Plankton Recognition

Bibliographic Details
Main Author: Luo, Tong
Format: Others
Published: Scholar Commons 2005
Subjects:
Online Access: https://scholarcommons.usf.edu/etd/753
https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1752&context=etd
id ndltd-USF-oai-scholarcommons.usf.edu-etd-1752
record_format oai_dc
collection NDLTD
format Others
sources NDLTD
topic Machine learning
Data mining
Kernel machines
Active learning
Bit reduction
American Studies
Arts and Humanities
spellingShingle Machine learning
Data mining
Kernel machines
Active learning
Bit reduction
American Studies
Arts and Humanities
Luo, Tong
Scaling Up Support Vector Machines with Application to Plankton Recognition
description Learning a predictive model for a large-scale real-world problem presents two main challenges: choosing a good feature set and choosing a scalable machine learning algorithm with small generalization error. A support vector machine (SVM), grounded in statistical learning theory, achieves good generalization by restricting the capacity of its hypothesis space, and it outperforms classical learning algorithms on many benchmark data sets, making it a strong choice for pattern recognition problems. However, training an SVM requires solving a constrained quadratic program, which scales poorly. In this dissertation, we propose several methods to improve an SVM's scalability and evaluate them mainly on a plankton recognition problem. One approach is active learning, which selectively asks a domain expert to label a subset of examples drawn from a large pool of unlabeled data, minimizing the number of labeled examples needed to build an accurate model and reducing the human effort spent on manual labeling. We propose a new active learning method, "Breaking Ties" (BT), for multi-class SVMs. After developing a probability model for multi-class SVMs, BT selects for labeling the examples whose difference in predicted probability between the most likely and second most likely class is smallest. In our plankton recognition system, this simple strategy required several times fewer labeled plankton images than random sampling to reach a given recognition accuracy. To speed up an SVM's training and prediction, we show how to apply bit reduction to compress the examples into a small number of bins. Each bin is assigned a weight based on the number of examples it contains, and, treating each bin as a single weighted example, the SVM builds its model from this reduced set of weighted examples.
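A minimal Python sketch of the "Breaking Ties" selection rule described above. This is an illustration, not the dissertation's implementation: it uses scikit-learn's SVC with Platt-scaled probabilities as a stand-in for the probability model developed for multi-class SVMs, and the names breaking_ties_query, X_labeled, y_labeled, X_pool, and batch_size are hypothetical.

import numpy as np
from sklearn.svm import SVC

def breaking_ties_query(X_labeled, y_labeled, X_pool, batch_size=10):
    """Pick the pool examples whose top-two class probabilities are closest."""
    # Platt-scaled probabilities stand in for the dissertation's probability
    # model for multi-class SVMs (assumption).
    clf = SVC(kernel="rbf", probability=True).fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_pool)            # shape (n_pool, n_classes)
    top_two = np.sort(proba, axis=1)[:, -2:]     # second-best and best class
    gap = top_two[:, 1] - top_two[:, 0]          # size of the "tie"
    return np.argsort(gap)[:batch_size]          # smallest gaps first

The returned indices are the pool examples a domain expert would be asked to label next; retraining on the enlarged labeled set and querying again repeats the cycle.

The bit-reduction step can be sketched in the same spirit. The quantization scheme (min-max scaling followed by keeping a few high-order bits), the bin representative (the bin mean), and the majority-label rule below are illustrative assumptions; only the overall idea of weighting each bin by its example count and training on the reduced set comes from the abstract.

def bit_reduction_fit(X, y, bits=4):
    """Fit an SVM on one weighted representative per quantization bin."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale features to [0, 1] and keep only `bits` bits of precision,
    # so nearby examples collapse into the same bin.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    scaled = (X - X.min(axis=0)) / span
    quantized = np.minimum(np.floor(scaled * 2 ** bits), 2 ** bits - 1).astype(int)

    # Group identical quantized rows into bins and count their members.
    _, bin_ids, counts = np.unique(quantized, axis=0,
                                   return_inverse=True, return_counts=True)
    bin_ids = bin_ids.ravel()                    # one bin id per example
    reps = np.zeros((counts.size, X.shape[1]))
    labels = np.empty(counts.size, dtype=y.dtype)
    for b in range(counts.size):
        members = bin_ids == b
        reps[b] = X[members].mean(axis=0)        # bin representative (assumption)
        vals, freq = np.unique(y[members], return_counts=True)
        labels[b] = vals[np.argmax(freq)]        # majority label (assumption)

    # Each bin acts as a single example weighted by how many points it holds.
    return SVC(kernel="rbf").fit(reps, labels, sample_weight=counts)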
author Luo, Tong
author_facet Luo, Tong
author_sort Luo, Tong
title Scaling Up Support Vector Machines with Application to Plankton Recognition
title_short Scaling Up Support Vector Machines with Application to Plankton Recognition
title_full Scaling Up Support Vector Machines with Application to Plankton Recognition
title_fullStr Scaling Up Support Vector Machines with Application to Plankton Recognition
title_full_unstemmed Scaling Up Support Vector Machines with Application to Plankton Recognition
title_sort scaling up support vector machines with application to plankton recognition
publisher Scholar Commons
publishDate 2005
url https://scholarcommons.usf.edu/etd/753
https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1752&context=etd
work_keys_str_mv AT luotong scalingupsupportvectormachineswithapplicationtoplanktonrecognition
_version_ 1719260679650672640