Minimizing Dataset Size Requirements for Machine Learning

abstract: Machine learning methods are used in almost all areas of software engineering. An effective machine learning model requires a large amount of data to achieve high accuracy. Classification relies mostly on labeled data, which is difficult to obtain. The dataset requires...

Bibliographic Details
Other Authors: Batra, Salil (Author)
Format: Dissertation
Language: English
Published: 2017
Subjects: Computer science; Active Learning; Machine Learning; One Class Classification
Online Access: http://hdl.handle.net/2286/R.I.44214
id ndltd-asu.edu-item-44214
record_format oai_dc
spelling ndltd-asu.edu-item-44214
2018-06-22T03:08:30Z
Minimizing Dataset Size Requirements for Machine Learning
Dissertation/Thesis
Batra, Salil (Author)
Femiani, John (Advisor)
Amresh, Ashish (Advisor)
Bansal, Ajay (Committee member)
Arizona State University (Publisher)
Computer science
Active Learning
Machine Learning
One Class Classification
eng
60 pages
Masters Thesis Engineering 2017
Masters Thesis
http://hdl.handle.net/2286/R.I.44214
http://rightsstatements.org/vocab/InC/1.0/
All Rights Reserved
2017
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Computer science
Active Learning
Machine Learning
One Class Classification
description abstract: Machine learning methods are used in almost all areas of software engineering. An effective machine learning model requires a large amount of data to achieve high accuracy. Classification relies mostly on labeled data, which is difficult to obtain. Labeling the data accurately into different classes requires both high cost and effort. As data becomes more abundant, all of it must be labeled before it can be used properly, and this work focuses on reducing the labeling effort for large datasets. The thesis compares the performance of different classifiers to test whether a small set of labeled data can be used to build accurate models with a high prediction rate. The use of small datasets for classification is then extended to an active machine learning methodology in which a one-class classifier first predicts the outliers in the data, and the outlier samples are then added to the training set of a support vector machine classifier that labels the unlabeled data. The labeling of datasets can thus be scaled up, avoiding manual labeling and supporting more robust machine learning methodologies. === Dissertation/Thesis === Masters Thesis Engineering 2017
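The pipeline named in the description (flag outliers with a one-class model, add them to the training set, then let a classifier label the remaining data) might look roughly like the sketch below. This is one reading of the abstract, not the thesis implementation. To stay dependency-free it swaps in a centroid-distance outlier rule for the one-class classifier and a nearest-centroid classifier for the support vector machine. It also assumes the flagged outliers receive true labels from an oracle before training, which the abstract does not spell out. All function names, the `quantile` parameter, and the synthetic data are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    dims = range(len(points[0]))
    return tuple(sum(p[i] for p in points) / len(points) for i in dims)


def flag_outliers(seed_points, pool, quantile=0.8):
    """Stand-in for the one-class classifier (an assumption, not the thesis
    model): return indices of pool points whose distance from the seed
    centroid exceeds the given quantile of all pool distances."""
    center = centroid(seed_points)
    dists = [math.dist(p, center) for p in pool]
    cutoff = sorted(dists)[int(quantile * (len(dists) - 1))]
    return [i for i, d in enumerate(dists) if d > cutoff]


def fit_nearest_centroid(points, labels):
    """Stand-in for the SVM (also an assumption): one centroid per class."""
    grouped = {}
    for p, y in zip(points, labels):
        grouped.setdefault(y, []).append(p)
    return {y: centroid(ps) for y, ps in grouped.items()}


def predict(model, point):
    """Return the label of the closest class centroid."""
    return min(model, key=lambda y: math.dist(point, model[y]))


def label_pool(seed_points, seed_labels, pool, oracle, quantile=0.8):
    """Query labels for flagged outliers only, then label the whole pool."""
    queried = flag_outliers(seed_points, pool, quantile)
    train_x = list(seed_points) + [pool[i] for i in queried]
    train_y = list(seed_labels) + [oracle(pool[i]) for i in queried]
    model = fit_nearest_centroid(train_x, train_y)
    return [predict(model, p) for p in pool], len(queried)


# Synthetic two-class data (hypothetical, for illustration only).
rng = random.Random(0)
class_a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
class_b = [(rng.gauss(6.0, 1.0), rng.gauss(6.0, 1.0)) for _ in range(100)]
truth = {p: "a" for p in class_a}
truth.update({p: "b" for p in class_b})

# Three labeled seed samples per class; everything else is the unlabeled pool.
seed_points = class_a[:3] + class_b[:3]
seed_labels = ["a"] * 3 + ["b"] * 3
pool = class_a[3:] + class_b[3:]

# The oracle plays the role of the human annotator for queried samples.
predicted, n_queried = label_pool(seed_points, seed_labels, pool, truth.get)
accuracy = sum(y == truth[p] for y, p in zip(predicted, pool)) / len(pool)
print(f"queried {n_queried} of {len(pool)} labels, accuracy {accuracy:.2f}")
```

Only the samples flagged as outliers are sent to the oracle, so the labeling cost is controlled by `quantile` rather than by the size of the pool. With a real one-class model and SVM the same three steps would apply, but the selection rule and decision boundary would differ from this simplified stand-in.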
author2 Batra, Salil (Author)
title Minimizing Dataset Size Requirements for Machine Learning
publishDate 2017
url http://hdl.handle.net/2286/R.I.44214
_version_ 1718701472659537920