High-Performance Computing For Support Vector Machines

Machine learning algorithms are very successful at solving classification and regression problems; however, the immense amount of data created by digitalization slows down training and prediction, if the problems are solvable at all. High-Performance Computing (HPC), and particularly parallel computing, ar...

Bibliographic Details
Main Author:	Tavara, Shirin
Format:	Others
Language:	English
Published:	Högskolan i Skövde, Institutionen för informationsteknologi 2018
Subjects:	Computer Sciences Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556 http://nbn-resolving.de/urn:isbn:978-91-984187-8-1

id	ndltd-UPSALLA1-oai-DiVA.org-his-16556
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-his-16556 2019-02-15T05:59:04Z High-Performance Computing For Support Vector Machines eng Tavara, Shirin Högskolan i Skövde, Institutionen för informationsteknologi Högskolan i Skövde, Forskningscentrum för Informationsteknologi Skövde : University of Skövde 2018 Computer Sciences Datavetenskap (datalogi) Licentiate thesis, comprehensive summary info:eu-repo/semantics/masterThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556 urn:isbn:978-91-984187-8-1 Dissertation Series ; 26 (2018) application/pdf info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer Sciences Datavetenskap (datalogi)
spellingShingle	Computer Sciences Datavetenskap (datalogi) Tavara, Shirin High-Performance Computing For Support Vector Machines
description	Machine learning algorithms are very successful at solving classification and regression problems; however, the immense amount of data created by digitalization slows down training and prediction, if the problems are solvable at all. High-Performance Computing (HPC), and particularly parallel computing, are promising tools for reducing the running time of machine learning algorithms. Support Vector Machines (SVMs) are among the most popular supervised machine learning techniques that benefit from advances in HPC to overcome big-data problems; however, implementing SVMs efficiently in parallel is a complex endeavour. While there are many parallel techniques for improving the performance of SVMs, there is no clear roadmap for every application scenario. This thesis is based on a collection of publications. It addresses the problems of parallel SVM implementation through four research questions, all of which are answered through three research articles. For the first research question, the thesis investigates how important factors such as parallel algorithms, HPC tools, and heuristics affect the efficiency of parallel SVM implementations. This leads to identifying the state-of-the-art parallel implementations of SVMs and their pros and cons, and to suggesting possible avenues for future research. It is up to the user to balance computation time against classification accuracy. For the second research question, the thesis explores the impact of changes in problem size and the values of the corresponding SVM parameters that lead to significant performance. This leads to addressing the impact of the problem size on the optimal choice of important parameters. In addition, the thesis shows the existence of a threshold between the number of cores and the training time. For the third research question, the thesis investigates the impact of network topology on the performance of a network-based SVM. This leads to three key contributions. The first is to show how much the expansion property of the network affects convergence. The second is to show which network topology is preferable for using the available computing power efficiently. The third is to supply an implementation that makes the theoretical advances practically available. The results show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence. For the last research question, the thesis combines the contributions of all the articles and offers recommendations for implementing an efficient SVM framework for large-scale problems.
author	Tavara, Shirin
author_facet	Tavara, Shirin
author_sort	Tavara, Shirin
title	High-Performance Computing For Support Vector Machines
title_short	High-Performance Computing For Support Vector Machines
title_full	High-Performance Computing For Support Vector Machines
title_fullStr	High-Performance Computing For Support Vector Machines
title_full_unstemmed	High-Performance Computing For Support Vector Machines
title_sort	high-performance computing for support vector machines
publisher	Högskolan i Skövde, Institutionen för informationsteknologi
publishDate	2018
url	http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556 http://nbn-resolving.de/urn:isbn:978-91-984187-8-1
work_keys_str_mv	AT tavarashirin highperformancecomputingforsupportvectormachines
_version_	1718976270052622336

Similar Items