An Evaluation of TensorFlow as a Programming Framework for HPC Applications

In recent years, deep-learning, a branch of machine learning gained increasing popularity due to their extensive applications and performance. At the core of these application is dense matrix-matrix multiplication. Graphics Processing Units (GPUs) are commonly used in the training process due to the...

Full description

Bibliographic Details
Main Author:	Chien, Wei Der
Format:	Others
Language:	English
Published:	KTH, Beräkningsvetenskap och beräkningsteknik (CST) 2018
Subjects:	HPC GPU TensorFlow Computer Systems Datorsystem
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233795

id	ndltd-UPSALLA1-oai-DiVA.org-kth-233795
record_format	oai_dc
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	HPC GPU TensorFlow Computer Systems Datorsystem
spellingShingle	HPC GPU TensorFlow Computer Systems Datorsystem Chien, Wei Der An Evaluation of TensorFlow as a Programming Framework for HPC Applications
description	In recent years, deep-learning, a branch of machine learning gained increasing popularity due to their extensive applications and performance. At the core of these application is dense matrix-matrix multiplication. Graphics Processing Units (GPUs) are commonly used in the training process due to their massively parallel computation capabilities. In addition, specialized low-precision accelerators have emerged to specifically address Tensor operations. Software frameworks, such as TensorFlow have also emerged to increase the expressiveness of neural network model development. In TensorFlow computation problems are expressed as Computation Graphs where nodes of a graph denote operation and edges denote data movement between operations. With increasing number of heterogeneous accelerators which might co-exist on the same cluster system, it became increasingly difficult for users to program efficient and scalable applications. TensorFlow provides a high level of abstraction and it is possible to place operations of a computation graph on a device easily through a high level API. In this work, the usability of TensorFlow as a programming framework for HPC application is reviewed. We give an introduction of TensorFlow as a programming framework and paradigm for distributed computation. Two sample applications are implemented on TensorFlow: tiled matrix multiplication and conjugate gradient solver for solving large linear systems. We try to illustrate how such problems can be expressed in computation graph for distributed computation. We perform scalability tests and comment on performance scaling results and quantify how TensorFlow can take advantage of HPC systems by performing micro-benchmarking on communication performance. Through this work, we show that TensorFlow is an emerging and promising platform which is well suited for a particular class of problem which requires very little synchronization. === Under de senaste åren har deep-learning, en så kallad typ av maskininlärning, blivit populärt på grund av dess applikationer och prestanda. Den viktigaste komponenten i de här teknikerna är matrismultiplikation. Grafikprocessorer (GPUs) är vanligt förekommande vid träningsprocesser av artificiella neuronnät. Detta på grund av deras massivt parallella beräkningskapacitet. Dessutom har specialiserade lågprecisionsacceleratorer som specifikt beräknar matrismultiplikation tagits fram. Många utvecklingsramverk har framkommit för att hjälpa programmerare att hantera artificiella neuronnät. I TensorFlow uttrycks beräkningsproblem som en beräkningsgraf. En nod representerar en beräkningsoperation och en väg representerar dataflöde mellan beräkningsoperationer i en beräkningsgraf. Eftersom man måste programmera olika acceleratorer med olika systemarkitekturer har programmering av högprestandasystem blivit allt svårare. TensorFlow erbjuder en hög abstraktionsnivå och förenklar programmering av högprestandaberäkningar. Man programmerar acceleratorer genom att placera operationer inom grafen på olika acceleratorer med en API. I detta arbete granskas användbarheten hos TensorFlow som ett programmeringsramverk för applikationer med högprestandaberäkningar. Vi presenterar TensorFlow som ett programmeringsutvecklingsramverk för distribuerad beräkning. Vi implementerar två vanliga applikationer i TensorFlow: en lösare som löser linjära ekvationsystem med konjugerade gradientmetoden samt blockmatrismultiplikation och illustrerar hur de här problemen kan uttryckas i beräkningsgrafer för distribuerad beräkning. Vi experimenterar och kommenterar metoder för att demonstrera hur TensorFlow kan nyttja HPC-maskinvaror. Vi testar både skalbarhet och effektivitet samt gör mikro-benchmarking på kommunikationsprestanda. Genom detta arbete visar vi att TensorFlow är en framväxande och lovande plattform som passar väl för en viss typ av problem som kräver minimal synkronisering.
author	Chien, Wei Der
author_facet	Chien, Wei Der
author_sort	Chien, Wei Der
title	An Evaluation of TensorFlow as a Programming Framework for HPC Applications
title_short	An Evaluation of TensorFlow as a Programming Framework for HPC Applications
title_full	An Evaluation of TensorFlow as a Programming Framework for HPC Applications
title_fullStr	An Evaluation of TensorFlow as a Programming Framework for HPC Applications
title_full_unstemmed	An Evaluation of TensorFlow as a Programming Framework for HPC Applications
title_sort	evaluation of tensorflow as a programming framework for hpc applications
publisher	KTH, Beräkningsvetenskap och beräkningsteknik (CST)
publishDate	2018
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233795
work_keys_str_mv	AT chienweider anevaluationoftensorflowasaprogrammingframeworkforhpcapplications AT chienweider enundersokningavtensorflowsomettutvecklingsramverkforhogpresteradedatorsystem AT chienweider evaluationoftensorflowasaprogrammingframeworkforhpcapplications
_version_	1718734689156464640
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-2337952018-09-20T05:42:47ZAn Evaluation of TensorFlow as a Programming Framework for HPC ApplicationsengEn undersökning av TensorFlow som ett utvecklingsramverk för högpresterade datorsystemChien, Wei DerKTH, Beräkningsvetenskap och beräkningsteknik (CST)KTH, Parallelldatorcentrum, PDC2018HPCGPUTensorFlowComputer SystemsDatorsystemIn recent years, deep-learning, a branch of machine learning gained increasing popularity due to their extensive applications and performance. At the core of these application is dense matrix-matrix multiplication. Graphics Processing Units (GPUs) are commonly used in the training process due to their massively parallel computation capabilities. In addition, specialized low-precision accelerators have emerged to specifically address Tensor operations. Software frameworks, such as TensorFlow have also emerged to increase the expressiveness of neural network model development. In TensorFlow computation problems are expressed as Computation Graphs where nodes of a graph denote operation and edges denote data movement between operations. With increasing number of heterogeneous accelerators which might co-exist on the same cluster system, it became increasingly difficult for users to program efficient and scalable applications. TensorFlow provides a high level of abstraction and it is possible to place operations of a computation graph on a device easily through a high level API. In this work, the usability of TensorFlow as a programming framework for HPC application is reviewed. We give an introduction of TensorFlow as a programming framework and paradigm for distributed computation. Two sample applications are implemented on TensorFlow: tiled matrix multiplication and conjugate gradient solver for solving large linear systems. We try to illustrate how such problems can be expressed in computation graph for distributed computation. We perform scalability tests and comment on performance scaling results and quantify how TensorFlow can take advantage of HPC systems by performing micro-benchmarking on communication performance. Through this work, we show that TensorFlow is an emerging and promising platform which is well suited for a particular class of problem which requires very little synchronization. Under de senaste åren har deep-learning, en så kallad typ av maskininlärning, blivit populärt på grund av dess applikationer och prestanda. Den viktigaste komponenten i de här teknikerna är matrismultiplikation. Grafikprocessorer (GPUs) är vanligt förekommande vid träningsprocesser av artificiella neuronnät. Detta på grund av deras massivt parallella beräkningskapacitet. Dessutom har specialiserade lågprecisionsacceleratorer som specifikt beräknar matrismultiplikation tagits fram. Många utvecklingsramverk har framkommit för att hjälpa programmerare att hantera artificiella neuronnät. I TensorFlow uttrycks beräkningsproblem som en beräkningsgraf. En nod representerar en beräkningsoperation och en väg representerar dataflöde mellan beräkningsoperationer i en beräkningsgraf. Eftersom man måste programmera olika acceleratorer med olika systemarkitekturer har programmering av högprestandasystem blivit allt svårare. TensorFlow erbjuder en hög abstraktionsnivå och förenklar programmering av högprestandaberäkningar. Man programmerar acceleratorer genom att placera operationer inom grafen på olika acceleratorer med en API. I detta arbete granskas användbarheten hos TensorFlow som ett programmeringsramverk för applikationer med högprestandaberäkningar. Vi presenterar TensorFlow som ett programmeringsutvecklingsramverk för distribuerad beräkning. Vi implementerar två vanliga applikationer i TensorFlow: en lösare som löser linjära ekvationsystem med konjugerade gradientmetoden samt blockmatrismultiplikation och illustrerar hur de här problemen kan uttryckas i beräkningsgrafer för distribuerad beräkning. Vi experimenterar och kommenterar metoder för att demonstrera hur TensorFlow kan nyttja HPC-maskinvaror. Vi testar både skalbarhet och effektivitet samt gör mikro-benchmarking på kommunikationsprestanda. Genom detta arbete visar vi att TensorFlow är en framväxande och lovande plattform som passar väl för en viss typ av problem som kräver minimal synkronisering. Student thesisinfo:eu-repo/semantics/bachelorThesistexthttp://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233795TRITA-EECS-EX ; 2018:540application/pdfinfo:eu-repo/semantics/openAccess

An Evaluation of TensorFlow as a Programming Framework for HPC Applications

Similar Items