A portable relational algebra library for high performance data-intensive query processing

A growing number of industries are turning to data warehousing applications such as forecasting and risk assessment to process large volumes of data. These data warehousing applications, which utilize queries comprised of a mix of arithmetic and relational algebra (RA) operators, currently run on sy...

Bibliographic Details
Main Author: Saeed, Ifrah
Other Authors: Yalamanchili, Sudhakar
Language: en_US
Published: Georgia Institute of Technology 2014
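Subjects: Data-intensive query processing; RA operators; OpenCL; GPUs; CPUs; Graphics processing units; Data warehousing; Big data; Relation algebras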
Online Access: http://hdl.handle.net/1853/51967
id ndltd-GATECH-oai-smartech.gatech.edu-1853-51967
record_format oai_dc
spelling ndltd-GATECH-oai-smartech.gatech.edu-1853-51967
2014-09-30T03:35:24Z
A portable relational algebra library for high performance data-intensive query processing
Saeed, Ifrah
Data-intensive query processing
RA operators
OpenCL
GPUs
CPUs
Graphics processing units
Data warehousing
Big data
Relation algebras
A growing number of industries are turning to data warehousing applications such as forecasting and risk assessment to process large volumes of data. These data warehousing applications, which utilize queries comprised of a mix of arithmetic and relational algebra (RA) operators, currently run on systems that utilize commodity multi-core CPUs. If we acknowledge the data-intensive nature of these applications, general purpose graphics processing units (GPUs) with high throughput and memory bandwidth seem to be natural candidates to host these applications. However, since such relational queries exhibit irregular parallelism and data accesses, their efficient implementation on GPUs remains challenging. Thus, although tailored solutions for individual processors using their native programming environments have evolved, these solutions are not accessible to other processors. This thesis addresses this problem by providing a portable implementation of RA, mathematical, and related primitives required to implement and accelerate relational queries over large data sets in the form of the library. These primitives can run on any modern multi- and many-core architecture that supports OpenCL, thereby enhancing the performance potential of such architectures for warehousing applications. In essence, this thesis describes the implementation of primitives and the results of their performance evaluation on a range of platforms and concludes with insights, the identification of opportunities, and lessons learned.
One of the major insights from our analysis is that for complex relational queries, the time taken to transfer data between host CPUs and discrete GPUs can render the performance of discrete and integrated GPUs comparable in spite of the higher computing power and memory bandwidth of discrete GPUs. Therefore, data movement optimization is the key to effectively harnessing the high performance of discrete GPUs; otherwise, cost effectiveness would encourage the use of integrated GPUs. Furthermore, portability also enables the complete utilization of all GPUs and CPUs in the system at run time by opportunistically using any type of available processor when a kernel is ready for execution.
Georgia Institute of Technology
Yalamanchili, Sudhakar
2014-06-09T18:05:36Z
2014-06-09T18:05:36Z
2014-04-09
Thesis
http://hdl.handle.net/1853/51967
en_US
collection NDLTD
language en_US
sources NDLTD
topic Data-intensive query processing
RA operators
OpenCL
GPUs
CPUs
Graphics processing units
Data warehousing
Big data
Relation algebras
description A growing number of industries are turning to data warehousing applications such as forecasting and risk assessment to process large volumes of data. These data warehousing applications, which utilize queries comprised of a mix of arithmetic and relational algebra (RA) operators, currently run on systems that utilize commodity multi-core CPUs. If we acknowledge the data-intensive nature of these applications, general purpose graphics processing units (GPUs) with high throughput and memory bandwidth seem to be natural candidates to host these applications. However, since such relational queries exhibit irregular parallelism and data accesses, their efficient implementation on GPUs remains challenging. Thus, although tailored solutions for individual processors using their native programming environments have evolved, these solutions are not accessible to other processors. This thesis addresses this problem by providing a portable implementation of RA, mathematical, and related primitives required to implement and accelerate relational queries over large data sets in the form of the library. These primitives can run on any modern multi- and many-core architecture that supports OpenCL, thereby enhancing the performance potential of such architectures for warehousing applications. In essence, this thesis describes the implementation of primitives and the results of their performance evaluation on a range of platforms and concludes with insights, the identification of opportunities, and lessons learned. One of the major insights from our analysis is that for complex relational queries, the time taken to transfer data between host CPUs and discrete GPUs can render the performance of discrete and integrated GPUs comparable in spite of the higher computing power and memory bandwidth of discrete GPUs. 
Therefore, data movement optimization is the key to effectively harnessing the high performance of discrete GPUs; otherwise, cost effectiveness would encourage the use of integrated GPUs. Furthermore, portability also enables the complete utilization of all GPUs and CPUs in the system at run time by opportunistically using any type of available processor when a kernel is ready for execution.
author2 Yalamanchili, Sudhakar
author_facet Yalamanchili, Sudhakar
Saeed, Ifrah
author Saeed, Ifrah
author_sort Saeed, Ifrah
title A portable relational algebra library for high performance data-intensive query processing
title_sort portable relational algebra library for high performance data-intensive query processing
publisher Georgia Institute of Technology
publishDate 2014
url http://hdl.handle.net/1853/51967