Performance optimization research based on Hive

This paper studies Hive performance optimization from two aspects: MapReduce scheduling and Hive performance tuning. The MapReduce programming model and its execution process are analyzed, and parameters are tuned on the map side and the reduce side. Then Hive's framework is res...

Bibliographic Details
Main Authors: Wang Kang, Chen Haiguang, Li Dongjing
Format: Article
Language: English
Published: Academic Journals Center of Shanghai Normal University 2017-08-01
Series: Journal of Shanghai Normal University (Natural Sciences)
Subjects: data warehouse, job optimization, performance optimization, compression, storage format
Online Access: http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20170411&flag=1
id doaj-d830c4c94e08496792b26df0cb661410
record_format Article
spelling doaj-d830c4c94e08496792b26df0cb661410 | 2020-11-25T00:20:55Z | eng | Academic Journals Center of Shanghai Normal University | Journal of Shanghai Normal University (Natural Sciences) | ISSN 1000-5137 | 2017-08-01 | vol. 46, no. 4, pp. 527-534 | DOI 10.3969/J.ISSN.1000-5137.2017.04.011 | file no. 20170411
Performance optimization research based on Hive
Wang Kang (The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University)
Chen Haiguang (The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University)
Li Dongjing (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics)
collection DOAJ
language English
format Article
sources DOAJ
author Wang Kang
Chen Haiguang
Li Dongjing
publisher Academic Journals Center of Shanghai Normal University
series Journal of Shanghai Normal University (Natural Sciences)
issn 1000-5137
publishDate 2017-08-01
description This paper studies Hive performance optimization from two aspects: MapReduce scheduling and Hive performance tuning. The MapReduce programming model and its execution process are analyzed, and parameters are tuned on the map side and the reduce side. Hive's framework is then examined in terms of partitioned tables, external tables, common data file compression, and row-oriented versus column-oriented storage. The experimental results show that Snappy compression combined with the ORCFile/Parquet storage formats improves query efficiency for column-oriented queries and offers good compatibility.
topic data warehouse
job optimization
performance optimization
compression
storage format
url http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20170411&flag=1
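
The storage choices named in the description (partitioned tables, external tables, ORC storage, Snappy compression) can be illustrated with a minimal sketch. The record does not give the paper's actual tables or settings, so the helper name `build_orc_table_ddl`, the table `page_views`, its columns, and the location path are all hypothetical; only the HiveQL clauses themselves are standard Hive DDL.

```python
def build_orc_table_ddl(table, columns, partition_columns, location):
    """Return HiveQL for an external, partitioned ORC table compressed with Snappy.

    All names passed in are illustrative; nothing here comes from the paper itself.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"          # partition pruning limits the data scanned
        "STORED AS ORC\n"                      # columnar storage format
        f"LOCATION '{location}'\n"             # external table: data stays at this path
        "TBLPROPERTIES ('orc.compress'='SNAPPY')"
    )


# Hypothetical table used only to show the generated statement.
ddl = build_orc_table_ddl(
    "page_views",
    [("user_id", "BIGINT"), ("url", "STRING")],
    [("dt", "STRING")],
    "/warehouse/page_views",
)
print(ddl)
```

The generated statement would be run in Hive itself; a query that filters on `dt` and selects only a few columns is the kind of column-oriented query the description says benefits from this layout.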