Performance optimization research based on Hive
This paper studies Hive performance optimization from two aspects: MapReduce scheduling and Hive performance tuning. MapReduce's programming model and its execution process are analyzed, and parameters are tuned on the map side and the reduce side. Then Hive's framework is res...
Main Authors: | Wang Kang, Chen Haiguang, Li Dongjing |
---|---|
Format: | Article |
Language: | English |
Published: | Academic Journals Center of Shanghai Normal University, 2017-08-01 |
Series: | Journal of Shanghai Normal University (Natural Sciences) |
Subjects: | data warehouse; job optimization; performance optimization; compression; storage format |
Online Access: | http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20170411&flag=1 |
id | doaj-d830c4c94e08496792b26df0cb661410 |
---|---|
record_format | Article |
spelling | doaj-d830c4c94e08496792b26df0cb661410; 2020-11-25T00:20:55Z; eng; Academic Journals Center of Shanghai Normal University; Journal of Shanghai Normal University (Natural Sciences); ISSN 1000-5137; 2017-08-01; vol. 46, no. 4, pp. 527-534; doi:10.3969/J.ISSN.1000-5137.2017.04.011; file no. 20170411; Performance optimization research based on Hive; Wang Kang (The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University); Chen Haiguang (The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University); Li Dongjing (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics) |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Wang Kang; Chen Haiguang; Li Dongjing |
title | Performance optimization research based on Hive |
publisher | Academic Journals Center of Shanghai Normal University |
series | Journal of Shanghai Normal University (Natural Sciences) |
issn | 1000-5137 |
publishDate | 2017-08-01 |
description | This paper studies Hive performance optimization from two aspects: MapReduce scheduling and Hive performance tuning. MapReduce's programming model and its execution process are analyzed, and parameters are tuned on the map side and the reduce side. Hive's framework is then examined with respect to partitioned tables, external tables, common data file compression, and row-oriented versus column-oriented storage. The experimental results show that Snappy compression combined with the ORCFile or Parquet storage format improves query efficiency for column-oriented queries and offers good compatibility. |
topic | data warehouse; job optimization; performance optimization; compression; storage format |
url | http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20170411&flag=1 |