Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data
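Main Authors: | Zhenlong Li, Chaowei Yang, Kai Liu, Fei Hu, Baoxuan Jin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2016-09-01 |
Series: | ISPRS International Journal of Geo-Information |
Subjects: | geoprocessing, cloud computing, big data, geospatial cyberinfrastructure, Hadoop |
Online Access: | http://www.mdpi.com/2220-9964/5/10/173 |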
id | doaj-8904e4e3cce9405e913fe54f6b9b5a8b |
---|---|
record_format | Article |
spelling | DOI 10.3390/ijgi5100173; ISSN 2220-9964; Vol. 5, No. 10, Article 173; record timestamp 2020-11-24T23:48:49Z. Affiliations: Zhenlong Li, Department of Geography, University of South Carolina, Columbia, SC 29208, USA; Chaowei Yang, Kai Liu, and Fei Hu, Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA; Baoxuan Jin, Yunnan Provincial Geomatics Center, Kunming 650034, China. |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Zhenlong Li, Chaowei Yang, Kai Liu, Fei Hu, Baoxuan Jin |
title | Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data |
publisher | MDPI AG |
series | ISPRS International Journal of Geo-Information |
issn | 2220-9964 |
publishDate | 2016-09-01 |
description | Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only due to the massive data volume but also due to the intrinsic complexity and high dimensions of the geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Recently, many studies were carried out to investigate adopting Hadoop for processing big geospatial data, but how to adjust the computing resources to efficiently handle the dynamic geoprocessing workload was barely explored. To bridge this gap, we propose a novel framework to automatically scale the Hadoop cluster in the cloud environment to allocate the right amount of computing resources based on the dynamic geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce the computing resource utilization (by 80% in our example) while delivering similar performance as a full-powered cluster; and (2) effectively handle the spike processing workload by automatically increasing the computing resources to ensure the processing is finished within an acceptable time. Such an auto-scaling approach provides a valuable reference to optimize the performance of geospatial applications to address data- and computational-intensity challenges in GIScience in a more cost-efficient manner. |
topic | geoprocessing, cloud computing, big data, geospatial cyberinfrastructure, Hadoop |
url | http://www.mdpi.com/2220-9964/5/10/173 |