Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data

Bibliographic Details
Main Authors:	Zhenlong Li, Chaowei Yang, Kai Liu, Fei Hu, Baoxuan Jin
Format:	Article
Language:	English
Published:	MDPI AG 2016-09-01
Series:	ISPRS International Journal of Geo-Information
Subjects:	geoprocessing; cloud computing; big data; geospatial cyberinfrastructure; Hadoop
Online Access:	http://www.mdpi.com/2220-9964/5/10/173
ISSN:	2220-9964
DOI:	10.3390/ijgi5100173
Volume/Issue:	Vol. 5, No. 10, Article 173
Affiliations:	Zhenlong Li: Department of Geography, University of South Carolina, Columbia, SC 29208, USA; Chaowei Yang, Kai Liu, Fei Hu: Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA; Baoxuan Jin: Yunnan Provincial Geomatics Center, Kunming 650034, China

Abstract
Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only because of the massive data volume but also because of the intrinsic complexity and high dimensionality of geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Many recent studies have investigated adopting Hadoop for processing big geospatial data, but how to adjust the computing resources to efficiently handle a dynamic geoprocessing workload has barely been explored. To bridge this gap, we propose a novel framework that automatically scales a Hadoop cluster in the cloud, allocating the right amount of computing resources based on the dynamic geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism, using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce computing resource utilization (by 80% in our example) while delivering performance similar to that of a full-powered cluster; and (2) effectively handle spikes in the processing workload by automatically increasing the computing resources to ensure that processing finishes within an acceptable time. Such an auto-scaling approach provides a valuable reference for optimizing the performance of geospatial applications to address data- and computational-intensity challenges in GIScience in a more cost-efficient manner.
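
The abstract describes allocating computing resources to match a changing geoprocessing workload, but it does not give the paper's actual scaling rules. As a rough illustration only, the sketch below shows a generic workload-driven scaling decision: it sizes the cluster from the number of pending and running tasks, then clamps the result to a minimum and maximum cluster size. The names ClusterState, ScalingPolicy, desired_nodes, and scaling_action, and all default values, are hypothetical. This is not the authors' algorithm, and it does not call any Hadoop or cloud API.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Snapshot of a hypothetical Hadoop cluster (field names are illustrative)."""
    active_nodes: int        # worker nodes currently running
    pending_containers: int  # tasks waiting for resources
    running_containers: int  # tasks currently executing


@dataclass
class ScalingPolicy:
    """Hypothetical threshold-style policy; NOT the paper's algorithm."""
    containers_per_node: int = 4  # assumed task slots per worker node
    min_nodes: int = 1            # never shrink below this
    max_nodes: int = 20           # never grow beyond this (e.g. a budget cap)


def desired_nodes(state: ClusterState, policy: ScalingPolicy) -> int:
    """Return the worker count needed to absorb all pending and running tasks."""
    demand = state.pending_containers + state.running_containers
    per_node = policy.containers_per_node
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = (demand + per_node - 1) // per_node
    # Clamp to the allowed cluster size.
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_action(state: ClusterState, policy: ScalingPolicy) -> int:
    """Positive: nodes to add; negative: nodes to remove; zero: no change."""
    return desired_nodes(state, policy) - state.active_nodes


if __name__ == "__main__":
    policy = ScalingPolicy()
    spike = ClusterState(active_nodes=2, pending_containers=30, running_containers=8)
    idle = ClusterState(active_nodes=5, pending_containers=0, running_containers=0)
    print("spike:", scaling_action(spike, policy))  # adds nodes
    print("idle:", scaling_action(idle, policy))    # removes nodes
```

A practical controller would likely also add hysteresis, for example waiting several polling intervals before removing nodes, so that a brief lull in the workload does not cause the cluster to shrink and then immediately regrow.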

Similar Items