Ordinal Optimization-Based Performance Model Estimation Method for HDFS

Modeling and analyzing the performance of distributed file systems (DFSs) benefit the reliability and quality of data processing in data-intensive applications. Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity as well as external di...


Bibliographic Details
Main Authors: Tian Ma, Feng Tian, Bo Dong
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Distributed file system; HDFS; performance modeling; randomness; ordinal optimization
Online Access: https://ieeexplore.ieee.org/document/8943962/
id doaj-3c72dd054f9d43fbaed6e1f11948590a
record_format Article
spelling doaj-3c72dd054f9d43fbaed6e1f11948590a
2021-03-30T02:48:14Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
vol. 8, pp. 889-899
doi:10.1109/ACCESS.2019.2962724
8943962
Ordinal Optimization-Based Performance Model Estimation Method for HDFS
Tian Ma (0) https://orcid.org/0000-0002-1580-7363
Feng Tian (1) https://orcid.org/0000-0001-7888-0587
Bo Dong (2) https://orcid.org/0000-0001-7695-9072
(0) Department of Automation Science and Technology, Xi’an Jiaotong University, Xi’an, China
(1) Department of Automation Science and Technology, Xi’an Jiaotong University, Xi’an, China
(2) National Engineering Laboratory for Big Data Analytics, Xi’an Jiaotong University, Xi’an, China
https://ieeexplore.ieee.org/document/8943962/
collection DOAJ
language English
format Article
sources DOAJ
author Tian Ma
Feng Tian
Bo Dong
title Ordinal Optimization-Based Performance Model Estimation Method for HDFS
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Modeling and analyzing the performance of distributed file systems (DFSs) benefit the reliability and quality of data processing in data-intensive applications. The Hadoop Distributed File System (HDFS) is a typical DFS. Its internal heterogeneity and complexity, together with external disturbances, make HDFS inherently nonlinear and random at the system level, which makes these features very challenging to model. In particular, the randomness makes the HDFS performance model uncertain. Because analytical models have complex mathematical structures and parameters that are hard to estimate, building an explicit and precise analytical model of the randomness is highly complicated and computationally impossible. The measurement-based methodology is a promising way to model the randomness of HDFS performance, since it requires no knowledge of the system's internal behaviors. In this paper, the estimation of HDFS performance models that account for the randomness is transformed into an optimization problem: finding the truly best design of the performance model structure in a large design space. Core ideas of ordinal optimization (OO) are introduced to solve this problem with a limited computing budget. A piecewise linear (PL) model is applied to approximate the nonlinear characteristics and randomness of HDFS performance. The experimental results show that the proposed method is effective and practical for estimating the optimal design of the PL-based performance model structure for HDFS. It not only provides a globally consistent evaluation of the design space but also guarantees the goodness of the solution with high probability. Moreover, it improves the accuracy of system model-based HDFS performance models.
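note The description mentions the core ideas of ordinal optimization without spelling them out. The sketch below illustrates the two ideas usually associated with OO: rank designs by cheap, noisy evaluations (ordinal comparison), then settle for a selected set that contains at least one "good enough" design with high probability (goal softening). It is a toy simulation and not the authors' method. The function names, the uniform cost distribution, the Gaussian noise level, and the set sizes `g` and `s` are all assumptions made for illustration.

```python
import random


def ordinal_selection(true_cost, noise_sd, s, rng):
    """Evaluate every design once with a crude (noisy) estimate and
    return the indices of the s designs that look best."""
    observed = [(c + rng.gauss(0.0, noise_sd), i) for i, c in enumerate(true_cost)]
    observed.sort()  # lower observed cost ranks first
    return {i for _, i in observed[:s]}


def alignment_probability(n_designs=1000, g=50, s=50,
                          noise_sd=0.5, trials=200, seed=0):
    """Monte Carlo estimate of the probability that the selected set
    contains at least one of the truly best g designs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical design space: true costs drawn uniformly from [0, 1).
        true_cost = [rng.random() for _ in range(n_designs)]
        ranked = sorted(range(n_designs), key=true_cost.__getitem__)
        good_enough = set(ranked[:g])
        selected = ordinal_selection(true_cost, noise_sd, s, rng)
        if good_enough & selected:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(alignment_probability())
```

In this toy setting the noise is large relative to the spread of true costs, so the crude ranking rarely identifies the single best design. The selected set only has to overlap the good-enough set, which is a much easier target, and that is why a limited computing budget can suffice.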
topic Distributed file system
HDFS
performance modeling
randomness
ordinal optimization
url https://ieeexplore.ieee.org/document/8943962/