Ordinal Optimization-Based Performance Model Estimation Method for HDFS
Modeling and analyzing the performance of distributed file systems (DFSs) benefit the reliability and quality of data processing in data-intensive applications. Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity as well as external di...
Main Authors: | Tian Ma, Feng Tian, Bo Dong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Distributed file system, HDFS, performance modeling, randomness, ordinal optimization |
Online Access: | https://ieeexplore.ieee.org/document/8943962/ |
id |
doaj-3c72dd054f9d43fbaed6e1f11948590a |
---|---|
record_format |
Article |
spelling |
doaj-3c72dd054f9d43fbaed6e1f11948590a
2021-03-30T02:48:14Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2020-01-01, vol. 8, pp. 889-899
DOI 10.1109/ACCESS.2019.2962724, document 8943962
Ordinal Optimization-Based Performance Model Estimation Method for HDFS
Tian Ma [0], https://orcid.org/0000-0002-1580-7363
Feng Tian [1], https://orcid.org/0000-0001-7888-0587
Bo Dong [2], https://orcid.org/0000-0001-7695-9072
[0] Department of Automation Science and Technology, Xi’an Jiaotong University, Xi’an, China
[1] Department of Automation Science and Technology, Xi’an Jiaotong University, Xi’an, China
[2] National Engineering Laboratory for Big Data Analytics, Xi’an Jiaotong University, Xi’an, China
Modeling and analyzing the performance of distributed file systems (DFSs) benefits the reliability and quality of data processing in data-intensive applications. The Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity, together with external disturbances, give HDFS inherent nonlinearity and randomness at the system level, which makes these features very challenging to model. In particular, the randomness makes the HDFS performance model uncertain. Because analytical models have complex mathematical structures and parameters that are hard to estimate, building an explicit and precise analytical model of the randomness is highly complicated and computationally infeasible. The measurement-based methodology is a promising way to model HDFS performance under randomness, since it requires no knowledge of the system's internal behavior. In this paper, estimating HDFS performance models while accounting for the randomness is transformed into an optimization problem: finding the truly best design of the performance model structure in a large design space. Core ideas of ordinal optimization (OO) are introduced to solve this problem with a limited computing budget. A piecewise linear (PL) model is applied to approximate the nonlinear characteristics and randomness of HDFS performance. The experimental results show that the proposed method is effective and practical for estimating the optimal design of the PL-based performance model structure for HDFS. It not only provides a globally consistent evaluation of the design space but also guarantees the goodness of the solution with high probability. Moreover, it improves the accuracy of system model-based HDFS performance models.
https://ieeexplore.ieee.org/document/8943962/
Distributed file system, HDFS, performance modeling, randomness, ordinal optimization |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tian Ma, Feng Tian, Bo Dong |
spellingShingle |
Tian Ma, Feng Tian, Bo Dong; Ordinal Optimization-Based Performance Model Estimation Method for HDFS; IEEE Access; Distributed file system, HDFS, performance modeling, randomness, ordinal optimization |
author_facet |
Tian Ma, Feng Tian, Bo Dong |
author_sort |
Tian Ma |
title |
Ordinal Optimization-Based Performance Model Estimation Method for HDFS |
title_sort |
ordinal optimization-based performance model estimation method for hdfs |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Modeling and analyzing the performance of distributed file systems (DFSs) benefits the reliability and quality of data processing in data-intensive applications. The Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity, together with external disturbances, give HDFS inherent nonlinearity and randomness at the system level, which makes these features very challenging to model. In particular, the randomness makes the HDFS performance model uncertain. Because analytical models have complex mathematical structures and parameters that are hard to estimate, building an explicit and precise analytical model of the randomness is highly complicated and computationally infeasible. The measurement-based methodology is a promising way to model HDFS performance under randomness, since it requires no knowledge of the system's internal behavior. In this paper, estimating HDFS performance models while accounting for the randomness is transformed into an optimization problem: finding the truly best design of the performance model structure in a large design space. Core ideas of ordinal optimization (OO) are introduced to solve this problem with a limited computing budget. A piecewise linear (PL) model is applied to approximate the nonlinear characteristics and randomness of HDFS performance. The experimental results show that the proposed method is effective and practical for estimating the optimal design of the PL-based performance model structure for HDFS. It not only provides a globally consistent evaluation of the design space but also guarantees the goodness of the solution with high probability. Moreover, it improves the accuracy of system model-based HDFS performance models. |
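The ordinal optimization idea mentioned in the description can be illustrated with a small sketch. It is not the authors' procedure: the design space, the `true_cost` function, the noise model, and all names below are hypothetical stand-ins chosen for illustration. The sketch only shows the general OO pattern of sampling a limited number of designs, ranking them with a cheap noisy estimate, keeping the best few, and then counting how many of the kept designs are truly good enough.

```python
import random

def true_cost(design):
    # Hypothetical "true" model error of a design; in a real study this would
    # come from long, expensive measurement runs, not from a formula.
    return (design - 370) ** 2 / 1000.0

def make_crude_eval(rng, noise_width):
    # Cheap estimate = true cost plus bounded noise, standing in for a short,
    # noisy evaluation of the same design (an assumption of this sketch).
    def crude_eval(design):
        return true_cost(design) + rng.uniform(-noise_width, noise_width)
    return crude_eval

def ordinal_select(designs, crude_eval, s):
    # Rank designs by the crude estimate (lower is better) and keep the s best.
    return sorted(designs, key=crude_eval)[:s]

def alignment(designs, selected, g):
    # Number of selected designs that are truly among the g best sampled ones.
    good_enough = set(sorted(designs, key=true_cost)[:g])
    return len(good_enough & set(selected))

rng = random.Random(0)
design_space = range(1000)               # hypothetical: 1000 candidate structures
sampled = rng.sample(design_space, 200)  # limited budget: evaluate only 200 of them
selected = ordinal_select(sampled, make_crude_eval(rng, 5.0), s=20)
print("alignment with true top-20:", alignment(sampled, selected, g=20))
```

The sketch compares orders rather than values: the noisy estimates may be far from the true costs, yet the ranking they induce can still place several truly good designs in the selected set, which is the intuition behind accepting a "good enough" solution with high probability.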
topic |
Distributed file system, HDFS, performance modeling, randomness, ordinal optimization |
url |
https://ieeexplore.ieee.org/document/8943962/ |
work_keys_str_mv |
AT tianma ordinaloptimizationbasedperformancemodelestimationmethodforhdfs AT fengtian ordinaloptimizationbasedperformancemodelestimationmethodforhdfs AT bodong ordinaloptimizationbasedperformancemodelestimationmethodforhdfs |
_version_ |
1724184518367117312 |