Filter-Based Factor Selection Methods in Partial Least Squares Regression
Factor discovery in high-dimensional data is a crucial and extremely challenging scientific problem with wide application in research. This study introduces improved subset factor selection and proposes nine subset selection methods for par...
Main Authors: | Tahir Mehmood, Maryam Sadiq, Muhammad Aslam |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Factor selection; filter; partial least squares; regression |
Online Access: | https://ieeexplore.ieee.org/document/8878103/ |
id |
doaj-39c3768470554a24a5349c561764523c |
---|---|
record_format |
Article |
spelling |
doaj-39c3768470554a24a5349c561764523c; 2021-03-30T00:52:16Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 153499; 153508; 10.1109/ACCESS.2019.2948782; 8878103; Filter-Based Factor Selection Methods in Partial Least Squares Regression; Tahir Mehmood (0), https://orcid.org/0000-0001-9775-8093; Maryam Sadiq (1), https://orcid.org/0000-0002-6994-8970; Muhammad Aslam (2); School of Natural Sciences (SNS), National University of Sciences and Technology (NUST), Islamabad, Pakistan; Department of Statistics, University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan; Department of Mathematics and Statistics, Riphah International University, Islamabad, Pakistan; Factor discovery of high-dimensional data is a crucial problem and extremely challenging from a scientific viewpoint with enormous applications in research studies. In this study, the main focus is to introduce the improved subset factor selection method and hence, 9 subset selection methods for partial least squares regression (PLSR) based on filter factor subset selection approach are proposed. Existing and proposed methods are compared in terms of accuracy, sensitivity, F1 score and number of selected factors over the simulated data set. Further, these methods are practiced on a real data set of nutritional status of children obtained from Pakistan Demographic and Health Survey (PDHS) by addressing performance using a Monte Carlo algorithm. The optimal method is implemented to assess the important factors of nutritional status of children. Dispersion importance (DIMP) factor selection index for PLSR is observed to be a more efficient method regarding accuracy and number of selected factors. The recommended factors contain key information for the nutritional status of children and could be useful in related research.; https://ieeexplore.ieee.org/document/8878103/; Factor selection; filter; partial least squares; regression |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tahir Mehmood; Maryam Sadiq; Muhammad Aslam |
title |
Filter-Based Factor Selection Methods in Partial Least Squares Regression |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Factor discovery in high-dimensional data is a crucial and extremely challenging scientific problem with wide application in research. This study introduces improved subset factor selection and proposes nine subset selection methods for partial least squares regression (PLSR) based on the filter approach to factor subset selection. Existing and proposed methods are compared in terms of accuracy, sensitivity, F1 score, and number of selected factors on a simulated data set. The methods are then applied to a real data set on the nutritional status of children, obtained from the Pakistan Demographic and Health Survey (PDHS), with performance assessed using a Monte Carlo algorithm. The optimal method is used to identify the important factors for the nutritional status of children. The dispersion importance (DIMP) factor selection index for PLSR is observed to be a more efficient method with respect to accuracy and number of selected factors. The recommended factors contain key information on the nutritional status of children and could be useful in related research. |
topic |
Factor selection; filter; partial least squares; regression |
url |
https://ieeexplore.ieee.org/document/8878103/ |