Filter-Based Factor Selection Methods in Partial Least Squares Regression

Factor discovery in high-dimensional data is a crucial and scientifically challenging problem with wide application in research studies. The main focus of this study is to introduce an improved subset factor selection method; accordingly, nine subset selection methods for par...

Bibliographic Details
Main Authors: Tahir Mehmood, Maryam Sadiq, Muhammad Aslam
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Factor selection; filter; partial least squares; regression
Online Access: https://ieeexplore.ieee.org/document/8878103/
id doaj-39c3768470554a24a5349c561764523c
record_format Article
spelling doaj-39c3768470554a24a5349c561764523c
Record timestamp: 2021-03-30T00:52:16Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Date: 2019-01-01
Volume: 7
Pages: 153499-153508
DOI: 10.1109/ACCESS.2019.2948782
Document number: 8878103
Authors: Tahir Mehmood (https://orcid.org/0000-0001-9775-8093); Maryam Sadiq (https://orcid.org/0000-0002-6994-8970); Muhammad Aslam
Affiliations: School of Natural Sciences (SNS), National University of Sciences and Technology (NUST), Islamabad, Pakistan; Department of Statistics, University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan; Department of Mathematics and Statistics, Riphah International University, Islamabad, Pakistan
collection DOAJ
language English
format Article
sources DOAJ
author Tahir Mehmood
Maryam Sadiq
Muhammad Aslam
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Factor discovery in high-dimensional data is a crucial and scientifically challenging problem with wide application in research studies. The main focus of this study is to introduce an improved subset factor selection method; accordingly, nine subset selection methods for partial least squares regression (PLSR), based on the filter approach to factor subset selection, are proposed. The existing and proposed methods are compared on a simulated data set in terms of accuracy, sensitivity, F1 score, and number of selected factors. The methods are then applied to a real data set on the nutritional status of children, obtained from the Pakistan Demographic and Health Survey (PDHS), with performance assessed using a Monte Carlo algorithm. The optimal method is used to identify the important factors for the nutritional status of children. The dispersion importance (DIMP) factor selection index for PLSR is observed to be the more efficient method in terms of accuracy and number of selected factors. The recommended factors contain key information on the nutritional status of children and could be useful in related research.
topic Factor selection
filter
partial least squares
regression