Majority scoring with backward elimination in PLS for high dimensional spectrum data

Abstract Variable selection is a crucial issue for high-dimensional data modeling, where the sample size is small compared to the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high-dimensional data. Filter measures are not greedy f...

Bibliographic Details
Main Author: Freeh N. Alenezi
Format: Article
Language: English
Published: Nature Publishing Group 2021-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-96389-2
id doaj-dc55cfc4d44f4cbdb4a9bee95682efe4
record_format Article
spelling doaj-dc55cfc4d44f4cbdb4a9bee95682efe4 2021-08-22T11:26:51Z eng Nature Publishing Group Scientific Reports 2045-2322 2021-08-01 11 1 1 11 10.1038/s41598-021-96389-2 Majority scoring with backward elimination in PLS for high dimensional spectrum data Freeh N. Alenezi (Mathematics Department, College of Science in Zulfi, Majmaah University)
collection DOAJ
issn 2045-2322
description Abstract Variable selection is a crucial issue for high-dimensional data modeling, where the sample size is small compared to the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high-dimensional data. Filter measures are not greedy for optimal performance; hence we propose majority scoring with backward elimination in PLS (MSBE-PLS). MSBE-PLS considers two filter measures: variable importance on projection (VIP) and selectivity ratio (SR). In each iteration of backward elimination in PLS, variables are considered influential if they are selected by both filter indicators. The proposed method is applied to content prediction for corn and diesel. The corn contents include protein, oil, starch and moisture, while the diesel contents include boiling point at 50% recovery, cetane number, density, freezing temperature of the fuel, total aromatics, and viscosity. The proposed method outperforms the reference methods in terms of RMSE. In addition to validating the spectrum models, data properties are examined to explain prediction behavior. Moreover, MSBE-PLS selects a moderate number of influential variables and hence yields a parsimonious model for predicting contents from spectrum data.
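The selection loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the article's implementation: the abstract gives no thresholds or stopping rule, so the VIP cutoff of 1, the median-based SR cutoff, the hold-out RMSE criterion, and all function and parameter names (pls1, vip, selectivity_ratio, msbe_pls, min_vars, val_frac) are assumptions made for illustration only.

```python
import numpy as np

def pls1(Xc, yc, n_comp):
    """NIPALS PLS1 on centered data; returns weights W, scores T, y-loadings q, coefficients b."""
    Xk, yk = Xc.copy(), yc.copy()
    W, T, P, q = [], [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        if np.linalg.norm(w) < 1e-12:        # no covariance left to model
            break
        w = w / np.linalg.norm(w)            # unit-norm weight vector
        t = Xk @ w
        p = Xk.T @ t / (t @ t)
        c = (yk @ t) / (t @ t)
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        T.append(t)
        P.append(p)
        q.append(c)
    W, T, P, q = np.array(W).T, np.array(T).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients
    return W, T, q, b

def vip(W, T, q):
    """Variable importance on projection, for unit-norm weight columns."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)     # y-variance explained per component
    return np.sqrt(W.shape[0] * (W ** 2 @ ss) / ss.sum())

def selectivity_ratio(Xc, b):
    """Explained-to-residual variance ratio per variable on the target projection."""
    t = Xc @ b / np.linalg.norm(b)
    p = Xc.T @ t / (t @ t)
    explained = (t @ t) * p ** 2
    residual = np.sum((Xc - np.outer(t, p)) ** 2, axis=0)
    return explained / np.maximum(residual, 1e-12)

def msbe_pls(X, y, n_comp=3, vip_cut=1.0, min_vars=5, val_frac=0.3, seed=0):
    """Backward elimination keeping only variables selected by BOTH filters.

    Assumed rules (not from the source): VIP > vip_cut, SR above the current
    median, and the subset with the lowest hold-out RMSE is returned.
    """
    idx = np.random.default_rng(seed).permutation(len(y))
    n_val = max(1, int(val_frac * len(y)))
    val, tr = idx[:n_val], idx[n_val:]
    active = np.arange(X.shape[1])
    best_vars, best_rmse = active, np.inf
    while True:
        Xtr = X[tr][:, active]
        mx, my = Xtr.mean(axis=0), y[tr].mean()
        Xc, yc = Xtr - mx, y[tr] - my
        W, T, q, b = pls1(Xc, yc, min(n_comp, len(active)))
        pred = (X[val][:, active] - mx) @ b + my
        rmse = np.sqrt(np.mean((pred - y[val]) ** 2))
        if rmse < best_rmse:
            best_vars, best_rmse = active, rmse
        s = selectivity_ratio(Xc, b)
        keep = (vip(W, T, q) > vip_cut) & (s > np.median(s))
        if keep.sum() < min_vars:            # too few variables would remain
            break
        active = active[keep]                # eliminate the rest and refit
    return best_vars, best_rmse

# Usage on synthetic data (40 samples, 60 variables, 5 informative)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 60))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=40)
selected, rmse = msbe_pls(X, y)
print(len(selected), "variables kept; hold-out RMSE:", round(float(rmse), 3))
```

Because the SR cutoff is the median of the currently active variables, every iteration removes at least half of them, so the loop ends after a handful of refits. A significance test on the selectivity ratio or cross-validated RMSE could be substituted for these assumed rules.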