Majority scoring with backward elimination in PLS for high dimensional spectrum data

Abstract Variable selection is a crucial issue in high dimensional data modeling, where the sample size is small compared to the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high dimensional data. Filter measures are not greedy f...

Bibliographic Details
Main Author:	Freeh N. Alenezi
Format:	Article
Language:	English
Published:	Nature Publishing Group 2021-08-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-021-96389-2

id	doaj-dc55cfc4d44f4cbdb4a9bee95682efe4
record_format	Article
spelling	doaj-dc55cfc4d44f4cbdb4a9bee95682efe4; 2021-08-22T11:26:51Z; eng; Nature Publishing Group; Scientific Reports; 2045-2322; 2021-08-01; 11; 1; 1; 11; 10.1038/s41598-021-96389-2; Majority scoring with backward elimination in PLS for high dimensional spectrum data; Freeh N. Alenezi (Mathematics Department, College of Science in Zulfi, Majmaah University); https://doi.org/10.1038/s41598-021-96389-2
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Freeh N. Alenezi
title	Majority scoring with backward elimination in PLS for high dimensional spectrum data
publisher	Nature Publishing Group
series	Scientific Reports
issn	2045-2322
publishDate	2021-08-01
description	Abstract Variable selection is a crucial issue in high dimensional data modeling, where the sample size is small compared to the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high dimensional data. Filter measures are not greedy for optimal performance, hence we propose majority scoring with backward elimination in PLS (MSBE-PLS). In MSBE-PLS we consider variable importance on projection (VIP) and selectivity ratio (SR). In each iteration of backward elimination in PLS, variables are considered influential if they are selected by both filter indicators. The proposed method is applied to the prediction of corn and diesel contents. The corn contents include protein, oil, starch and moisture, while the diesel contents include boiling point at 50% recovery, cetane number, density, freezing temperature of the fuel, total aromatics, and viscosity. The proposed method outperforms the reference methods in terms of RMSE. In addition to validating the spectrum models, data properties are also examined to explain prediction behaviors. Moreover, MSBE-PLS selects a moderate number of influential variables, hence it yields a parsimonious model for predicting contents based on spectrum data.
url	https://doi.org/10.1038/s41598-021-96389-2

Similar Items