Robust Wavelength Selection Using Filter-Wrapper Method and Input Scaling on Near Infrared Spectral Data

The extraction of relevant wavelengths from a large Near Infrared Spectroscopy (NIRS) dataset is a significant challenge in vibrational spectroscopy research. Nonetheless, this process improves chemical interpretability by emphasizing the chemical entities related to the chem...

Bibliographic Details
Main Authors: Divo Dharma Silalahi, Habshah Midi, Jayanthi Arasan, Mohd Shafie Mustafa, Jean-Pierre Caliman
Format: Article
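Language: English
Published: MDPI AG 2020-09-01
Series: Sensors
Subjects: near infrared spectral data; robust statistics; partial least squares; scaling; variable selection; variable importance in projection
Online Access: https://www.mdpi.com/1424-8220/20/17/5001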
id doaj-84cd00511b4541f6bc510102a61d0d28
record_format Article
spelling doaj-84cd00511b4541f6bc510102a61d0d28 2020-11-25T02:30:49Z eng MDPI AG Sensors 1424-8220 2020-09-01 20 5001 5001 10.3390/s20175001
Divo Dharma Silalahi (0), Habshah Midi (1), Jayanthi Arasan (2), Mohd Shafie Mustafa (3), Jean-Pierre Caliman (4)
(0) SMART Research Institute, PT. SMART TBK, Pekanbaru 28289, Riau, Indonesia
(1) Institute for Mathematical Research, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia
(2) Institute for Mathematical Research, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia
(3) Institute for Mathematical Research, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia
(4) SMART Research Institute, PT. SMART TBK, Pekanbaru 28289, Riau, Indonesia
collection DOAJ
language English
format Article
sources DOAJ
author Divo Dharma Silalahi
Habshah Midi
Jayanthi Arasan
Mohd Shafie Mustafa
Jean-Pierre Caliman
title Robust Wavelength Selection Using Filter-Wrapper Method and Input Scaling on Near Infrared Spectral Data
publisher MDPI AG
series Sensors
issn 1424-8220
publishDate 2020-09-01
description The extraction of relevant wavelengths from a large Near Infrared Spectroscopy (NIRS) dataset is a significant challenge in vibrational spectroscopy research. Nonetheless, this process improves chemical interpretability by emphasizing the chemical entities related to the chemical parameters of samples. Because of the complexity of the dataset, irrelevant wavelengths may still be included in the multivariate calibration. This makes the computation unnecessarily complex and decreases the accuracy and robustness of the model. In multivariate analysis, Partial Least Squares Regression (PLSR) is commonly used to build a predictive model from NIR spectral data. However, neither the PLSR method nor common commercial chemometrics software applies a standard wavelength selection procedure to screen out irrelevant wavelengths. In this study, a new robust wavelength selection procedure, the modified VIP-MCUVE (mod-VIP-MCUVE), which uses a Filter-Wrapper method and an input scaling strategy, is introduced. The proposed method combines the modified Variable Importance in Projection (VIP) and the modified Monte Carlo Uninformative Variable Elimination (MCUVE) to calculate the scale matrix of the input variables. The modified VIP uses the orthogonal components of Partial Least Squares (PLS) to identify the informative variables in the model by applying the amount of variation in both $\mathbf{X}$ and $\mathbf{y}$, $\{\mathrm{SSX}, \mathrm{SSY}\}$, simultaneously. The modified MCUVE uses a robust reliability coefficient and a robust tolerance interval in the selection procedure. To assess the proposed method, the classical VIP, MCUVE, and autoscaling procedure in classical PLSR were also included in the evaluation. Using artificial data with Monte Carlo simulation and NIR spectral data of oil palm (Elaeis guineensis Jacq.) fruit mesocarp, the study shows that the proposed method improves model interpretability, is computationally extensive, and produces better model accuracy.
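As a rough illustration of the two classical building blocks named in the description, the sketch below computes standard VIP scores from a PLS1 fit and a standard MCUVE-style reliability coefficient (mean over standard deviation of the regression coefficients across Monte Carlo subsamples). It is not the authors' modified method: the robust reliability coefficient, the tolerance interval, and the SSX weighting are not reproduced here. The function names, the synthetic data, and the parameters `n_runs` and `frac` are assumptions made only for this example.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS1 via NIPALS on mean-centered data; returns W, T, q and coefficients b."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n, p = Xk.shape
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # unit-length weight vector
        t = Xk @ w                       # scores for component a
        tt = t @ t
        p_a = Xk.T @ t / tt              # X loadings
        q[a] = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p_a)       # deflate X
        yk = yk - q[a] * t               # deflate y
        W[:, a], P[:, a], T[:, a] = w, p_a, t
    b = W @ np.linalg.solve(P.T @ W, q)  # coefficients for centered X
    return W, T, q, b

def vip_scores(W, T, q):
    """Classical VIP: weights squared, averaged by the y-variance each component explains."""
    ssy = q ** 2 * np.sum(T ** 2, axis=0)
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(W.shape[0] * (w2 @ ssy) / ssy.sum())

def mcuve_reliability(X, y, n_components, n_runs=100, frac=0.8, seed=0):
    """Classical MCUVE-style reliability: mean/std of coefficients over random subsamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    coefs = np.empty((n_runs, X.shape[1]))
    for r in range(n_runs):
        idx = rng.choice(n, size=m, replace=False)
        coefs[r] = pls1_nipals(X[idx], y[idx], n_components)[3]
    return coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)

# Synthetic stand-in for spectra: only the first three "wavelengths" carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=100)

W, T, q, b = pls1_nipals(X, y, n_components=3)
vip = vip_scores(W, T, q)
rel = mcuve_reliability(X, y, n_components=3)
```

In the classical procedures, a variable is typically retained when its VIP exceeds a cutoff (often 1) or when the absolute reliability coefficient exceeds a chosen threshold; the thresholds used in the article are not stated in this record.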
topic near infrared spectral data
robust statistics
partial least squares
scaling
variable selection
variable importance in projection
url https://www.mdpi.com/1424-8220/20/17/5001