Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data

Bibliographic Details
Main Author:	Baek, Seung Hyun
Format:	Others
Published:	Trace: Tennessee Research and Creative Exchange 2010
Subjects:	Classification; Support Vector Machine; Information Complexity; Wavelet Thresholding; Recursive Feature Elimination; Floating Search; Industrial Engineering
Online Access:	http://trace.tennessee.edu/utk_graddiss/676

id	ndltd-UTENN-oai-trace.tennessee.edu-utk_graddiss-1758
record_format	oai_dc
spelling	ndltd-UTENN-oai-trace.tennessee.edu-utk_graddiss-1758 2011-12-13T16:02:54Z Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data Baek, Seung Hyun 2010-05-01 text application/pdf http://trace.tennessee.edu/utk_graddiss/676 Doctoral Dissertations Trace: Tennessee Research and Creative Exchange Classification Support Vector Machine Information Complexity Wavelet Thresholding Recursive Feature Elimination Floating Search Industrial Engineering
collection	NDLTD
format	Others
sources	NDLTD
topic	Classification; Support Vector Machine; Information Complexity; Wavelet Thresholding; Recursive Feature Elimination; Floating Search; Industrial Engineering
description	In statistical data mining research, datasets are often nonlinear and high-dimensional, and such datasets have become difficult to analyze comprehensively with traditional statistical methodologies. Kernel-based data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, the analysis of high-dimensional data requires statistically sophisticated procedures that emphasize both the reliability of results and computational efficiency. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE, which hybridizes the support vector machine (SVM) and recursive feature elimination (RFE) with an information-theoretic measure of complexity (ICOMP), is introduced and developed to classify high-dimensional data sets and to select the subset of variables in the original data space that best discriminates between groups. In this method, RFE ranks the variables based on the ICOMP criterion. Second, a dual variables functional support vector machine approach is proposed; it uses both the first and second derivatives of the degradation profiles. A modified floating search algorithm for repeated variable selection as new degradation path points are added is presented to find a few good variables while reducing the computation time for online implementation. Third, a two-stage scheme for the classification of near-infrared (NIR) spectral data is proposed. In the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure reduces the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE).
Fourth, a novel methodology for discriminant analysis, called PDCM, is proposed based on the human decision-making process. It consists of three basic steps that emulate the thinking process: perception, decision, and cognition. In these steps, two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.
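Recursive feature elimination, as summarized in the abstract, repeatedly trains a classifier and discards the lowest-ranked variable. Below is a minimal, standard-library-only sketch of the classic linear SVM-RFE loop, assuming a bias-free linear SVM fitted by full-batch subgradient descent on the regularized hinge loss and the usual squared-weight ranking score. It is not the dissertation's SVM-ICOMP-RFE: the record does not say how ICOMP enters the ranking, so the `score` argument is only a hypothetical hook where an ICOMP-based criterion could be substituted. The names `train_linear_svm`, `weight_criterion`, and `svm_rfe` are illustrative.

```python
from typing import Callable, List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> List[float]:
    """Fit weights w (no bias term) by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>), with y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]              # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:                       # hinge active: add -y_i * x_i / n
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def weight_criterion(w: List[float]) -> List[float]:
    """Classic SVM-RFE ranking score: the squared weight of each feature."""
    return [wj * wj for wj in w]


def svm_rfe(X: Sequence[Sequence[float]], y: Sequence[int],
            score: Callable[[List[float]], List[float]] = weight_criterion
            ) -> List[int]:
    """Return feature indices in elimination order (least useful first,
    most useful last). `score` is the swap-in point for another criterion."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while remaining:
        # Retrain on the surviving features only.
        Xs = [[row[j] for j in remaining] for row in X]
        scores = score(train_linear_svm(Xs, y))
        # Drop the lowest-scoring feature (first one on ties).
        worst = min(range(len(remaining)), key=lambda k: scores[k])
        eliminated.append(remaining.pop(worst))
    return eliminated
```

On a toy set where feature 0 equals the class label and features 1 and 2 are noise that appears equally in both classes, the noise weights stay at zero, so `svm_rfe` removes features 1 and 2 first and returns feature 0 last. Eliminating one feature per round keeps the sketch simple; practical SVM-RFE implementations often remove a fraction of the features per round to save retraining time.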
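As background on the information complexity (ICOMP) criterion named in the abstract, one widely cited form due to Bozdogan, ICOMP(IFIM), adds to the lack-of-fit term a complexity penalty on the estimated inverse Fisher information matrix. The record does not state which ICOMP variant the dissertation uses, so this is offered as a reference form under that assumption, not as the author's ranking rule. Here $L(\hat{\theta})$ is the maximized likelihood, $\hat{\mathcal{F}}^{-1}$ the estimated inverse Fisher information, and $s = \operatorname{rank}(\Sigma)$:

```latex
\mathrm{ICOMP(IFIM)} = -2\log L(\hat{\theta}) + 2\,C_1\!\left(\hat{\mathcal{F}}^{-1}(\hat{\theta})\right),
\qquad
C_1(\Sigma) = \frac{s}{2}\log\!\left(\frac{\operatorname{tr}(\Sigma)}{s}\right) - \frac{1}{2}\log\lvert\Sigma\rvert .
```

For a full-rank $\Sigma$, $C_1(\Sigma) = \frac{s}{2}\left(\log \bar{\lambda}_{\mathrm{arith}} - \log \bar{\lambda}_{\mathrm{geom}}\right)$ in terms of the arithmetic and geometric means of its eigenvalues, so $C_1(\Sigma) \ge 0$, with equality when all eigenvalues are equal. As with AIC-type criteria, the model or variable subset with the smaller ICOMP value is preferred.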
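The floating search mentioned in the abstract alternates forward inclusion with conditional backward exclusion. As background only, here is a minimal sketch of the classic sequential floating forward selection (SFFS); it is not the modified, repeated variant the dissertation applies as degradation path points are added, whose details the record does not give. The criterion `J` (higher is better) and the function name `sffs` are hypothetical placeholders; in practice `J` might be a cross-validated classifier score on the candidate subset.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Criterion = Callable[[Sequence[int]], float]


def sffs(n_features: int, J: Criterion, k: int) -> List[int]:
    """Sequential floating forward selection of k features (k <= n_features).
    J scores a feature subset; higher is better."""
    selected: List[int] = []
    # best[m] = (score, subset): best subset of size m found so far.
    best: Dict[int, Tuple[float, List[int]]] = {}
    while len(selected) < k:
        # Inclusion: add the single feature that maximizes J.
        add = max((f for f in range(n_features) if f not in selected),
                  key=lambda f: J(selected + [f]))
        selected = selected + [add]
        score = J(selected)
        m = len(selected)
        if m not in best or score > best[m][0]:
            best[m] = (score, selected)
        # Conditional exclusion: remove the least useful feature only while
        # the reduced subset strictly beats the best subset of its size.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: J([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            r_score = J(reduced)
            if r_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (r_score, reduced)
            else:
                break
    return best[k][1]
```

With an additive toy criterion such as `J = lambda S: sum([0.1, 0.9, 0.5, 0.3][f] for f in S)`, `sffs(4, J, 2)` selects features 1 and 2. The backtracking step only matters when features interact; because each accepted exclusion strictly improves the record for some subset size, the loop always terminates.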
author	Baek, Seung Hyun
title	Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data
publisher	Trace: Tennessee Research and Creative Exchange
publishDate	2010
url	http://trace.tennessee.edu/utk_graddiss/676

Similar Items