Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis

In many applications, the dataset for classification may be highly imbalanced where most of the instances in the training set may belong to some of the classes (majority classes), while only a few instances are from the other classes (minority classes). Conventional classifiers will strongly favor t...

Bibliographic Details
Main Author: Nekooeimehr, Iman
Format: Others
Published: Scholar Commons 2016
Online Access: http://scholarcommons.usf.edu/etd/6335
http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=7531&context=etd
id ndltd-USF-oai-scholarcommons.usf.edu-etd-7531
record_format oai_dc
collection NDLTD
format Others
sources NDLTD
topic Binary Classification
Ordinal Regression
Pelvic Organ Prolapse
Object Tracking
Trajectory Analysis
Computer Sciences
Industrial Engineering
Medicine and Health Sciences
description In many applications, the dataset for classification is highly imbalanced: most of the instances in the training set belong to some of the classes (majority classes), while only a few instances come from the other classes (minority classes). Conventional classifiers will strongly favor the majority classes and ignore the minority instances. The imbalance problem can occur in both binary classification and ordinal regression, a supervised approach for learning the ordinal relationship between classes. Extensive research has addressed imbalanced datasets for binary classification; however, current methods do not address within-class imbalance and between-class imbalance at the same time. Similarly, there has been very little research on imbalanced datasets for ordinal regression. Although current standard oversampling methods can be used to improve the class distribution of a dataset, they do not consider the ordinal relationship between the classes. Class imbalance is a major challenge in classification. Most clinical datasets are highly imbalanced, which can significantly weaken classifier performance. In this research, the imbalanced dataset classification problem is also examined in the context of a clinical application, specifically pelvic organ prolapse diagnosis. Pelvic organ prolapse (POP) is a major health problem that affects 30-50% of women in the U.S. Although clinical examination is currently used to diagnose POP, there is still little evidence on specific risk factors that are directly related to particular types of POP and their severity or stages (Stage 0-IV). Data from dynamic MRI on the movement of pelvic organs has the potential to improve POP prediction, but it is currently analyzed manually, which limits its exploration and use to small datasets.
Moreover, POP is a disorder with multiple stages that are ordinal and whose distribution is highly imbalanced. The main goal of this research is two-fold. The first goal is to design new oversampling methods for imbalanced datasets for both binary classification and ordinal regression. The second goal is to automatically track, segment, and classify the trajectories of multiple organs on dynamic MRI to quantitatively describe pelvic organ movement. The extracted image-based data, along with the designed oversampling methods, will be used to improve the diagnosis of POP. The proposed research consists of three major objectives: 1) to design a new oversampling technique for binary imbalanced dataset classification; 2) to design a novel oversampling technique for ordinal regression with imbalanced datasets; and 3) to design a two-stage method to automatically track and segment multiple pelvic organs on dynamic MRI for improving the prediction of multi-stage POP with imbalanced datasets. The proposed research aims to provide robust oversampling techniques and image processing models that can (1) effectively handle highly imbalanced datasets for both binary classification and ordinal regression, and (2) automatically track and segment multiple deformable structures for feature extraction from low-contrast and nonhomogeneous images and classify them using the resulting trajectories. This research will lay the foundation for a computer-aided decision support system that can automatically extract and analyze image and clinical data to improve the prediction of disorders with highly imbalanced datasets through personalized and evidence-based assessment.
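For orientation only, the kind of standard oversampling that the description contrasts against can be sketched as plain random oversampling: minority-class rows are duplicated until every class is as large as the biggest one. This is a minimal sketch and not the method proposed in the dissertation; the function name `random_oversample`, the toy features, and the three integer stage labels are all hypothetical. Note that it treats the labels as unordered categories, which is the limitation the description points out for ordinal problems.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equally large.

    Hypothetical helper for illustration; it ignores any ordering of the labels.
    """
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_features, out_labels = [], []
    for label, rows in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Toy data (assumed): three ordinal stages with 6, 3, and 1 instances.
labels = [0] * 6 + [1] * 3 + [2] * 1
features = [[float(i)] for i in range(len(labels))]

new_features, new_labels = random_oversample(features, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Because the new rows are exact copies, this baseline changes only the class counts; it adds no new information about where minority instances lie within a class or relative to neighboring stages.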
author Nekooeimehr, Iman
title Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis
publisher Scholar Commons
publishDate 2016
url http://scholarcommons.usf.edu/etd/6335
http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=7531&context=etd
spelling ndltd-USF-oai-scholarcommons.usf.edu-etd-7531 2017-09-01T05:25:59Z 2016-06-29T07:00:00Z text application/pdf http://scholarcommons.usf.edu/etd/6335 http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=7531&context=etd default Graduate Theses and Dissertations Scholar Commons