Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis
In many applications, the dataset for classification may be highly imbalanced: most of the instances in the training set belong to some of the classes (majority classes), while only a few instances are from the other classes (minority classes). Conventional classifiers will strongly favor t...
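Main Author: | Nekooeimehr, Iman |
---|---|
Format: | Others |
Published: | Scholar Commons 2016 |
Subjects: | Binary Classification; Ordinal Regression; Pelvic Organ Prolapse; Object Tracking; Trajectory Analysis; Computer Sciences; Industrial Engineering; Medicine and Health Sciences |
Online Access: | http://scholarcommons.usf.edu/etd/6335 http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=7531&context=etd |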
id |
ndltd-USF-oai-scholarcommons.usf.edu-etd-7531 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
Binary Classification; Ordinal Regression; Pelvic Organ Prolapse; Object Tracking; Trajectory Analysis; Computer Sciences; Industrial Engineering; Medicine and Health Sciences |
description |
In many applications, the dataset for classification may be highly imbalanced: most of the instances in the training set belong to some of the classes (majority classes), while only a few instances are from the other classes (minority classes). Conventional classifiers will strongly favor the majority class and ignore the minority instances. The imbalance problem can occur in both binary classification and ordinal regression. Ordinal regression is a supervised approach for learning the ordinal relationship between classes. Extensive research has addressed imbalanced datasets for binary classification; however, current methods do not address within-class imbalance and between-class imbalance at the same time. Similarly, very little research has addressed imbalanced datasets for ordinal regression. Although current standard oversampling methods can be used to improve the dataset class distribution, they do not consider the ordinal relationship between the classes.
The class imbalance problem is a major challenge in classification. Most clinical datasets are highly imbalanced, which can significantly weaken classifier performance. In this research, the imbalanced dataset classification problem is also examined in the context of a clinical application, specifically pelvic organ prolapse diagnosis. Pelvic organ prolapse (POP) is a major health problem that affects 30-50% of women in the U.S. Although clinical examination is currently used to diagnose POP, there is still little evidence on specific risk factors that are directly related to particular types of POP and their severity or stages (Stage 0-IV). Data from dynamic MRI on the movement of pelvic organs has the potential to improve POP prediction, but it is currently analyzed manually, which limits its exploration and use to small datasets. Moreover, POP is a disorder with multiple ordinal stages whose distribution is highly imbalanced.
The main goal of this research is two-fold. The first goal is to design new oversampling methods for imbalanced datasets for both binary classification and ordinal regression. The second goal is to automatically track, segment, and classify the trajectory of multiple organs on dynamic MRI to quantitatively describe pelvic organ movement. The extracted image-based data along with the designed oversampling methods will be used to improve the diagnosis of POP. The proposed research consists of three major objectives: 1) to design a new oversampling technique for binary imbalanced dataset classification; 2) to design a novel oversampling technique for ordinal regression with imbalanced datasets; and 3) to design a two-stage method to automatically track and segment multiple pelvic organs on dynamic MRI for improving the prediction of multi-stage POP with imbalanced datasets.
The proposed research aims to provide robust oversampling techniques and image processing models that can (1) effectively handle highly imbalanced datasets for both binary classification and ordinal regression, and (2) automatically track and segment multiple deformable structures for feature extraction from low-contrast and nonhomogeneous images and classify them using the resulting trajectories. This research will lay the foundation for a computer-aided decision support system that can automatically extract and analyze image and clinical data to improve, through personalized and evidence-based assessment, the prediction of disorders for which the dataset is highly imbalanced. |
author |
Nekooeimehr, Iman |
title |
Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis |
publisher |
Scholar Commons |
publishDate |
2016 |
url |
http://scholarcommons.usf.edu/etd/6335 http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=7531&context=etd |
spelling |
ndltd-USF-oai-scholarcommons.usf.edu-etd-7531 2017-09-01T05:25:59Z Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis Nekooeimehr, Iman
2016-06-29T07:00:00Z text application/pdf http://scholarcommons.usf.edu/etd/6335 http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=7531&context=etd default Graduate Theses and Dissertations Scholar Commons Binary Classification Ordinal Regression Pelvic Organ Prolapse Object Tracking Trajectory Analysis Computer Sciences Industrial Engineering Medicine and Health Sciences |