Managing imbalanced training data by sequential segmentation in machine learning

Imbalanced training data is a common problem in machine learning applications. This problem refers to datasets in which the foreground pixels are significantly fewer than the background pixels. By training a machine learning model with imbalanced data, the result is typically a model that classifies al...

Bibliographic Details
Main Author: Bardolet Pettersson, Susana
Format: Others
Language: English
Published: Linköpings universitet, Avdelningen för medicinsk teknik 2019
Subjects: Medical Image Processing; Medicinsk bildbehandling
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-155091
id ndltd-UPSALLA1-oai-DiVA.org-liu-155091
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-liu-155091; 2019-03-20T17:36:33Z; Managing imbalanced training data by sequential segmentation in machine learning; eng; Bardolet Pettersson, Susana; Linköpings universitet, Avdelningen för medicinsk teknik; 2019; Medical Image Processing; Medicinsk bildbehandling; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-155091; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Medical Image Processing
Medicinsk bildbehandling
description Imbalanced training data is a common problem in machine learning applications. This problem refers to datasets in which the foreground pixels are significantly fewer than the background pixels. By training a machine learning model with imbalanced data, the result is typically a model that classifies all pixels as the background class. A result that indicates no presence of a specific condition when it is actually present is particularly undesired in medical imaging applications. This project proposes a sequential system of two fully convolutional neural networks to tackle the problem. Semantic segmentation of lung nodules in thoracic computed tomography images has been performed to evaluate the performance of the system. The imbalanced data problem is present in the training dataset used in this project, where the average percentage of pixels belonging to the foreground class is 0.0038 %. The sequential system achieved a sensitivity of 83.1 % representing an increase of 34 % compared to the single system. The system only missed 16.83 % of the nodules but had a Dice score of 21.6 % due to the detection of multiple false positives. This method shows considerable potential to be a solution to the imbalanced data problem with continued development.
author Bardolet Pettersson, Susana
title Managing imbalanced training data by sequential segmentation in machine learning
publisher Linköpings universitet, Avdelningen för medicinsk teknik
publishDate 2019
url http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-155091