Solution to overcome the sparsity issue of annotated data in medical domain

Bibliographic Details
Main Authors:	Appan K. Pujitha, Jayanthi Sivaswamy
Format:	Article
Language:	English
Published:	Wiley 2018-10-01
Series:	CAAI Transactions on Intelligence Technology
Subjects:	learning (artificial intelligence); image colour analysis; neural nets; image classification; image segmentation; medical image processing; diseases; annotated data; medical domain; machine learning; developing computer diagnosis algorithms; CAD; good performance; medical data; image level; data-driven approaches; deep learning; data augmentation; popular solution; synthetic image generation; crowdsourced annotations; interest markings; pixel-level markings; generative adversarial network-based solution; severity level; crowdsourced region; synthetically generated data; colour fundus images; processed/refined crowdsourced data/synthetic images; detection performance
Online Access:	https://digital-library.theiet.org/content/journals/10.1049/trit.2018.1010

Description
Summary:	Annotations are critical for machine learning and for developing computer-aided diagnosis (CAD) algorithms. Good performance is critical to the adoption of CAD algorithms, and it generally relies on training with a wide variety of annotated data. However, a vast amount of medical data is either unlabeled or annotated only at the image level. This poses a problem for exploring data-driven approaches like deep learning for CAD. In this paper, we propose novel crowdsourcing and synthetic image generation solutions for training deep neural net-based lesion detection. The noisy nature of crowdsourced annotations is overcome by assigning a reliability factor to crowd subjects based on their performance and by requiring region-of-interest markings from the crowd. A generative adversarial network-based solution is proposed to generate synthetic images with lesions to control the overall severity level of the disease. We demonstrate the reliability of the crowdsourced annotations and synthetic images by presenting a solution for training the deep neural network (DNN) with data drawn from a heterogeneous mixture of annotations. Experimental results obtained for hard exudate detection from retinal images show that training with refined crowdsourced data/synthetic images is effective, as detection performance in terms of sensitivity improves by 25%/27%, respectively, over training with just expert markings.
ISSN:	2468-2322

Similar Items