Solution to overcome the sparsity issue of annotated data in medical domain

Bibliographic Details
Main Authors: Appan K. Pujitha, Jayanthi Sivaswamy
Format: Article
Language: English
Published: Wiley 2018-10-01
Series: CAAI Transactions on Intelligence Technology
Subjects: CAD
Online Access: https://digital-library.theiet.org/content/journals/10.1049/trit.2018.1010
Description
Summary: Annotations are critical for machine learning and for developing computer-aided diagnosis (CAD) algorithms. Good performance is critical to the adoption of CAD systems, which generally rely on training with a wide variety of annotated data. However, a vast amount of medical data is either unlabeled or annotated only at the image level. This poses a problem for exploring data-driven approaches such as deep learning for CAD. In this paper, we propose a novel combination of crowdsourcing and synthetic image generation for training deep neural network-based lesion detection. The noisy nature of crowdsourced annotations is overcome by assigning a reliability factor to crowd subjects based on their performance and by requiring region-of-interest markings from the crowd. A generative adversarial network-based solution is proposed to generate synthetic images with lesions, with control over the overall severity level of the disease. We demonstrate the reliability of the crowdsourced annotations and synthetic images by presenting a solution for training the deep neural network (DNN) with data drawn from a heterogeneous mixture of annotations. Experimental results for hard exudate detection in retinal images show that training with refined crowdsourced data/synthetic images is effective: detection sensitivity improves by 25%/27% over training with expert markings alone.
ISSN: 2468-2322
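
As a rough illustration of the reliability-weighted refinement step described in the summary, the sketch below scores each crowd worker against a small set of expert (gold-standard) lesion masks and fuses the workers' region-of-interest masks into a consensus training label. This is a minimal sketch under stated assumptions: the Dice-based reliability score, the weighted-vote fusion, the 0.5 threshold, and all function names are illustrative choices, not the authors' published formulation.

    """Sketch: reliability-weighted fusion of crowdsourced binary lesion
    masks (NumPy arrays). Names and scoring choices are hypothetical."""
    import numpy as np

    def reliability(worker_masks, gold_masks, eps=1e-8):
        """Score a worker by mean Dice overlap with gold-standard masks
        on a small calibration set."""
        scores = []
        for pred, gold in zip(worker_masks, gold_masks):
            inter = np.logical_and(pred, gold).sum()
            scores.append(2.0 * inter / (pred.sum() + gold.sum() + eps))
        return float(np.mean(scores))

    def fuse_annotations(masks, weights, threshold=0.5):
        """Weighted soft vote over per-worker binary masks; pixels whose
        weighted agreement exceeds `threshold` form the consensus label."""
        weights = np.asarray(weights, dtype=float)
        weights /= weights.sum()
        stacked = np.stack([m.astype(float) for m in masks])  # (workers, H, W)
        soft = np.tensordot(weights, stacked, axes=1)         # (H, W)
        return soft >= threshold

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        gold = rng.random((8, 8)) > 0.7                       # toy "expert" mask
        # three noisy crowd copies: flip ~10% of the pixels at random
        workers = [gold ^ (rng.random((8, 8)) > 0.9) for _ in range(3)]
        w = [reliability([m], [gold]) for m in workers]
        consensus = fuse_annotations(workers, w)
        print("worker reliabilities:", np.round(w, 3))

In practice the calibration set would contain only a handful of expert-marked images, with the resulting reliability weights reused to refine the much larger pool of crowd-only annotations before DNN training.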