Semi-automatic data annotation based on feature-space projection and local quality metrics: An application to cerebral emboli characterization

Bibliographic Details
Main Authors: Almar, M. (Author), Delachartre, P. (Author), Guépié, B.K. (Author), Roux, E. (Author), Vindas, Y. (Author)
Format: Article
Language: English
Published: Elsevier B.V. 2022
Subjects:
Online Access: View Fulltext in Publisher (https://doi.org/10.1016/j.media.2022.102437)
LEADER 03013nam a2200457Ia 4500
001 10.1016-j.media.2022.102437
008 220425s2022 CNT 000 0 und d
022 |a 1361-8415 (ISSN) 
245 1 0 |a Semi-automatic data annotation based on feature-space projection and local quality metrics: An application to cerebral emboli characterization 
260 0 |b Elsevier B.V.  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1016/j.media.2022.102437 
520 3 |a We propose a semi-supervised learning approach to annotate a dataset with reduced manual-annotation requirements and a controlled annotation error. The method is based on feature-space projection and label propagation using local quality metrics. First, an auto-encoder extracts the features of the samples in an unsupervised manner. Then, the extracted features are projected by a t-distributed stochastic neighbor embedding (t-SNE) algorithm into a two-dimensional (2D) space, and the best 2D projection is selected based on the silhouette score. The expert annotator uses the resulting 2D representation to manually label a subset of samples. Finally, the labels of the labeled samples are propagated to the unlabeled samples using a K-nearest neighbor strategy together with local quality metrics. We compare our method against semi-supervised optimum-path forest and K-nearest neighbor label propagation without local quality metrics. Our method achieves state-of-the-art results on three different datasets, labeling more than 96% of the samples with an annotation error ranging from 7% to 17%. Additionally, our method allows us to control the trade-off between annotation error and the number of labeled samples. Moreover, we combine our method with robust loss functions to compensate for the label noise introduced by automatic label propagation. Our method achieves classification performances similar to, and even better than, those obtained using a fully manually labeled dataset, with gains of up to 6% in classification accuracy. © 2022 
650 0 4 |a Annotation errors 
650 0 4 |a Classification (of information) 
650 0 4 |a Data annotation 
650 0 4 |a Economic and social effects 
650 0 4 |a Emboli characterization 
650 0 4 |a Embolus characterization 
650 0 4 |a Errors 
650 0 4 |a Feature space 
650 0 4 |a Label propagation 
650 0 4 |a Learning algorithms 
650 0 4 |a Motion compensation 
650 0 4 |a Nearest neighbor search 
650 0 4 |a Nearest-neighbour 
650 0 4 |a Noisy labels 
650 0 4 |a Quality metrics 
650 0 4 |a Semi-automatic annotation 
650 0 4 |a Semi-supervised learning 
650 0 4 |a Stochastic systems 
650 0 4 |a Stroke 
650 0 4 |a Supervised learning 
700 1 |a Almar, M.  |e author 
700 1 |a Delachartre, P.  |e author 
700 1 |a Guépié, B.K.  |e author 
700 1 |a Roux, E.  |e author 
700 1 |a Vindas, Y.  |e author 
773 |t Medical Image Analysis
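
The abstract describes a concrete pipeline: auto-encoder features, t-SNE projection to 2D, silhouette-based selection of the projection, and K-nearest-neighbor label propagation gated by local quality metrics. The following minimal Python sketch illustrates the projection-selection and propagation steps, assuming scikit-learn; using K-means clusters to compute the silhouette score and neighbor agreement as the local quality metric are illustrative assumptions, not the authors' exact definitions.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors

def best_tsne_projection(features, perplexities=(5, 30, 50), n_clusters=2, seed=0):
    # Project to 2D with several perplexities and keep the embedding whose
    # K-means clustering has the highest silhouette score (assumption: the
    # paper's exact selection criterion may differ).
    best_emb, best_score = None, -np.inf
    for p in perplexities:
        emb = TSNE(n_components=2, perplexity=p, random_state=seed).fit_transform(features)
        clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(emb)
        score = silhouette_score(emb, clusters)
        if score > best_score:
            best_emb, best_score = emb, score
    return best_emb, best_score

def propagate_labels(embedding, labels, k=5, quality_threshold=0.8):
    # labels holds integers >= 0 for manually annotated samples and -1 otherwise.
    # An unlabeled sample receives the majority label of its k nearest labeled
    # neighbours only if their agreement reaches quality_threshold.
    labeled = labels >= 0
    nn = NearestNeighbors(n_neighbors=k).fit(embedding[labeled])
    _, idx = nn.kneighbors(embedding[~labeled])
    neighbour_labels = labels[labeled][idx]  # shape: (n_unlabeled, k)
    propagated = labels.copy()
    for pos, neigh in zip(np.flatnonzero(~labeled), neighbour_labels):
        values, counts = np.unique(neigh, return_counts=True)
        if counts.max() / k >= quality_threshold:
            propagated[pos] = values[counts.argmax()]
    return propagated

Raising quality_threshold labels fewer samples but with a lower expected annotation error, mirroring the error/coverage trade-off reported in the abstract.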