Hybrid Human-Machine Vision Systems: Image Annotation using Crowds, Experts and Machines
Summary: The amount of digital image and video data keeps increasing at an ever-faster rate. While "big data" holds the promise of leading science to new discoveries, raw image data in itself is not of much use. In order to statistically analyze the data, it must be quantified and annotated. We argue that entirely automated methods are not accurate enough to annotate data in the short term. Crowdsourcing is an alternative that provides higher accuracy, but is too expensive to scale to millions of images. Instead, the solution is hybrid human-machine vision systems, where the work of both humans and machines is balanced to be as cost-effective and accurate as possible. With this goal in mind, we begin by categorizing different types of image annotations, and describe how nonexpert annotators can be trained to carry out challenging image annotation tasks. Having identified which types of annotations are appropriate for most tasks, including binary, confidence, pair-wise and continuous annotations, we present models for crowdsourcing annotations from hundreds of expert and nonexpert annotators (humans). By trading off the bias and expertise of multiple annotators, we show that it is possible to achieve high-quality annotations with very few labels. We show that the number of labels can be further reduced by actively choosing the best annotators to carry out most of the work. Finally, we study the problem of estimating the performance of automated classifiers (machines) used to annotate large datasets where few ground truth labels are available. Using a semisupervised model for classifier confidence scores, we show that it is possible to accurately estimate classifier performance with very few labels.