Towards a Visipedia: Combining Computer Vision and Communities of Experts

Bibliographic Details
Main Author: Van Horn, Grant Richard
Format: Others
Published: 2019
Online Access: https://thesis.library.caltech.edu/11502/1/Towards_a_Visipedia__Tools_and_Techniques_for_Computer_Vision_Dataset_Collection%20%284%29.pdf
Van Horn, Grant Richard (2019) Towards a Visipedia: Combining Computer Vision and Communities of Experts. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/20DQ-Y220. https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440
Description
Summary:<p>Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedia can be built. We conduct experiments using two highly motivated communities, the birding community and the naturalist community, and report results and lessons on how to build the necessary components of a Visipedia. First, we conduct experiments analyzing the behavior of state-of-the-art computer vision classifiers on long-tailed datasets. We find poor feature sharing between classes, which potentially limits the applicability of these models and emphasizes the importance of intelligently directing data collection resources. Second, we devise online crowdsourcing algorithms to make dataset collection for binary labels, multiclass labels, keypoints, and multi-instance bounding boxes faster, cheaper, and more accurate. These methods jointly estimate labels and worker skills while training computer vision models for these tasks. Experiments show that they achieve significant cost savings and produce more accurate datasets than traditional data collection techniques. Third, we present two fine-grained datasets, detail how they were constructed, and analyze the test accuracy of state-of-the-art methods. These datasets are then used to create applications that help users identify species in their photographs: Merlin, an app that assists users in identifying bird species, and iNaturalist, an app that assists users in identifying a broad variety of species. Finally, we present work aimed at reducing the computational burden of large-scale classification, with the goal of creating an application that allows users to classify tens of thousands of species in real time on their mobile device.
As a whole, the lessons learned and the techniques presented in this thesis bring us closer to the realization of a Visipedia.</p>