Towards a Visipedia: Combining Computer Vision and Communities of Experts

Bibliographic Details
Main Author: Van Horn, Grant Richard
Format: Others
Published: 2019
Online Access: https://thesis.library.caltech.edu/11502/1/Towards_a_Visipedia__Tools_and_Techniques_for_Computer_Vision_Dataset_Collection%20%284%29.pdf
Van Horn, Grant Richard (2019) Towards a Visipedia: Combining Computer Vision and Communities of Experts. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/20DQ-Y220. https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440
id ndltd-CALTECH-oai-thesis.library.caltech.edu-11502
record_format oai_dc
spelling ndltd-CALTECH-oai-thesis.library.caltech.edu-11502 2019-10-05T03:06:00Z Thesis NonPeerReviewed application/pdf https://thesis.library.caltech.edu/11502/
collection NDLTD
format Others
sources NDLTD
description <p>Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedia can be built. We conduct experiments using two highly motivated communities, the birding community and the naturalist community, and report results and lessons on how to build the necessary components of a Visipedia. First, we conduct experiments analyzing the behavior of state-of-the-art computer vision classifiers on long-tailed datasets. We find poor feature sharing between classes, which potentially limits the applicability of these models and underscores the importance of intelligently directing data collection resources. Second, we devise online crowdsourcing algorithms to make dataset collection for binary labels, multiclass labels, keypoints, and multi-instance bounding boxes faster, cheaper, and more accurate. These methods jointly estimate labels and worker skills while training computer vision models for these tasks. Experiments show that these methods achieve significant cost savings and produce more accurate datasets than traditional data collection techniques. Third, we present two fine-grained datasets, detail how they were constructed, and analyze the test accuracy of state-of-the-art methods. These datasets are then used to create applications that help users identify species in their photographs: Merlin, an app that assists users in identifying bird species, and iNaturalist, an app that assists users in identifying a broad variety of species. Finally, we present work aimed at reducing the computational burden of large-scale classification, with the goal of creating an application that allows users to classify tens of thousands of species in real time on their mobile device. As a whole, the lessons learned and the techniques presented in this thesis bring us closer to the realization of a Visipedia.</p>
author Van Horn, Grant Richard
title Towards a Visipedia: Combining Computer Vision and Communities of Experts
publishDate 2019
url https://thesis.library.caltech.edu/11502/1/Towards_a_Visipedia__Tools_and_Techniques_for_Computer_Vision_Dataset_Collection%20%284%29.pdf