Towards a Visipedia: Combining Computer Vision and Communities of Experts
<p>Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedi...
id |
ndltd-CALTECH-oai-thesis.library.caltech.edu-11502 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-CALTECH-oai-thesis.library.caltech.edu-11502 2019-10-05T03:06:00Z Towards a Visipedia: Combining Computer Vision and Communities of Experts Van Horn, Grant Richard <p>Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedia can be built. We conduct experiments using two highly motivated communities, the birding community and the naturalist community, and report results and lessons on how to build the necessary components of a Visipedia. First, we conduct experiments analyzing the behavior of state-of-the-art computer vision classifiers on long-tailed datasets. We find poor feature sharing between classes, potentially limiting the applicability of these models and emphasizing the ability to intelligently direct data collection resources. Second, we devise online crowdsourcing algorithms to make dataset collection for binary labels, multiclass labels, keypoints, and multi-instance bounding boxes faster, cheaper, and more accurate. These methods jointly estimate labels, worker skills, and train computer vision models for these tasks. Experiments show that we can achieve significant cost savings compared to traditional data collection techniques, and that we can produce a more accurate dataset compared to traditional data collection techniques. Third, we present two fine-grained datasets, detail how they were constructed, and analyze the test accuracy of state-of-the-art methods. These datasets are then used to create applications that help users identify species in their photographs: Merlin, an app assisting users in identifying bird species, and iNaturalist, an app that assists users in identifying a broad variety of species. 
Finally, we present work aimed at reducing the computational burden of large-scale classification with the goal of creating an application that allows users to classify tens of thousands of species in real time on their mobile device. As a whole, the lessons learned and the techniques presented in this thesis bring us closer to the realization of a Visipedia.</p> 2019 Thesis NonPeerReviewed application/pdf https://thesis.library.caltech.edu/11502/1/Towards_a_Visipedia__Tools_and_Techniques_for_Computer_Vision_Dataset_Collection%20%284%29.pdf https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440 Van Horn, Grant Richard (2019) Towards a Visipedia: Combining Computer Vision and Communities of Experts. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/20DQ-Y220. https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440 <https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440> https://thesis.library.caltech.edu/11502/ |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
description |
<p>Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedia can be built. We conduct experiments using two highly motivated communities, the birding community and the naturalist community, and report results and lessons on how to build the necessary components of a Visipedia. First, we conduct experiments analyzing the behavior of state-of-the-art computer vision classifiers on long-tailed datasets. We find poor feature sharing between classes, potentially limiting the applicability of these models and emphasizing the ability to intelligently direct data collection resources. Second, we devise online crowdsourcing algorithms to make dataset collection for binary labels, multiclass labels, keypoints, and multi-instance bounding boxes faster, cheaper, and more accurate. These methods jointly estimate labels, worker skills, and train computer vision models for these tasks. Experiments show that we can achieve significant cost savings compared to traditional data collection techniques, and that we can produce a more accurate dataset compared to traditional data collection techniques. Third, we present two fine-grained datasets, detail how they were constructed, and analyze the test accuracy of state-of-the-art methods. These datasets are then used to create applications that help users identify species in their photographs: Merlin, an app assisting users in identifying bird species, and iNaturalist, an app that assists users in identifying a broad variety of species. Finally, we present work aimed at reducing the computational burden of large-scale classification with the goal of creating an application that allows users to classify tens of thousands of species in real time on their mobile device. 
As a whole, the lessons learned and the techniques presented in this thesis bring us closer to the realization of a Visipedia.</p> |
author |
Van Horn, Grant Richard |
spellingShingle |
Van Horn, Grant Richard Towards a Visipedia: Combining Computer Vision and Communities of Experts |
author_facet |
Van Horn, Grant Richard |
author_sort |
Van Horn, Grant Richard |
title |
Towards a Visipedia: Combining Computer Vision and Communities of Experts |
title_short |
Towards a Visipedia: Combining Computer Vision and Communities of Experts |
title_full |
Towards a Visipedia: Combining Computer Vision and Communities of Experts |
title_fullStr |
Towards a Visipedia: Combining Computer Vision and Communities of Experts |
title_full_unstemmed |
Towards a Visipedia: Combining Computer Vision and Communities of Experts |
title_sort |
towards a visipedia: combining computer vision and communities of experts |
publishDate |
2019 |
url |
https://thesis.library.caltech.edu/11502/1/Towards_a_Visipedia__Tools_and_Techniques_for_Computer_Vision_Dataset_Collection%20%284%29.pdf Van Horn, Grant Richard (2019) Towards a Visipedia: Combining Computer Vision and Communities of Experts. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/20DQ-Y220. https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440 <https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440> |
work_keys_str_mv |
AT vanhorngrantrichard towardsavisipediacombiningcomputervisionandcommunitiesofexperts |
_version_ |
1719261407258607616 |