Learning Language-vision Correspondences
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and cap...
Main Author: | |
---|---|
Other Authors: | |
Language: | en_ca |
Published: | 2010 |
Subjects: | |
Online Access: | http://hdl.handle.net/1807/26192 |