Summary: | In this thesis, I examine human-like language generation from visual input head-on, exploring how people refer to visible objects in the real world. Drawing on previous work and the studies in this thesis, I propose an algorithm that generates human-like reference to visible objects. Rather than introduce a general-purpose referring expression generation (REG) algorithm, as is traditional, I address the sorts of properties that visual domains in particular make available, and the ways that these must be processed in order to be used in a referring expression algorithm. This approach uncovers several issues in generating human-like language that have not been thoroughly studied before. I focus on the properties of color, size, shape, and material, and address four issues: algorithm determinism and how speaker variation may be generated; unique identification of objects and whether this is an appropriate goal for generating human-like reference; atypicality and the role it plays in reference; and multi-featured values for visual attributes. Technical contributions of this thesis include (1) an algorithm for generating size modifiers from features in a visual scene; and (2) a referring expression generation algorithm that generates structures for varied, human-like reference.
|