Interactive Visualizations of Natural Language

Bibliographic Details
Main Author: Collins, Christopher
Other Authors: Penn, Gerald; Carpendale, Sheelagh
Language: English (Canada)
Published: 2010
Format: Thesis
Subjects: visualization; linguistics; natural language processing; information visualization; machine translation
Subject codes: 0984, 0723
Online Access: http://hdl.handle.net/1807/24726

Description

While linguistic skill is a hallmark of humanity, the increasing volume of linguistic data each of us faces is causing individual and societal problems; ‘information overload’ is a commonly discussed condition. Tasks such as finding the most appropriate information online, understanding the contents of a personal email repository, and translating documents from another language are now commonplace. These tasks need not cause stress and feelings of overload: the human intellectual capacity is not the problem. Rather, the computational interfaces to linguistic data are problematic, and the current state of the art exhibits a Linguistic Visualization Divide. Through five design studies, this dissertation combines sophisticated natural language processing algorithms with information visualization techniques grounded in evidence of human visuospatial capabilities. The first design study, Uncertainty Lattices, augments real-time computer-mediated communication, such as cross-language instant messaging chat and automatic speech recognition. By providing explicit indications of algorithmic confidence, the visualization enables informed decisions about the quality of computational outputs. Two design studies explore the space of content analysis. DocuBurst is an interactive visualization of document content that spatially organizes words using an expert-created ontology. Broadening from single documents to document collections, Parallel Tag Clouds combine keyword extraction and coordinated visualizations to provide comparative overviews across subsets of a faceted text corpus. Finally, two studies address visualization for natural language processing research. The Bubble Sets visualization draws secondary set relations around arbitrary collections of items, such as a linguistic parse tree. From this design study we propose a theory of spatial rights to consider when assigning visual encodings to data. Extending the consideration of spatial rights, we present a formalism to organize the variety of approaches to coordinated and linked visualization, and introduce VisLink, a new method to relate and explore multiple 2D visualizations in 3D space. Inter-visualization connections allow for cross-visualization queries and support high-level comparison between visualizations. From the design studies we distill challenges common to visualizing language data, including maintaining legibility, supporting detailed reading, addressing data scale, and managing problems arising from semantic ambiguity.