The domain agnostic generation of natural language explanations from provenance graphs

In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Furthermore, once provenance has been recorded, it is often necessary to be able...

Bibliographic Details
Main Author:	Richardson, Darren Paul
Other Authors:	Moreau, Luc ; Smart, Paul ; Shadbolt, Nigel ; Ramchurn, Sarvapali ; Costanza, Enrico ; Yang, Yang ; Popov, Igor ; Hall, Wendy ; Glaser, Hugh
Published:	University of Southampton 2018
Online Access:	https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.759231

id	ndltd-bl.uk-oai-ethos.bl.uk-759231
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-759231 | 2019-02-05T03:16:31Z | The domain agnostic generation of natural language explanations from provenance graphs | Richardson, Darren Paul | Moreau, Luc ; Smart, Paul ; Shadbolt, Nigel ; Ramchurn, Sarvapali ; Costanza, Enrico ; Yang, Yang ; Popov, Igor ; Hall, Wendy ; Glaser, Hugh | 2018 | University of Southampton | https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.759231 | https://eprints.soton.ac.uk/423465/ | Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
description	In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Furthermore, once provenance has been recorded, it is often necessary to be able to present it back to users. In the state of the art, the interfaces to such provenance tend to be diagrammatic, or rely on very application-specific template-based natural language generation. Both of these approaches have their drawbacks, motivating the search for techniques for generating natural language explanations from domain-generic provenance graphs. This work presents several contributions to the state of the art in this regard. Firstly, it presents a novel template-based architecture for natural language generation. Secondly, it presents the novel application of set-cover optimisation techniques to the challenge of sentence selection. Thirdly, this work extends previous research into the role of URIs for lexicalising Linked Data resources, making use of the specific nature of PROV instance data to inform the heuristics used. Fourthly, these techniques are evaluated in a user study demonstrating that they improve upon the state of the art across the three dimensions of grammatical correctness, fluency, and comprehensibility. This evaluation also showed that the participants preferred the sentences generated using these techniques 56.4% of the time. Following on from these advances, an investigation is conducted into how to structure larger natural language explanations of provenance graphs. This is done by inviting a number of provenance experts to describe a sequence of provenance graphs presented diagrammatically, and analysing the way they approach this task. This reveals that the responses of the experts correlated strongly with the visual layout of the diagrams, and also that the experts were split as to whether to structure those explanations in chronological or anti-chronological order. Finally, a further study was conducted to investigate how chronology affects the perceived quality of the generated natural language explanations, revealing that in aggregate the participants considered the chronological ordering to be more logical. This dissertation concludes with a summary of the contributions made to the state of the art and a number of possible areas for future research.
author2	Moreau, Luc ; Smart, Paul ; Shadbolt, Nigel ; Ramchurn, Sarvapali ; Costanza, Enrico ; Yang, Yang ; Popov, Igor ; Hall, Wendy ; Glaser, Hugh
author	Richardson, Darren Paul
title	The domain agnostic generation of natural language explanations from provenance graphs
publisher	University of Southampton
publishDate	2018
url	https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.759231

Similar Items