The domain agnostic generation of natural language explanations from provenance graphs

In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Furthermore, once provenance has been recorded, it is often necessary to be able...

Bibliographic Details
Main Author: Richardson, Darren Paul
Other Authors: Moreau, Luc ; Smart, Paul ; Shadbolt, Nigel ; Ramchurn, Sarvapali ; Costanza, Enrico ; Yang, Yang ; Popov, Igor ; Hall, Wendy ; Glaser, Hugh
Published: University of Southampton 2018
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.759231
id ndltd-bl.uk-oai-ethos.bl.uk-759231
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-759231 ; 2019-02-05T03:16:31Z ; The domain agnostic generation of natural language explanations from provenance graphs ; Richardson, Darren Paul ; Moreau, Luc ; Smart, Paul ; Shadbolt, Nigel ; Ramchurn, Sarvapali ; Costanza, Enrico ; Yang, Yang ; Popov, Igor ; Hall, Wendy ; Glaser, Hugh ; 2018 ; University of Southampton ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.759231 ; https://eprints.soton.ac.uk/423465/ ; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
description In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Furthermore, once provenance has been recorded, it is often necessary to be able to present it back to users. In the state of the art, the interfaces to such provenance tend to be diagrammatic, or rely on very application-specific template-based natural language generation. Both of these approaches have their drawbacks, motivating the search for techniques for generating natural language explanations from domain-generic provenance graphs. This work presents several contributions to the state of the art in this regard. Firstly, it presents a novel template-based architecture for natural language generation. This is followed by the novel application of set-cover optimisation techniques to the challenge of sentence selection. Thirdly, this work extends previous research into the role of URIs for lexicalising Linked Data resources, making use of the specific nature of PROV instance data to inform the heuristics used. Fourthly, these techniques are evaluated in a user study demonstrating that they improve upon the state of the art across the three dimensions of grammatical correctness, fluency, and comprehensibility. This evaluation also showed that the participants preferred the sentences generated using these techniques 56.4% of the time. Following on from these advances, an investigation is conducted into how to structure larger natural language explanations of provenance graphs. This is done by inviting a number of provenance experts to describe a sequence of provenance graphs presented diagrammatically, and analysing the way they approach this task. This reveals that the responses of the experts correlated strongly with the visual layout of the diagrams, and also that the experts were split as to whether to structure those explanations in a chronological or anti-chronological order. Finally, a further study was conducted to investigate how chronology affects the perceived quality of the generated natural language explanations, revealing that in aggregate the participants considered the chronological ordering to be more logical. This dissertation concludes with a summary of the contributions made to the state of the art and proposes a number of possible areas for future research.
author2 Moreau, Luc ; Smart, Paul ; Shadbolt, Nigel ; Ramchurn, Sarvapali ; Costanza, Enrico ; Yang, Yang ; Popov, Igor ; Hall, Wendy ; Glaser, Hugh
author Richardson, Darren Paul
title The domain agnostic generation of natural language explanations from provenance graphs
publisher University of Southampton
publishDate 2018
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.759231