Summary: | Data-to-text systems are Natural Language Generation (NLG) systems that generate textual summaries of raw numerical data. To date, such systems have concentrated exclusively on time series data. This is despite the increasing use and availability of low-cost Geographical Information Systems (GIS), which has made analysis of georeferenced data commonplace in many scientific areas. This thesis describes original research in the field of NLG that addresses the problem of automatically generating textual summaries of georeferenced data; that is, any data that has a reference to its location on the Earth’s surface. Its focus is the hypothesis that data-to-text technology can generate textual summaries of georeferenced data of comparable quality to human-written ones for the same data set. This research was carried out in the context of the RoadSafe project, whose primary outcome was the development of a data-to-text application for generating road maintenance weather forecasts. This thesis is a thorough investigation of the practical and theoretical issues involved in generating good-quality textual summaries of georeferenced data. It begins by surveying the current state of the art in data-to-text and the challenges that georeferenced data poses to such systems. Subsequently, empirical observations are outlined that lead to the proposal of a model for georeferenced data-to-text. This model has been implemented and evaluated in a system fielded in the meteorology domain. Techniques for data analysis, content determination and generating spatial descriptions are outlined.
|