Augmenting Statistical Data Dissemination by Short Quantified Sentences of Natural Language
Main Authors: | Miroslav Hudec, Erika Bednárová, Andreas Holzinger |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2018-12-01 |
Series: | Journal of Official Statistics |
Subjects: | linguistic summaries; linguistic quantifiers; fuzzy sets; database queries; user interface |
Online Access: | https://doi.org/10.2478/jos-2018-0048 |
id | doaj-54941a3260d449e9b15e138ba10b6d04 |
---|---|
record_format | Article |
spelling | doaj-54941a3260d449e9b15e138ba10b6d04; 2021-09-06T19:41:47Z; eng; Sciendo; Journal of Official Statistics; ISSN 2001-7367; 2018-12-01; vol. 34, no. 4, pp. 981-1010; 10.2478/jos-2018-0048; Hudec Miroslav, Bednárová Erika: Faculty of Economic Informatics, University of Economics in Bratislava, Dolnozemská cesta 1, 852 35 Bratislava, Slovakia; Holzinger Andreas: Holzinger Group HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2, 8036 Graz, Austria |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Hudec Miroslav, Bednárová Erika, Holzinger Andreas |
title | Augmenting Statistical Data Dissemination by Short Quantified Sentences of Natural Language |
publisher | Sciendo |
series | Journal of Official Statistics |
issn | 2001-7367 |
publishDate | 2018-12-01 |
description | Data from National Statistical Institutes is generally considered an important source of credible evidence for a variety of users. Summarization and dissemination via traditional methods is a convenient approach for providing this evidence. However, this is usually comprehensible only for users with a considerable level of statistical literacy. A promising alternative lies in augmenting the summarization linguistically. Less statistically literate users (e.g., domain experts and the general public), as well as disabled people, can benefit from such a summarization. This article studies the potential of summaries expressed in short quantified sentences. Summaries including, for example, “most visits from remote countries are of a short duration” can be immediately understood by diverse users. Linguistic summaries are not intended to replace existing dissemination approaches, but can augment them by providing alternatives for the benefit of diverse users of official statistics. Linguistic summarization can be achieved via mathematical formalization of linguistic terms and relative quantifiers by fuzzy sets. To avoid summaries based on outliers or data with low coverage, a quality criterion is applied. The concept based on linguistic summaries is demonstrated on test interfaces, interpreting summaries from real municipal statistical data. The article identifies a number of further research opportunities, and demonstrates ways to explore those. |
topic | linguistic summaries; linguistic quantifiers; fuzzy sets; database queries; user interface |
url | https://doi.org/10.2478/jos-2018-0048 |
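
The abstract describes formalizing linguistic terms and relative quantifiers by fuzzy sets and screening summaries with a quality criterion. A minimal Python sketch of that idea follows, assuming the classical evaluation of a sentence of the form "Q R records are S" (e.g., "most visits from remote countries are of a short duration"): the truth degree is μ_Q(Σ min(μ_R(x_i), μ_S(x_i)) / Σ μ_R(x_i)), using the minimum t-norm. The coverage check (Σ μ_R(x_i) divided by the number of records, with a threshold of 0.1) is a hypothetical stand-in for the article's quality criterion, whose exact form this record does not give. The fuzzy sets `remote`, `short_stay` and `most`, the `Visit` record layout, the `evaluate` helper and the visit data are all illustrative assumptions, not the authors' definitions or data.

```python
import math
from typing import Callable, NamedTuple, Sequence

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal fuzzy set: 0 outside (a, d), rising on (a, b), 1 on [b, c], falling on (c, d)."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


class Visit(NamedTuple):  # hypothetical record layout
    distance_km: float
    nights: float


# Hypothetical fuzzy sets; the article's actual parameters are not given here.
remote = trapezoid(1000, 3000, math.inf, math.inf)  # distance of the visitor's home country
short_stay = trapezoid(0, 0, 2, 5)                  # length of stay in nights
most = trapezoid(0.3, 0.8, math.inf, math.inf)      # relative quantifier on a proportion


def evaluate(records: Sequence[Visit],
             restriction: Callable[[Visit], float],
             summarizer: Callable[[Visit], float],
             quantifier: Membership,
             min_coverage: float = 0.1) -> tuple[float, float, bool]:
    """Return the truth of 'Q R records are S', its coverage, and whether it passes the check."""
    r = [restriction(x) for x in records]
    # Fuzzy count of records that are both R and S, combined with the minimum t-norm.
    rs = sum(min(ri, summarizer(x)) for ri, x in zip(r, records))
    total_r = sum(r)
    # Hypothetical coverage criterion: share of all records that satisfy R.
    coverage = total_r / len(records) if records else 0.0
    if total_r == 0:
        return 0.0, coverage, False
    truth = quantifier(rs / total_r)
    return truth, coverage, coverage >= min_coverage


visits = [Visit(5000, 1), Visit(4000, 2), Visit(3500, 3),
          Visit(2000, 7), Visit(300, 10), Visit(150, 1)]
truth, coverage, accepted = evaluate(
    visits,
    restriction=lambda v: remote(v.distance_km),
    summarizer=lambda v: short_stay(v.nights),
    quantifier=most,
)
print(f"truth={truth:.3f} coverage={coverage:.3f} accepted={accepted}")
# prints: truth=0.924 coverage=0.583 accepted=True
```

Rejecting a summary with low coverage keeps a high truth degree computed from only a few matching records from being reported as if it described the data as a whole. Other t-norms, such as the product, could replace `min` in the numerator.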