Predictive modeling for trustworthiness and other subjective text properties in online nutrition and health communication.

While the internet has democratized and accelerated content creation and sharing, it has also made people more vulnerable to manipulation and misinformation. Also, the received information can be distorted by psychological biases. This is problematic especially in health-related communications which...

Bibliographic Details
Main Authors: Janne Kauttonen, Jenni Hannukainen, Pia Tikka, Jyrki Suomala
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0237144
id doaj-35cc27576cb54bf7b74608ae0652c4b1
record_format Article
spelling doaj-35cc27576cb54bf7b74608ae0652c4b1; 2021-03-03T22:01:45Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2020-01-01; 15; 8; e0237144; 10.1371/journal.pone.0237144
collection DOAJ
language English
format Article
sources DOAJ
author Janne Kauttonen
Jenni Hannukainen
Pia Tikka
Jyrki Suomala
title Predictive modeling for trustworthiness and other subjective text properties in online nutrition and health communication.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2020-01-01
description While the internet has democratized and accelerated content creation and sharing, it has also made people more vulnerable to manipulation and misinformation. Also, the received information can be distorted by psychological biases. This is problematic especially in health-related communications which can greatly affect the quality of life of individuals. We assembled and analyzed 364 texts related to nutrition and health from Finnish online sources, such as news, columns and blogs, and asked non-experts to subjectively evaluate the texts. Texts were rated for their trustworthiness, sentiment, logic, information, clarity, and neutrality properties. We then estimated individual biases and consensus ratings that were used in training regression models. Firstly, we found that trustworthiness was significantly correlated to the information, neutrality and logic of the texts. Secondly, individual ratings for information and logic were significantly biased by the age and diet of the raters. Our best regression models explained up to 70% of the total variance of consensus ratings based on the low-level properties of texts, such as semantic embeddings, presence of key-terms and part-of-speech tags, references, quotes and paragraphs. With a novel combination of crowdsourcing, behavioral analysis, natural language processing and predictive modeling, our study contributes to the automated identification of reliable and high-quality online information. While critical evaluation of truthfulness cannot be surrendered to the machine only, our findings provide new insights into automated evaluation of subjective text properties and analysis of morphologically-rich languages in regards to trustworthiness.
url https://doi.org/10.1371/journal.pone.0237144