WhatsApp usage patterns and prediction of demographic characteristics without access to message content


Bibliographic Details
Main Authors: Avi Rosenfeld, Sigal Sina, David Sarne, Or Avidov, Sarit Kraus
Format: Article
Language: English
Published: Max Planck Institute for Demographic Research 2018-09-01
Series: Demographic Research
Subjects: demographics; social media; social network; usage prediction; WhatsApp
Online Access: https://www.demographic-research.org/volumes/vol39/22/
id doaj-48a7bb8569af4fcbb51eae664d568aff
record_format Article
spelling doaj-48a7bb8569af4fcbb51eae664d568aff; 2020-11-25T00:22:21Z; eng; Max Planck Institute for Demographic Research; Demographic Research; 1435-9871; 2018-09-01; 39; 22; 10.4054/DemRes.2018.39.22; 3511; WhatsApp usage patterns and prediction of demographic characteristics without access to message content; Avi Rosenfeld (Jerusalem College of Technology), Sigal Sina (Bar-Ilan University), David Sarne (Bar-Ilan University), Or Avidov (Bar-Ilan University), Sarit Kraus (Bar-Ilan University); https://www.demographic-research.org/volumes/vol39/22/; demographics, social media, social network, usage prediction, WhatsApp
collection DOAJ
language English
format Article
sources DOAJ
author Avi Rosenfeld
Sigal Sina
David Sarne
Or Avidov
Sarit Kraus
title WhatsApp usage patterns and prediction of demographic characteristics without access to message content
publisher Max Planck Institute for Demographic Research
series Demographic Research
issn 1435-9871
publishDate 2018-09-01
description Background: Social networks on the Internet have become ubiquitous applications that allow people to easily share text, pictures, and audio and video files. Popular networks include WhatsApp, Facebook, Reddit, and LinkedIn.
Objective: We present an extensive study of the usage of the WhatsApp social network, an Internet messaging application that is quickly replacing SMS (short message service) messaging. To better understand people's use of the network, we provide an analysis of over 6 million encrypted messages from over 100 users, with the objective of building demographic prediction models that use activity data but not the content of these messages.
Methods: We performed extensive statistical and numerical analysis of the data and found significant differences in WhatsApp usage across people of different genders and ages. We also entered the data into the Weka and pROC data mining packages and studied models created from decision trees, Bayesian networks, and logistic regression algorithms.
Results: We found that different gender and age demographics had significantly different usage habits in almost all message and group attributes. We also noted differences in users' group behavior and created prediction models, including the likelihood that a given group would have relatively more file attachments and if a group would contain a larger number of participants, a higher frequency of activity, quicker response times, and shorter messages.
Conclusions: We were successful in quantifying and predicting a user's gender and age demographic. Similarly, we were able to predict different types of group usage. All models were built without analyzing message content.
Contribution: The main contribution of this paper is the ability to predict user demographics without having access to users' text content. We present a detailed discussion about the specific attributes that were contained in all predictive models and suggest possible applications based on these results.
topic demographics
social media
social network
usage prediction
WhatsApp
url https://www.demographic-research.org/volumes/vol39/22/