Demographics and Personality Discovery on Social Media: A Machine Learning Approach

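This research proposes a new feature extraction algorithm that uses aggregated user engagements on social media to perform demographics and personality discovery tasks. The proposed framework can discover seven essential attributes: gender identity, age group, residential area, education level, political affiliation, religious belief, and personality type. Multiple feature sets are developed, including comment text, community activity, and hybrid features. Various machine learning algorithms are explored, such as support vector machines, random forest, multi-layer perceptron, and naïve Bayes. An empirical analysis is performed on various aspects, including correctness, robustness, training time, and the class imbalance problem. We obtained the highest prediction performance by using our proposed feature extraction algorithm. The result on personality type prediction was 87.18%. For the demographic attribute prediction task, our feature sets also outperformed the baseline, reaching 98.1% for residential area, 94.7% for education level, 92.1% for gender identity, 91.5% for political affiliation, 60.6% for religious belief, and 52.0% for age group. Moreover, this paper provides guidelines for choosing classifiers with appropriate feature sets.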

Bibliographic Details
Main Authors: Sarach Tuomchomtam, Nuanwan Soonthornphisaj
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Information
Subjects: demographic attributes; personality prediction; social media; machine learning
Online Access: https://www.mdpi.com/2078-2489/12/9/353
id doaj-d97e91949ab24e3bbf2562a772729b2b
record_format Article
spelling doaj-d97e91949ab24e3bbf2562a772729b2b
2021-09-26T00:26:25Z
eng
MDPI AG
Information
2078-2489
2021-08-01
12
353
353
10.3390/info12090353
Demographics and Personality Discovery on Social Media: A Machine Learning Approach
Sarach Tuomchomtam (Artificial Intelligence and Knowledge Discovery Laboratory, Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand)
Nuanwan Soonthornphisaj (Artificial Intelligence and Knowledge Discovery Laboratory, Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand)
https://www.mdpi.com/2078-2489/12/9/353
demographic attributes
personality prediction
social media
machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Sarach Tuomchomtam
Nuanwan Soonthornphisaj
title Demographics and Personality Discovery on Social Media: A Machine Learning Approach
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2021-08-01
description This research proposes a new feature extraction algorithm that uses aggregated user engagements on social media to perform demographics and personality discovery tasks. The proposed framework can discover seven essential attributes: gender identity, age group, residential area, education level, political affiliation, religious belief, and personality type. Multiple feature sets are developed, including comment text, community activity, and hybrid features. Various machine learning algorithms are explored, such as support vector machines, random forest, multi-layer perceptron, and naïve Bayes. An empirical analysis is performed on various aspects, including correctness, robustness, training time, and the class imbalance problem. We obtained the highest prediction performance by using our proposed feature extraction algorithm. The result on personality type prediction was 87.18%. For the demographic attribute prediction task, our feature sets also outperformed the baseline, reaching 98.1% for residential area, 94.7% for education level, 92.1% for gender identity, 91.5% for political affiliation, 60.6% for religious belief, and 52.0% for age group. Moreover, this paper provides guidelines for choosing classifiers with appropriate feature sets.
topic demographic attributes
personality prediction
social media
machine learning
url https://www.mdpi.com/2078-2489/12/9/353