Toward Identifying Features for Automatic Gender Detection: A Corpus Creation and Analysis
The current paper aims to construct an inventory of stylometric and psychometric features for the automatic identification of the author's gender. These features are derived from an analysis of a manually developed Saudi Dialect Twitter Corpus (SDTwittC), consisting of four million words. Given...
Main Author: | Saad Awadh Alanazi |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Automatic gender detection; feature extraction; Saudi dialects |
Online Access: | https://ieeexplore.ieee.org/document/8781684/ |
id |
doaj-d2286e4bf1094f4e88d73a99c7cd4674 |
---|---|
record_format |
Article |
spelling |
doaj-d2286e4bf1094f4e88d73a99c7cd4674; 2021-04-05T17:22:02Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 111931; 111943; 10.1109/ACCESS.2019.2932026; 8781684; Toward Identifying Features for Automatic Gender Detection: A Corpus Creation and Analysis; Saad Awadh Alanazi; 0; https://orcid.org/0000-0002-1714-1948; Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakakah, Saudi Arabia; The current paper aims to construct an inventory of stylometric and psychometric features for the automatic identification of the author's gender. These features are derived from an analysis of a manually developed Saudi Dialect Twitter Corpus (SDTwittC), consisting of four million words. Given that the study seeks to provide machine learning algorithms with the accurate set of features in solving the gender identification problem, word-based, character-based, syntactic, and function words are all considered during the selection stage. The word-based features constitute the largest category and they represent the possible gender discriminators from sociological, psychological and lexical perspectives. The results show that Saudi males use different styles that separate them from their female counterparts in terms of politeness (greeting, thanking, apology, congratulation, encouragement, best wishes etc), impoliteness (profanity and sarcasm), uses of intensifiers, hedges, color, emotion, reason, emoji among many others.; https://ieeexplore.ieee.org/document/8781684/; Automatic gender detection; feature extraction; Saudi dialects |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Saad Awadh Alanazi |
title |
Toward Identifying Features for Automatic Gender Detection: A Corpus Creation and Analysis |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
The current paper aims to construct an inventory of stylometric and psychometric features for the automatic identification of an author's gender. These features are derived from an analysis of a manually developed Saudi Dialect Twitter Corpus (SDTwittC), consisting of four million words. Because the study seeks to provide machine learning algorithms with an accurate set of features for solving the gender identification problem, word-based, character-based, syntactic, and function-word features are all considered during the selection stage. The word-based features constitute the largest category and represent possible gender discriminators from sociological, psychological, and lexical perspectives. The results show that Saudi males use styles that distinguish them from their female counterparts in terms of politeness (greeting, thanking, apology, congratulation, encouragement, best wishes, etc.), impoliteness (profanity and sarcasm), and the use of intensifiers, hedges, color, emotion, reason, and emoji, among many others. |
topic |
Automatic gender detection feature extraction Saudi dialects |
url |
https://ieeexplore.ieee.org/document/8781684/ |