Ensemble Methods for Instance-Based Arabic Language Authorship Attribution
Authorship attribution (AA) is a subfield of authorship analysis, and it has become an important problem as the volume of anonymous information has grown with the rapid increase in internet usage worldwide. In other languages, such as English, Spanish and Chinese, the problem is quite well studied....
Main Authors: | Mohammed Al-Sarem, Faisal Saeed, Abdullah Alsaeedi, Wadii Boulila, Tawfik Al-Hadhrami |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Authorship attribution; ensemble methods; stylometric features; TOPSIS method |
Online Access: | https://ieeexplore.ieee.org/document/8952685/ |

id | doaj-3737019e0bff4cc5bf0618f1401ae754 |
---|---|
record_format | Article |
spelling | doaj-3737019e0bff4cc5bf0618f1401ae754; 2021-03-30T02:52:05Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 17331; 17345; 10.1109/ACCESS.2020.2964952; 8952685; Ensemble Methods for Instance-Based Arabic Language Authorship Attribution; Mohammed Al-Sarem [0] https://orcid.org/0000-0001-7172-8224; Faisal Saeed [1] https://orcid.org/0000-0002-2822-1708; Abdullah Alsaeedi [2] https://orcid.org/0000-0002-7974-7638; Wadii Boulila [3] https://orcid.org/0000-0003-2133-0757; Tawfik Al-Hadhrami [4] https://orcid.org/0000-0001-7441-604X; College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia; College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia; College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia; College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia; School of Science and Technology, Nottingham Trent University, Nottingham, U.K.; Authorship attribution (AA) is a subfield of authorship analysis, and it has become an important problem as the volume of anonymous information has grown with the rapid increase in internet usage worldwide. In other languages, such as English, Spanish and Chinese, the problem is quite well studied. In Arabic, however, the AA problem has received less attention from the research community because of the complexity and nature of Arabic sentences. The paper presents an intensive review of previous studies on the Arabic language. Based on that review, the study employs the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to choose the base classifier of the ensemble methods. For attribution features, hundreds of stylometric features and distinct words are extracted using several tools. The AdaBoost and Bagging ensemble methods are then applied to a dataset of Arabic enquiries (fatwas). The findings show an improvement in the effectiveness of the authorship attribution task in the Arabic language.; https://ieeexplore.ieee.org/document/8952685/; Authorship attribution; ensemble methods; stylometric features; TOPSIS method |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Mohammed Al-Sarem; Faisal Saeed; Abdullah Alsaeedi; Wadii Boulila; Tawfik Al-Hadhrami |
spellingShingle | Mohammed Al-Sarem; Faisal Saeed; Abdullah Alsaeedi; Wadii Boulila; Tawfik Al-Hadhrami; Ensemble Methods for Instance-Based Arabic Language Authorship Attribution; IEEE Access; Authorship attribution; ensemble methods; stylometric features; TOPSIS method |
author_facet | Mohammed Al-Sarem; Faisal Saeed; Abdullah Alsaeedi; Wadii Boulila; Tawfik Al-Hadhrami |
author_sort | Mohammed Al-Sarem |
title | Ensemble Methods for Instance-Based Arabic Language Authorship Attribution |
title_short | Ensemble Methods for Instance-Based Arabic Language Authorship Attribution |
title_full | Ensemble Methods for Instance-Based Arabic Language Authorship Attribution |
title_fullStr | Ensemble Methods for Instance-Based Arabic Language Authorship Attribution |
title_full_unstemmed | Ensemble Methods for Instance-Based Arabic Language Authorship Attribution |
title_sort | ensemble methods for instance-based arabic language authorship attribution |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2020-01-01 |
description | Authorship attribution (AA) is a subfield of authorship analysis, and it has become an important problem as the volume of anonymous information has grown with the rapid increase in internet usage worldwide. In other languages, such as English, Spanish and Chinese, the problem is quite well studied. In Arabic, however, the AA problem has received less attention from the research community because of the complexity and nature of Arabic sentences. The paper presents an intensive review of previous studies on the Arabic language. Based on that review, the study employs the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to choose the base classifier of the ensemble methods. For attribution features, hundreds of stylometric features and distinct words are extracted using several tools. The AdaBoost and Bagging ensemble methods are then applied to a dataset of Arabic enquiries (fatwas). The findings show an improvement in the effectiveness of the authorship attribution task in the Arabic language. |
topic | Authorship attribution; ensemble methods; stylometric features; TOPSIS method |
url | https://ieeexplore.ieee.org/document/8952685/ |
work_keys_str_mv | AT mohammedalsarem ensemblemethodsforinstancebasedarabiclanguageauthorshipattribution AT faisalsaeed ensemblemethodsforinstancebasedarabiclanguageauthorshipattribution AT abdullahalsaeedi ensemblemethodsforinstancebasedarabiclanguageauthorshipattribution AT wadiiboulila ensemblemethodsforinstancebasedarabiclanguageauthorshipattribution AT tawfikalhadhrami ensemblemethodsforinstancebasedarabiclanguageauthorshipattribution |
_version_ | 1724184396852887552 |
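
The description field above says that TOPSIS was used to choose the base classifier for the ensemble methods, but the record does not give the criteria, weights, or candidate classifiers. As a rough illustration only, the following sketch shows how a standard TOPSIS ranking could be computed. The candidate names, the three criteria (accuracy, F1, training time), the weights, and all numbers are hypothetical and are not taken from the article.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) against criteria (columns) with TOPSIS.

    benefit[j] is True when larger values of criterion j are better.
    Returns one closeness score in [0, 1] per row; higher is better.
    """
    n = len(weights)
    # Step 1: vector-normalise each column, then apply the criterion weight.
    # An all-zero column gets a norm of 1.0 to avoid division by zero.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # Step 2: build the ideal and anti-ideal points, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: relative closeness to the ideal point.
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        total = d_ideal + d_anti
        # If every alternative is identical, no ranking is possible: use 0.5.
        scores.append(d_anti / total if total else 0.5)
    return scores


# Hypothetical candidates: [accuracy, F1, training time in seconds].
candidates = {
    "naive_bayes": [0.80, 0.78, 1.0],
    "svm": [0.90, 0.88, 5.0],
    "decision_tree": [0.70, 0.69, 2.0],
}
names = list(candidates)
scores = topsis(
    [candidates[name] for name in names],
    weights=[0.4, 0.4, 0.2],        # assumed importance of each criterion
    benefit=[True, True, False],    # training time is a cost criterion
)
best = max(zip(scores, names))[1]
print(dict(zip(names, scores)), "->", best)
```

The ranking depends on the chosen weights and on which criteria are treated as costs, so a different weighting can select a different base classifier.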