An Arabic CCG approach for determining constituent types from Arabic Treebank

Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. The conversion is primarily a three-step process: determining constituents’ types, binarization, and category co...

Bibliographic Details
Main Authors: Ahmed I. El-taher, Hitahm M. Abo Bakr, Ibrahim Zidan, Khaled Shaalan
Format: Article
Language: English
Published: Elsevier 2014-12-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Arabic, CCGbank, Treebank
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157814000299
id doaj-5bcff5998ade41f7bf6bd4180f7523b7
record_format Article
spelling doaj-5bcff5998ade41f7bf6bd4180f7523b7
2020-11-24T21:42:16Z
eng
Elsevier
Journal of King Saud University: Computer and Information Sciences
1319-1578
2014-12-01
Volume 26, issue 4, pages 441-449
10.1016/j.jksuci.2014.06.005
An Arabic CCG approach for determining constituent types from Arabic Treebank
Ahmed I. El-taher, Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt
Hitahm M. Abo Bakr, Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt
Ibrahim Zidan, Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt
Khaled Shaalan, The British University, Dubai, United Arab Emirates
collection DOAJ
language English
format Article
sources DOAJ
author Ahmed I. El-taher
Hitahm M. Abo Bakr
Ibrahim Zidan
Khaled Shaalan
title An Arabic CCG approach for determining constituent types from Arabic Treebank
publisher Elsevier
series Journal of King Saud University: Computer and Information Sciences
issn 1319-1578
publishDate 2014-12-01
description Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. The conversion is primarily a three-step process: determining constituents’ types, binarization, and category conversion. Usually, this process involves a preprocessing step to the Treebank of choice for correcting brackets and normalizing tags for any changes that were introduced during the manual annotation, as well as extracting morpho-syntactic information that is necessary for determining constituents’ types. In this article, we describe the required preprocessing step on the Arabic Treebank, as well as how to determine Arabic constituents’ types. We conducted an experiment on parts 1 and 2 of the Penn Arabic Treebank (PATB) aimed at converting the PATB into an Arabic CCGbank. The performance of our algorithm when applied to ATB1v2.0 & ATB2v2.0 was 99% identification of head nodes and 100% coverage over the Treebank data.
topic Arabic
CCGbank
Treebank
url http://www.sciencedirect.com/science/article/pii/S1319157814000299
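
The abstract names "determining constituents' types" as the first conversion step. In CCGbank-style conversions this usually means marking each child of a node as head, complement, or adjunct before binarization. The following is a minimal sketch of that idea on a toy verb-initial clause, not the authors' algorithm: the head-priority table, the adjunct tag set, and the simplified labels (VERB, PREP, NOUN) are all assumptions made for illustration and do not reproduce the PATB tag set or the paper's rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # e.g. "VP", "NP-SBJ", "PP-TMP"
    children: List["Node"] = field(default_factory=list)
    role: str = ""                               # set to "head", "complement" or "adjunct"


# Hypothetical head priorities per parent category (not the paper's table).
HEAD_PRIORITY = {
    "S": ["VP"],
    "VP": ["VERB"],
    "PP": ["PREP"],
    "NP": ["NOUN"],
}

# Hypothetical function tags that mark a constituent as an adjunct.
ADJUNCT_TAGS = {"TMP", "LOC", "ADV", "MNR"}


def base(label: str) -> str:
    """Category without function tags: 'NP-SBJ' -> 'NP'."""
    return label.split("-")[0]


def function_tags(label: str) -> set:
    """Function tags only: 'PP-TMP' -> {'TMP'}."""
    return set(label.split("-")[1:])


def assign_roles(node: Node) -> None:
    """Label every child of every node as head, complement or adjunct."""
    if not node.children:
        return
    head_index = 0  # assumption: fall back to the leftmost child
    for wanted in HEAD_PRIORITY.get(base(node.label), []):
        match = next(
            (i for i, c in enumerate(node.children) if base(c.label) == wanted),
            None,
        )
        if match is not None:
            head_index = match
            break
    for i, child in enumerate(node.children):
        if i == head_index:
            child.role = "head"
        elif function_tags(child.label) & ADJUNCT_TAGS:
            child.role = "adjunct"
        else:
            child.role = "complement"
        assign_roles(child)


# Toy verb-subject-object clause with a temporal prepositional phrase.
tree = Node("S", [
    Node("VP", [
        Node("VERB"),
        Node("NP-SBJ"),
        Node("NP-OBJ"),
        Node("PP-TMP", [Node("PREP"), Node("NP", [Node("NOUN")])]),
    ]),
])
assign_roles(tree)
roles = [(c.label, c.role) for c in tree.children[0].children]
print(roles)
```

In this sketch the verb is picked as head of the VP, the subject and object become complements, and the temporally tagged PP becomes an adjunct. A real conversion would need a far richer rule table plus the morpho-syntactic information that the abstract says is extracted during preprocessing.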