An Arabic CCG approach for determining constituent types from Arabic Treebank
Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. The conversion is primarily a three-step process: determining constituents’ types, binarization, and category co...
Main Authors: | Ahmed I. El-taher, Hitahm M. Abo Bakr, Ibrahim Zidan, Khaled Shaalan |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2014-12-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | Arabic, CCGbank, Treebank |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157814000299 |
id |
doaj-5bcff5998ade41f7bf6bd4180f7523b7 |
---|---|
record_format |
Article |
spelling |
doaj-5bcff5998ade41f7bf6bd4180f7523b7; 2020-11-24T21:42:16Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2014-12-01; 26; 4; 441; 449; 10.1016/j.jksuci.2014.06.005; An Arabic CCG approach for determining constituent types from Arabic Treebank; Ahmed I. El-taher (0); Hitahm M. Abo Bakr (1); Ibrahim Zidan (2); Khaled Shaalan (3); Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt; Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt; Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt; The British University, Dubai, United Arab Emirates; Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. The conversion is primarily a three-step process: determining constituents’ types, binarization, and category conversion. Usually, this process involves a preprocessing step to the Treebank of choice for correcting brackets and normalizing tags for any changes that were introduced during the manual annotation, as well as extracting morpho-syntactic information that is necessary for determining constituents’ types. In this article, we describe the required preprocessing step on the Arabic Treebank, as well as how to determine Arabic constituents’ types. We conducted an experiment on parts 1 and 2 of the Penn Arabic Treebank (PATB) aimed at converting the PATB into an Arabic CCGbank. The performance of our algorithm when applied to ATB1v2.0 & ATB2v2.0 was 99% identification of head nodes and 100% coverage over the Treebank data.; http://www.sciencedirect.com/science/article/pii/S1319157814000299; Arabic; CCGbank; Treebank |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ahmed I. El-taher Hitahm M. Abo Bakr Ibrahim Zidan Khaled Shaalan |
title |
An Arabic CCG approach for determining constituent types from Arabic Treebank |
publisher |
Elsevier |
series |
Journal of King Saud University: Computer and Information Sciences |
issn |
1319-1578 |
publishDate |
2014-12-01 |
description |
Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. The conversion is primarily a three-step process: determining constituents’ types, binarization, and category conversion. Usually, this process involves a preprocessing step to the Treebank of choice for correcting brackets and normalizing tags for any changes that were introduced during the manual annotation, as well as extracting morpho-syntactic information that is necessary for determining constituents’ types. In this article, we describe the required preprocessing step on the Arabic Treebank, as well as how to determine Arabic constituents’ types. We conducted an experiment on parts 1 and 2 of the Penn Arabic Treebank (PATB) aimed at converting the PATB into an Arabic CCGbank. The performance of our algorithm when applied to ATB1v2.0 & ATB2v2.0 was 99% identification of head nodes and 100% coverage over the Treebank data. |
topic |
Arabic CCGbank Treebank |
url |
http://www.sciencedirect.com/science/article/pii/S1319157814000299 |