Effective Learning to Rank Persian Web Content

Persian is one of the most widely used languages on the Web, so the Persian Web holds valuable information that must be retrieved effectively. As with other languages, ranking algorithms for Persian Web content face several challenges, such as a...

Bibliographic Details
Main Author: Amir Hosein Keyhanipour
Format: Article
Language: fas
Published: University of Tehran 2019-06-01
Series: Journal of Information Technology Management
Subjects: learning to rank; persian language; cf-rank algorithm; dotir dataset; information fusion
Online Access: https://jitm.ut.ac.ir/article_73950_988e64b12b71a687946564e3445b8d72.pdf
id doaj-2af650a3667649359658f88d49193d2c
record_format Article
spelling doaj-2af650a3667649359658f88d49193d2c
2020-11-25T02:44:09Z
fas
University of Tehran
Journal of Information Technology Management
2008-5893
2423-5059
2019-06-01
11
2
111
128
10.22059/jitm.2019.284726.2377
73950
Effective Learning to Rank Persian Web Content
Amir Hosein Keyhanipour
Assistant Professor, Computer Engineering Department, Faculty of Engineering, College of Farabi, University of Tehran, Iran.
https://jitm.ut.ac.ir/article_73950_988e64b12b71a687946564e3445b8d72.pdf
learning to rank
persian language
cf-rank algorithm
dotir dataset
information fusion
collection DOAJ
language fas
format Article
sources DOAJ
author Amir Hosein Keyhanipour
title Effective Learning to Rank Persian Web Content
publisher University of Tehran
series Journal of Information Technology Management
issn 2008-5893
2423-5059
publishDate 2019-06-01
description Persian is one of the most widely used languages on the Web, so the Persian Web holds valuable information that must be retrieved effectively. As with other languages, ranking algorithms for Persian Web content face several challenges, such as limited applicability in real-world settings and a lack of user modeling. CF-Rank, a recently proposed learning-to-rank algorithm, addresses these issues through classifier fusion. CF-Rank generates a small set of click-through features that provide a compact representation of a given primitive dataset. It builds primitive classifiers on each category of click-through features and aggregates their decisions using information fusion techniques, an approach that has made it a successful ranking algorithm on English datasets. In this paper, CF-Rank is customized for Persian Web content. Evaluation on the dotIR dataset indicates that the customized CF-Rank outperforms the baseline rankings. The improvement is most noticeable at the top of the ranked lists, which Web users view most often. Compared with the best-performing baseline algorithm on the dotIR dataset, CF-Rank improves NDCG@1 by 30 percent and MAP by 16.5 percent.
topic learning to rank
persian language
cf-rank algorithm
dotir dataset
information fusion
url https://jitm.ut.ac.ir/article_73950_988e64b12b71a687946564e3445b8d72.pdf
_version_ 1724767127681892352
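
The description above reports results in terms of NDCG@1 and MAP. As a rough illustration of what those two criteria measure, the following Python sketch computes them from per-query lists of relevance labels given in ranked order. It is not taken from the paper: the function names are hypothetical, the exponential gain (2^rel - 1) is one common NDCG convention chosen here as an assumption, and average precision is normalized by the number of relevant results that appear in the list, which assumes every relevant document was retrieved.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (exponential gain, an assumption)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances: Sequence[float]) -> float:
    """Mean of precision@i over the positions i that hold a relevant result."""
    hits = 0
    total = 0.0
    for i, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[float]]) -> float:
    """Average of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Two made-up queries; each list holds relevance labels in ranked order.
    rankings = [[1, 0, 1], [0, 1]]
    print("NDCG@1 per query:", [ndcg_at_k(r, 1) for r in rankings])
    print("MAP:", mean_average_precision(rankings))
```

NDCG@1 looks only at the first result of each ranked list, which is why it is sensitive to the improvements at the top of the list that the description highlights, while MAP summarizes precision across the whole list.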