Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora

Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) refers to a kind of information retrieval in which the language of the query an...

Bibliographic Details
Main Authors: Amin Nezarat, Tayebeh Mosavi Miangah
Format: Article
Language: Persian (fas)
Published: Iranian Research Institute for Information and Technology 2012-03-01
Series: Iranian Journal of Information Processing & Management
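Subjects: Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors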
Online Access: http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1
id doaj-3a1658fcd5ec41c1a3ff0dfe1657e3bd
record_format Article
spelling doaj-3a1658fcd5ec41c1a3ff0dfe1657e3bd, 2020-11-24T22:42:41Z. Amin Nezarat (Islamic Azad University, Yazd Branch); Tayebeh Mosavi Miangah (Applied Linguistics, Payame Noor University, Yazd). Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora. Iranian Journal of Information Processing & Management, vol. 27, no. 2, pp. 798-813. Iranian Research Institute for Information and Technology, 2012-03-01. ISSN 2251-8223, 2251-8231. Language: fas.
collection DOAJ
issn 2251-8223
2251-8231
description Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is information retrieval in which the language of the query differs from that of the searched documents: the user presents queries in one language to retrieve documents in another. This paper constructs a bilingual lexicon of parallel English and Persian chunks from two very large monolingual corpora and an English-Persian parallel corpus, which could be directly applied to cross-language information retrieval tasks. For this purpose, a statistical measure known as the Association Score (AS) was used to compute the association value between every two corresponding chunks in the corpus, using a couple of complex algorithms. Once the CLIR system had been developed using this bilingual lexicon, an experiment was performed on a set of one hundred English and Persian phrases and collocations to determine to what extent the system helps users find the most relevant and suitable equivalents of their queries in either language.
topic Cross-language information retrieval
linguistic corpora
automated translation
intelligent factors