Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora

Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to the query need of a user. Cross-language information retrieval (CLIR) refers to a kind of information retrieval in which the language of the query an...

Bibliographic Details
Main Authors:	Amin Nezarat, Tayebeh Mosavi Miangah
Format:	Article
Language:	Persian (fas)
Published:	Iranian Research Institute for Information and Technology 2012-03-01
Series:	Iranian Journal of Information Processing & Management
Subjects:	Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors
Online Access:	http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1

id	doaj-3a1658fcd5ec41c1a3ff0dfe1657e3bd
record_format	Article
spelling	doaj-3a1658fcd5ec41c1a3ff0dfe1657e3bd 2020-11-24T22:42:41Z fas Iranian Research Institute for Information and Technology; Iranian Journal of Information Processing & Management; 2251-8223 2251-8231; 2012-03-01; 272798813; Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora; Amin Nezarat (0), Tayebeh Mosavi Miangah (1); Islamic Azad University, Yazd Branch; Applied Linguistics, Payame Noor University, Yazd. Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is a kind of information retrieval in which the language of the query differs from that of the searched documents; that is, the user presents queries in one language to retrieve documents in another. This paper attempted to construct a bilingual lexicon of parallel English and Persian chunks from two very large monolingual corpora and an English-Persian parallel corpus, a lexicon that could be applied directly to cross-language information retrieval tasks. For this purpose, a statistical measure known as Association Score (AS) was used to compute the association value between every two corresponding chunks in the corpus using a couple of algorithms. Once the CLIR system had been developed using this bilingual lexicon, an experiment was performed on a set of one hundred English and Persian phrases and collocations to see to what extent the system was effective in helping users find the most relevant and suitable equivalents of their queries in either language. http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1 Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors
collection	DOAJ
language	fas
format	Article
sources	DOAJ
author	Amin Nezarat Tayebeh Mosavi Miangah
title	Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora
publisher	Iranian Research Institute for Information and Technology
series	Iranian Journal of Information Processing & Management
issn	2251-8223 2251-8231
publishDate	2012-03-01
description	Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is a kind of information retrieval in which the language of the query differs from that of the searched documents; that is, the user presents queries in one language to retrieve documents in another. This paper attempted to construct a bilingual lexicon of parallel English and Persian chunks from two very large monolingual corpora and an English-Persian parallel corpus, a lexicon that could be applied directly to cross-language information retrieval tasks. For this purpose, a statistical measure known as Association Score (AS) was used to compute the association value between every two corresponding chunks in the corpus using a couple of algorithms. Once the CLIR system had been developed using this bilingual lexicon, an experiment was performed on a set of one hundred English and Persian phrases and collocations to see to what extent the system was effective in helping users find the most relevant and suitable equivalents of their queries in either language.
topic	Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors
url	http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1

Similar Items