Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora
Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to the query need of a user. Cross-language information retrieval (CLIR) refers to a kind of information retrieval in which the language of the query an...
Main Authors: | Amin Nezarat, Tayebeh Mosavi Miangah |
---|---|
Format: | Article |
Language: | fas |
Published: | Iranian Research Institute for Information and Technology, 2012-03-01 |
Series: | Iranian Journal of Information Processing & Management |
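Subjects: | Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors |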
Online Access: | http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1 |
id |
doaj-3a1658fcd5ec41c1a3ff0dfe1657e3bd |
---|---|
record_format |
Article |
spelling |
doaj-3a1658fcd5ec41c1a3ff0dfe1657e3bd; 2020-11-24T22:42:41Z; fas; Iranian Research Institute for Information and Technology; Iranian Journal of Information Processing & Management; 2251-8223; 2251-8231; 2012-03-01; 272798813; Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora; Amin Nezarat (Islamic Azad University, Yazd Branch); Tayebeh Mosavi Miangah (Applied Linguistics, Payame Noor University, Yazd); Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is a kind of information retrieval in which the language of the query differs from that of the searched documents. In other words, the user presents queries in one language to retrieve documents in another language. This paper attempts to construct a bilingual lexicon of parallel English and Persian chunks from two very large monolingual corpora and an English-Persian parallel corpus, which can be applied directly to cross-language information retrieval tasks. For this purpose, a statistical measure known as the Association Score (AS) was used to compute the association value between every two corresponding chunks in the corpus, using a couple of complex algorithms. Once the CLIR system was developed using this bilingual lexicon, an experiment was performed on a set of one hundred English and Persian phrases and collocations to see to what extent the system was effective in helping users find the most relevant and suitable equivalents of their queries in either language.; http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1; Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors |
collection |
DOAJ |
language |
fas |
format |
Article |
sources |
DOAJ |
author |
Amin Nezarat, Tayebeh Mosavi Miangah |
title |
Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora |
publisher |
Iranian Research Institute for Information and Technology |
series |
Iranian Journal of Information Processing & Management |
issn |
2251-8223, 2251-8231 |
publishDate |
2012-03-01 |
description |
Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is a kind of information retrieval in which the language of the query differs from that of the searched documents. In other words, the user presents queries in one language to retrieve documents in another language. This paper attempts to construct a bilingual lexicon of parallel English and Persian chunks from two very large monolingual corpora and an English-Persian parallel corpus, which can be applied directly to cross-language information retrieval tasks. For this purpose, a statistical measure known as the Association Score (AS) was used to compute the association value between every two corresponding chunks in the corpus, using a couple of complex algorithms. Once the CLIR system was developed using this bilingual lexicon, an experiment was performed on a set of one hundred English and Persian phrases and collocations to see to what extent the system was effective in helping users find the most relevant and suitable equivalents of their queries in either language. |
topic |
Cross-language information retrieval; linguistic corpora; automated translation; intelligent factors |
url |
http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-1118-44&slc_lang=en&sid=1 |