A Neural Approach to Cross-Lingual Information Retrieval

With the rapid growth of worldwide information accessibility, cross-language information retrieval (CLIR) has become a prominent concern for search engines. Traditional CLIR systems require special-purpose components and high-quality translation knowledge (e.g., machine-readable dictionaries or machine translation systems), along with careful tuning, to achieve high ranking performance. With a neural network architecture, however, it is possible to solve the CLIR problem without extra tuning or special components. This work proposes a bilingual training approach: a neural CLIR solution that learns translation relationships automatically from noisy translation knowledge. External sources of translation knowledge are used to generate bilingual training data, which is then fed into a kernel-based neural ranking model. During end-to-end training, the word embeddings are tuned both to preserve translation relationships between bilingual word pairs and to serve the ranking task. Experiments show that the bilingual training approach outperforms traditional CLIR techniques given the same external translation knowledge source, and that it yields ranking results as good as those of a monolingual information retrieval system. We also investigate the source of the neural CLIR approach's effectiveness by analyzing patterns in the trained word embeddings, and we explore several ways to further improve performance: cleaning the training data by removing ambiguous training queries, measuring how performance scales with training-set size, and studying the effect of text-case transformations applied to the English queries in the training data. Lastly, we design an experiment that analyzes the quality of test-query translations, quantifying model performance in a realistic scenario where the model takes manually written English queries as input.
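
The bilingual training data generation step can be pictured as dictionary substitution over monolingual training queries. Below is a minimal sketch, assuming a noisy English-to-document-language dictionary that maps each term to candidate translations; the function and variable names are illustrative, not taken from the thesis.

```python
# Illustrative sketch: derive bilingual training queries from a noisy
# translation dictionary. Names here (make_bilingual_queries, etc.)
# are hypothetical, not from the thesis.
import random

def make_bilingual_queries(english_queries, dictionary, n_samples=1):
    """dictionary: English term -> list of candidate translations,
    possibly noisy (e.g. mined from parallel text or an MRD)."""
    pairs = []
    for query in english_queries:
        for _ in range(n_samples):
            translated = []
            for term in query.split():
                candidates = dictionary.get(term.lower())
                if candidates:
                    # Sample among alternatives so training exposes the
                    # ranker to translation ambiguity in the dictionary.
                    translated.append(random.choice(candidates))
                else:
                    # Out-of-dictionary terms (names, numbers) pass through.
                    translated.append(term)
            pairs.append((" ".join(translated), query))
    return pairs
```

The intent, as the abstract describes, is that such noisy pairs suffice for an end-to-end model to learn usable translation relationships, with ambiguity handled by the ranker rather than by a separate disambiguation component.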
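
The ranking component is a kernel-based neural ranking model trained end-to-end, so that the embeddings serve both translation matching and ranking. The sketch below follows the K-NRM style of kernel pooling over a query-document cosine-similarity matrix; the kernel count, mu/sigma values, shared bilingual embedding table, and class name are assumptions for illustration, not hyperparameters reported in the thesis.

```python
# Minimal K-NRM-style kernel-pooling ranker (illustrative sketch;
# hyperparameters are assumptions, not values from the thesis).
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelRanker(nn.Module):
    def __init__(self, vocab_size, dim=300, n_kernels=11):
        super().__init__()
        # One embedding table over a shared bilingual vocabulary
        # (an assumption of this sketch), tuned end-to-end so that
        # translation pairs end up near each other.
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        # RBF kernel centers spread over the cosine range, plus one
        # sharp "exact match" kernel at mu = 1.0.
        mus = torch.linspace(-0.9, 0.9, n_kernels - 1).tolist() + [1.0]
        sigmas = [0.1] * (n_kernels - 1) + [0.001]
        self.register_buffer("mu", torch.tensor(mus).view(1, 1, 1, -1))
        self.register_buffer("sigma", torch.tensor(sigmas).view(1, 1, 1, -1))
        self.score = nn.Linear(n_kernels, 1)

    def forward(self, query_ids, doc_ids):
        q = F.normalize(self.emb(query_ids), dim=-1)   # (B, Lq, D)
        d = F.normalize(self.emb(doc_ids), dim=-1)     # (B, Ld, D)
        sim = torch.matmul(q, d.transpose(1, 2))       # cosine matrix (B, Lq, Ld)
        # Kernel pooling: soft-count, per query term, how many document
        # terms fall near each similarity level. Padding masking is
        # omitted here for brevity.
        k = torch.exp(-((sim.unsqueeze(-1) - self.mu) ** 2)
                      / (2 * self.sigma ** 2))          # (B, Lq, Ld, K)
        phi = torch.log1p(k.sum(dim=2)).sum(dim=1)     # (B, K)
        return self.score(phi).squeeze(-1)             # ranking score per pair

# Usage sketch: score a small batch of (translated query, document) pairs.
model = KernelRanker(vocab_size=50_000)
scores = model(torch.randint(1, 50_000, (2, 5)),
               torch.randint(1, 50_000, (2, 40)))
```

Because gradients from the ranking loss flow into the shared embedding table, bilingual word pairs that co-occur in relevant query-document pairs are pulled toward similar vectors, which matches the translation-preserving effect the abstract describes.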

Bibliographic Details
Main Author: Liu, Qing
Format: Others
Published: Research Showcase @ CMU, 2018
License: http://creativecommons.org/licenses/by-nc/4.0/
Online Access: http://repository.cmu.edu/theses/135
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1141&context=theses