A Neural Approach to Cross-Lingual Information Retrieval

With the rapid growth of worldwide information accessibility, cross-language information retrieval (CLIR) has become a prominent concern for search engines. Traditional CLIR systems require special-purpose components and high-quality translation knowledge (e.g., machine-readable dictionaries or machine translation systems), along with careful tuning, to achieve high ranking performance. With a neural network architecture, however, it is possible to solve the CLIR problem without extra tuning or special components. This work proposes a bilingual training approach: a neural CLIR solution that learns translation relationships automatically from noisy translation knowledge. External sources of translation knowledge are used to generate bilingual training data, which is then fed into a kernel-based neural ranking model. During end-to-end training, the word embeddings are tuned both to preserve translation relationships between bilingual word pairs and to serve the ranking task. Experiments show that the bilingual training approach outperforms traditional CLIR techniques given the same external translation knowledge source, and that it yields ranking results as good as those of a monolingual information retrieval system. We also investigate the source of the neural CLIR approach's effectiveness by analyzing patterns in the trained word embeddings, and we explore several ways to further improve performance: cleaning the training data by removing ambiguous training queries, measuring how performance scales with training-set size, and studying the effect of text-case transformations applied to the English queries in the training data. Lastly, we design an experiment that analyzes the quality of test-query translations, quantifying model performance in a realistic scenario where the model takes manually written English queries as input.
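
The bilingual training data generation step can be pictured as dictionary substitution over monolingual training queries. Below is a minimal sketch, assuming a noisy English-to-document-language dictionary that maps each term to candidate translations; the function and variable names are illustrative, not taken from the thesis.

```python
# Illustrative sketch: derive bilingual training queries from a noisy
# translation dictionary. Names here (make_bilingual_queries, etc.)
# are hypothetical, not from the thesis.
import random

def make_bilingual_queries(english_queries, dictionary, n_samples=1):
    """dictionary: English term -> list of candidate translations,
    possibly noisy (e.g. mined from parallel text or an MRD)."""
    pairs = []
    for query in english_queries:
        for _ in range(n_samples):
            translated = []
            for term in query.split():
                candidates = dictionary.get(term.lower())
                if candidates:
                    # Sample among alternatives so training exposes the
                    # ranker to translation ambiguity in the dictionary.
                    translated.append(random.choice(candidates))
                else:
                    # Out-of-dictionary terms (names, numbers) pass through.
                    translated.append(term)
            pairs.append((" ".join(translated), query))
    return pairs
```

The intent, as the abstract describes, is that such noisy pairs suffice for an end-to-end model to learn usable translation relationships, with ambiguity handled by the ranker rather than by a separate disambiguation component.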
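
The ranking component is a kernel-based neural ranking model trained end-to-end, so that the embeddings serve both translation matching and ranking. The sketch below follows the K-NRM style of kernel pooling over a query-document cosine-similarity matrix; the kernel count, mu/sigma values, shared bilingual embedding table, and class name are assumptions for illustration, not hyperparameters reported in the thesis.

```python
# Minimal K-NRM-style kernel-pooling ranker (illustrative sketch;
# hyperparameters are assumptions, not values from the thesis).
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelRanker(nn.Module):
    def __init__(self, vocab_size, dim=300, n_kernels=11):
        super().__init__()
        # One embedding table over a shared bilingual vocabulary
        # (an assumption of this sketch), tuned end-to-end so that
        # translation pairs end up near each other.
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        # RBF kernel centers spread over the cosine range, plus one
        # sharp "exact match" kernel at mu = 1.0.
        mus = torch.linspace(-0.9, 0.9, n_kernels - 1).tolist() + [1.0]
        sigmas = [0.1] * (n_kernels - 1) + [0.001]
        self.register_buffer("mu", torch.tensor(mus).view(1, 1, 1, -1))
        self.register_buffer("sigma", torch.tensor(sigmas).view(1, 1, 1, -1))
        self.score = nn.Linear(n_kernels, 1)

    def forward(self, query_ids, doc_ids):
        q = F.normalize(self.emb(query_ids), dim=-1)   # (B, Lq, D)
        d = F.normalize(self.emb(doc_ids), dim=-1)     # (B, Ld, D)
        sim = torch.matmul(q, d.transpose(1, 2))       # cosine matrix (B, Lq, Ld)
        # Kernel pooling: soft-count, per query term, how many document
        # terms fall near each similarity level. Padding masking is
        # omitted here for brevity.
        k = torch.exp(-((sim.unsqueeze(-1) - self.mu) ** 2)
                      / (2 * self.sigma ** 2))          # (B, Lq, Ld, K)
        phi = torch.log1p(k.sum(dim=2)).sum(dim=1)     # (B, K)
        return self.score(phi).squeeze(-1)             # ranking score per pair

# Usage sketch: score a small batch of (translated query, document) pairs.
model = KernelRanker(vocab_size=50_000)
scores = model(torch.randint(1, 50_000, (2, 5)),
               torch.randint(1, 50_000, (2, 40)))
```

Because gradients from the ranking loss flow into the shared embedding table, bilingual word pairs that co-occur in relevant query-document pairs are pulled toward similar vectors, which matches the translation-preserving effect the abstract describes.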

Bibliographic Details
Main Author: Liu, Qing
Format: Others
Published: Research Showcase @ CMU, 2018
License: http://creativecommons.org/licenses/by-nc/4.0/
Online Access: http://repository.cmu.edu/theses/135
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1141&context=theses