Multilingual Machine Reading Comprehension based on BERT Model


Bibliographic Details
Main Authors: WU, CHENG-XUAN, 吳承軒
Other Authors: WANG, JENQ-HAUR
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/aua9d5
id ndltd-TW-107TIT00392066
record_format oai_dc
spelling ndltd-TW-107TIT003920662019-11-09T05:23:36Z http://ndltd.ncl.edu.tw/handle/aua9d5 Multilingual Machine Reading Comprehension based on BERT Model 基於BERT模型之多國語言機器閱讀理解研究 WU, CHENG-XUAN 吳承軒 Master's === National Taipei University of Technology === Department of Computer Science and Information Engineering === 107 WANG, JENQ-HAUR 王正豪 2019 thesis 43 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taipei University of Technology === Department of Computer Science and Information Engineering === 107 === In recent years, the amount of information on the Internet has grown so much that people rely on it every day. Access to this information, however, is limited by information retrieval techniques: although the Internet provides many different types of information resources, the results returned to a user may be neither relevant nor helpful. With the development of neural networks, many research fields have made progress. In particular, Question Answering and Machine Comprehension have become increasingly popular topics in Natural Language Processing, owing to the importance of information retrieval and chatbots in the past few years. In this thesis, we use Google's pre-trained BERT model to compute word embeddings. Because BERT is pre-trained on a massive corpus, with 15% of the tokens masked during training, the resulting embeddings generalize better to unseen inputs and capture richer semantics. Our model forms a semantic sentence feature from the word embeddings of individual characters and words, uses cosine similarity to score each answer option against the sentence, and selects the option with the highest cosine similarity as the machine's inferred answer. We run experiments on the TOEFL-QA dataset and the Grand Challenge dataset, comparing against a Bi-directional Gated Recurrent Unit method and a strong alignment IR baseline, and obtain 34.87% and 57.5% accuracy, respectively. The results suggest that our model is multilingual to some extent, even when grammar differs across languages.
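The answer-selection step the abstract describes (embed the passage and each option with BERT, then pick the option with the highest cosine similarity) can be sketched as follows. This is a minimal illustration, not the thesis's actual code: the Hugging Face transformers library, the bert-base-multilingual-cased checkpoint, and mean pooling over token embeddings are all assumptions, since the record does not specify them.

```python
# Hedged sketch of the described pipeline: BERT word embeddings,
# mean-pooled into a sentence vector, cosine similarity per option,
# argmax as the predicted answer. Library, checkpoint, and pooling
# choice are assumptions; the thesis record does not specify them.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def sentence_vector(text: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states into one sentence vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # shape (768,)

def pick_answer(passage: str, options: list[str]) -> int:
    """Return the index of the option most similar to the passage,
    where similarity is cos(a, b) = a.b / (|a| |b|)."""
    p = sentence_vector(passage)
    scores = [
        torch.nn.functional.cosine_similarity(p, sentence_vector(o), dim=0).item()
        for o in options
    ]
    return max(range(len(options)), key=scores.__getitem__)
```

Because options are ranked purely by cosine similarity in a shared embedding space, the same code runs unchanged on Chinese or English inputs when a multilingual checkpoint is used, which is the multilingual property the abstract claims.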
author2 WANG, JENQ-HAUR
author_facet WANG, JENQ-HAUR
WU, CHENG-XUAN
吳承軒
author WU, CHENG-XUAN
吳承軒
spellingShingle WU, CHENG-XUAN
吳承軒
Multilingual Machine Reading Comprehension based on BERT Model
author_sort WU, CHENG-XUAN
title Multilingual Machine Reading Comprehension based on BERT Model
title_short Multilingual Machine Reading Comprehension based on BERT Model
title_full Multilingual Machine Reading Comprehension based on BERT Model
title_fullStr Multilingual Machine Reading Comprehension based on BERT Model
title_full_unstemmed Multilingual Machine Reading Comprehension based on BERT Model
title_sort multilingual machine reading comprehension based on bert model
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/aua9d5
work_keys_str_mv AT wuchengxuan multilingualmachinereadingcomprehensionbasedonbertmodel
AT wúchéngxuān multilingualmachinereadingcomprehensionbasedonbertmodel
AT wuchengxuan jīyúbertmóxíngzhīduōguóyǔyánjīqìyuèdúlǐjiěyánjiū
AT wúchéngxuān jīyúbertmóxíngzhīduōguóyǔyánjīqìyuèdúlǐjiěyánjiū
_version_ 1719288959618514944