Multilingual Machine Reading Comprehension based on BERT Model
Main Authors: | WU, CHENG-XUAN 吳承軒 |
---|---|
Other Authors: | WANG, JENQ-HAUR 王正豪 |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/aua9d5 |
id |
ndltd-TW-107TIT00392066 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107TIT003920662019-11-09T05:23:36Z http://ndltd.ncl.edu.tw/handle/aua9d5 Multilingual Machine Reading Comprehension based on BERT Model 基於BERT模型之多國語言機器閱讀理解研究 WU, CHENG-XUAN 吳承軒 WANG, JENQ-HAUR 王正豪 2019 學位論文 ; thesis 43 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taipei University of Technology === Department of Computer Science and Information Engineering === 107 === In recent years, the amount of information on the Internet has grown rapidly, and people rely on it every day. Access to this information, however, is limited by the capabilities of Information Retrieval techniques: although the Internet provides many different types of information resources, those resources may not be relevant or helpful to the user. With the development of neural networks, many research fields have made progress. In particular, two topics in Natural Language Processing, Question Answering and Machine Comprehension, have become increasingly popular research issues in the past few years due to the importance of Information Retrieval and chatbots. In this thesis, we use the Google BERT pre-trained model to produce word embeddings; because BERT is trained on a massive amount of data with 15% of tokens masked during the training stage, the resulting embeddings generalize better to unseen situations and capture richer semantics. Our model forms a semantic sentence feature from single words and multi-word units via word embeddings. It then uses cosine similarity to measure the similarity between the sentence and each option, and finally chooses the option with the highest cosine similarity score as the machine's inferred answer. We experiment on the TOEFL-QA dataset and the Grand Challenge dataset, comparing against a Bi-directional Gated Recurrent Unit method and a Strong Alignment IR Baseline, and obtain 34.87% and 57.5% accuracy respectively. The results suggest that our model is multilingual to some extent, even when grammatical differences exist between languages. |
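As a rough sketch of the approach the abstract describes: embed the question and each answer option with a pre-trained multilingual BERT, mean-pool the token embeddings into a single sentence feature, compute cosine similarity between the question and each option, and choose the highest-scoring option. This is a minimal illustration, not the thesis code: the checkpoint name `bert-base-multilingual-cased` and the mean-pooling step are assumptions, since the abstract does not specify how the sentence feature is formed.

```python
# Minimal sketch (assumed details, not the thesis implementation):
# embed the question and options with multilingual BERT, mean-pool
# into sentence vectors, and pick the option with the highest
# cosine similarity to the question.
import torch
from transformers import BertModel, BertTokenizer

MODEL = "bert-base-multilingual-cased"  # assumed checkpoint
tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertModel.from_pretrained(MODEL)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states into one sentence feature."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def choose_answer(question: str, options: list) -> int:
    """Return the index of the option most similar to the question."""
    q = embed(question)
    scores = [torch.nn.functional.cosine_similarity(q, embed(o), dim=0)
              for o in options]
    return int(torch.stack(scores).argmax())

# Example: the same code handles English or Chinese inputs, since the
# tokenizer and encoder are multilingual.
print(choose_answer("What is the professor's main point?",
                    ["Plants need light.", "The exam is cancelled.",
                     "Read chapter five."]))
```

Note that the 15% token masking mentioned in the abstract belongs to BERT's masked-language-model pre-training, which happens before this inference step; the sketch only reuses the pre-trained encoder.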
author2 |
WANG, JENQ-HAUR |
author |
WU, CHENG-XUAN 吳承軒 |
title |
Multilingual Machine Reading Comprehension based on BERT Model |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/aua9d5 |