Development of the film discussion corpus based on retrieval generative architecture

Master's thesis === Kun Shan University === Graduate Institute of Information Engineering === 107 === Most current chatbot dialogue designs do not use a corpus because building one is costly. When a user enters related keywords, the chatbot usually responds directly with an answer already stored in its database eve...

Bibliographic Details
Main Authors: He, Ying-Cheng, 何應承
Other Authors: Cheng, Chao-Jung
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/dg2nh3
id ndltd-TW-107KSUT0392003
record_format oai_dc
spelling ndltd-TW-107KSUT0392003 2019-08-03T15:50:43Z http://ndltd.ncl.edu.tw/handle/dg2nh3 Development of the film discussion corpus based on retrieval generative architecture 基於檢索生成式架構之電影討論語料庫開發 He, Ying-Cheng 何應承 Master's thesis, Kun Shan University, Graduate Institute of Information Engineering, 107 Cheng, Chao-Jung 鄭朝榮 2019 學位論文 ; thesis 66 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === Kun Shan University === Graduate Institute of Information Engineering === 107 === Most current chatbot dialogue designs do not use a corpus because building one is costly. When a user enters related keywords, the chatbot usually responds directly with an answer already stored in its database, even if that answer is not very engaging for consumers. Yet if stores could offer customer service software that chats with customers at any time, they could learn their users' preferences. This thesis uses the PTT movie bulletin board as its source and applies natural language processing to build a film corpus. A web crawler collects the movie-themed posts discussed by netizens, and the content is first segmented with the Jieba word segmentation algorithm to produce the corpus. To improve the system's accuracy, the thesis combines retrieval and generative architectures over this corpus, giving two modes. The retrieval model is the default mode: when the user asks about a movie topic drawn from the PTT movie board discussions, question-answer matching uses a BM25 relevance criterion to decide whether to output the corresponding response from the retrieval corpus. If the BM25 criterion is not met, the Seq2Seq model takes over, and the trained movie question-answering module supplies a sentence from the generative corpus. In short, the retrieval-generative dialogue system lets chatbots and users discuss a wider range of movie topics interactively. Furthermore, compared with earlier chatbots that require custom Dialogflow and wit.ai modules to produce Q&A, this approach reduces the tedious work of configuring intent and entity rules.
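The retrieval-first, generation-fallback flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the whitespace `tokenize` function stands in for Jieba segmentation, the `threshold` value, the `k1` and `b` parameters, the sample question-answer pairs, and the `generate` stub (in place of a trained Seq2Seq model) are all assumptions introduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace split stands in for Jieba word segmentation.
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 over a small list of stored questions."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5))
                    for t, f in df.items()}

    def score(self, query_tokens, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total


def reply(query, bm25, answers, generate, threshold=1.0):
    # Retrieval is the default mode; fall back to generation when the
    # best BM25 score does not reach the (hypothetical) threshold.
    q = tokenize(query)
    scores = [bm25.score(q, i) for i in range(len(answers))]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return "retrieval", answers[best]
    return "generative", generate(query)


# Hypothetical question-answer pairs standing in for the PTT-derived corpus.
questions = ["who directed inception",
             "is parasite worth watching",
             "best scene in interstellar"]
answers = ["<stored answer 1>", "<stored answer 2>", "<stored answer 3>"]
bm25 = BM25(questions)


def generate(query):
    # Placeholder for a trained Seq2Seq model (assumption).
    return "<seq2seq output>"


print(reply("who directed inception", bm25, answers, generate))
print(reply("what should I eat for lunch", bm25, answers, generate))
```

In this sketch the first query matches a stored question and is answered from the retrieval corpus, while the second shares no terms with any stored question and is handed to the generative stub. The threshold would need tuning against real data; the abstract does not state which value or criterion the thesis uses.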
author2 Cheng, Chao-Jung
author_facet Cheng, Chao-Jung
He, Ying-Cheng
何應承
author He, Ying-Cheng
何應承
author_sort He, Ying-Cheng
title Development of the film discussion corpus based on retrieval generative architecture
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/dg2nh3