Summary: | Master's === Kun Shan University === Graduate Institute of Information Engineering === 107 === Most current chatbot dialogue designs do not use a corpus because building one is very costly. When a user's question contains certain related keywords, the chatbot usually responds directly with an answer that already exists in its database, even if that answer is less attractive to consumers. If stores could instead provide customer service software that chats with customers at any time, they would be able to learn users' preferences. In this paper, the PTT movie bulletin board is used as a source for natural language processing to build a film corpus. A web crawler collects the movie-related discussions posted by netizens, and the content is first segmented with the Jieba word segmentation algorithm to produce the film corpus. To improve the accuracy of the system, this paper combines a search (retrieval-based) architecture with a generative architecture, giving two modes. The search model is the default mode. When the user asks about a movie topic drawn from the PTT movie board discussions, question-and-answer matching uses a BM25 applicability judgment to decide whether to output the corresponding response from the search model's corpus. If the BM25 condition is not met, the Seq2Seq model is used instead, and the trained movie question-answering module provides a sentence derived from the generative corpus. In brief, the search-generation dialogue architecture lets the chatbot and its users discuss a wider range of movie topics interactively. Furthermore, compared with older chatbots, in which Dialogflow and wit.ai custom modules are required to produce Q&A, this approach reduces the tedious work of configuring intent and entity rules.
|
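The search-then-generate fallback described in the summary can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the `BM25` class, the `reply` function, the `threshold` value, and the `generate` callable (standing in for the trained Seq2Seq model) are all hypothetical, and the inputs are assumed to be already tokenized (for example by Jieba). The scoring uses a common Okapi BM25 variant with typical default parameters `k1=1.5` and `b=0.75`, which the source does not specify.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 scorer over pre-tokenized questions (hypothetical helper)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        self.tf = [Counter(d) for d in docs]
        # Document frequency: number of stored questions containing each token.
        df = Counter(t for d in docs for t in set(d))
        n = len(docs)
        # Smoothed IDF that stays non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in query:
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total


def reply(query, bm25, answers, generate, threshold=1.0):
    """Return a stored answer if the best BM25 score passes the threshold;
    otherwise fall back to the generative model (assumed callable)."""
    scores = [bm25.score(query, i) for i in range(len(answers))]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return answers[best]
    return generate(query)
```

In use, `bm25` would be built from the tokenized questions in the search corpus, `answers` would hold the paired responses, and `generate` would wrap the Seq2Seq model; the threshold would need tuning on real data.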