A comparative study of deep learning based text classification

碩士 === 國立彰化師範大學 === 資訊工程學系 === 106 === Text mining is the process of categorizing and analyzing complicated data. The information we aim to acquire is ultimately presented in a simple way. Given the fact that data nowadays are enormous and complex, effective statistical analysis of data has become s...


Bibliographic Details
Main Author: 陳奕志
Other Authors: 施明毅
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/h4sh7d
id ndltd-TW-106NCUE5392006
record_format oai_dc
spelling ndltd-TW-106NCUE5392006 2019-07-25T04:46:49Z http://ndltd.ncl.edu.tw/handle/h4sh7d A comparative study of deep learning based text classification 深度學習文本分類的比較研究 陳奕志 碩士 國立彰化師範大學 資訊工程學系 106 施明毅 2018 學位論文 ; thesis 46 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description 碩士 === 國立彰化師範大學 === 資訊工程學系 === 106 === Text mining is the process of categorizing and analyzing complicated data so that the information we want is ultimately presented in a simple way. Because data today are enormous and complex, effective statistical analysis has become a challenge. As a consequence, increasing importance has been attached to deep learning for text mining, and it is now widely employed. For instance, medical data collected in the past can be analyzed to investigate possible causes of certain diseases. Text mining can also be applied to understand the demands, preferences and expectations of customers; based on the results, stores can design marketing strategies that appeal to them. This thesis applies text mining, a branch of data mining, to three types of model: Support Vector Machine (SVM), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). It also incorporates the word2vec model developed by a team at Google. The word2vec model maps each word to a vector and can represent the relationships between words. We set up word2vec in the embedding layer, which converts each word in the data into a vector, and the vectors are then fed into the neural network architecture. CNN is a type of neural network that has been used effectively to classify images and is increasingly used to classify text as well. In the past, however, SVM was the most effective at categorization. This thesis therefore explores how the choice of model affects text categorization, compares the three models, and determines which one categorizes most accurately.
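The pipeline the abstract describes (words mapped to vectors by an embedding layer, then passed to a convolutional network) can be illustrated with a minimal sketch. This is not the thesis implementation: the vocabulary, the embedding dimension, the random vectors standing in for trained word2vec vectors, and the helper names `embed` and `conv_max_pool` are all assumptions made for illustration, and only the Python standard library is used.

```python
import random

# Hypothetical toy vocabulary and embedding table. In the setup the abstract
# describes, these vectors would come from a trained word2vec model; here
# they are random placeholders (an assumption for illustration only).
random.seed(0)
EMBED_DIM = 4
vocab = {"<pad>": 0, "text": 1, "mining": 2, "is": 3, "useful": 4}
embedding_table = [
    [0.0] * EMBED_DIM if idx == 0
    else [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for idx in range(len(vocab))
]


def embed(tokens, max_len):
    """Map tokens to ids, pad or truncate to max_len, and look up one vector per id.

    Unknown tokens fall back to id 0, the padding vector.
    """
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]
    ids += [0] * (max_len - len(ids))
    return [embedding_table[i] for i in ids]


def conv_max_pool(matrix, kernel):
    """Slide one filter over consecutive word windows, apply ReLU, max-pool over time."""
    width = len(kernel)
    scores = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        total = sum(
            w * x
            for kernel_row, word_vec in zip(kernel, window)
            for w, x in zip(kernel_row, word_vec)
        )
        scores.append(max(0.0, total))  # ReLU
    return max(scores)


sentence = "text mining is useful".split()
embedded = embed(sentence, max_len=6)           # 6 rows, EMBED_DIM columns
kernel = [[0.5] * EMBED_DIM for _ in range(2)]  # one filter spanning 2 words
feature = conv_max_pool(embedded, kernel)       # one scalar feature per filter
```

A full text CNN would use many such filters of several widths and feed the pooled features to a classification layer; the SVM and LSTM models named in the abstract would consume the same embedded input in different ways.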
author2 施明毅
author_facet 施明毅
陳奕志
author 陳奕志
spellingShingle 陳奕志
A comparative study of deep learning based text classification
author_sort 陳奕志
title A comparative study of deep learning based text classification
title_sort comparative study of deep learning based text classification
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/h4sh7d
work_keys_str_mv AT chényìzhì acomparativestudyofdeeplearningbasedtextclassification
AT chényìzhì shēndùxuéxíwénběnfēnlèidebǐjiàoyánjiū
AT chényìzhì comparativestudyofdeeplearningbasedtextclassification
_version_ 1719230237213982720