A comparative study of deep learning based text classification

Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 106 === Text mining is the process of categorizing and analyzing complicated data. The information we aim to acquire is ultimately presented in a simple way. Given the fact that data nowadays are enormous and complex, effective statistical analysis of data has become s...

Bibliographic Details
Main Author:	陳奕志
Other Authors:	施明毅
Format:	Others
Language:	zh-TW
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/h4sh7d

id	ndltd-TW-106NCUE5392006
record_format	oai_dc
spelling	ndltd-TW-106NCUE5392006 2019-07-25T04:46:49Z http://ndltd.ncl.edu.tw/handle/h4sh7d A comparative study of deep learning based text classification 深度學習文本分類的比較研究 陳奕志 Master's, National Changhua University of Education, Department of Computer Science and Information Engineering, 106 施明毅 2018 Degree thesis ; thesis 46 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 106 === Text mining is the process of categorizing and analyzing complex data so that the information sought can be presented in a simple form. Because data today are massive and complex, analyzing them effectively with statistical methods has become a challenge. As a consequence, deep learning for text mining has received increasing attention and has been widely applied. For instance, medical data collected in the past can be analyzed to investigate possible causes of certain diseases. Text mining can also be applied to understand customers' demands, preferences and expectations, so that stores can design marketing strategies that appeal to them. This thesis applies text mining, a branch of data mining, using three models: Support Vector Machine (SVM), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). It also incorporates the word2vec model developed by a team at Google. The word2vec model maps each word to a vector, and these vectors represent the relationships between words. We set up word2vec in the embedding layer, which converts each word in the data into a vector that is then fed into the neural network architecture. CNN is a type of neural network that has been used effectively to classify images and is increasingly used to classify text as well. In the past, however, SVM gave the best classification performance. This thesis therefore explores how the choice of model affects text classification, compares the three models, and determines which classifies most accurately.
author2	施明毅
author_facet	施明毅 陳奕志
author	陳奕志
spellingShingle	陳奕志 A comparative study of deep learning based text classification
author_sort	陳奕志
title	A comparative study of deep learning based text classification
title_short	A comparative study of deep learning based text classification
title_full	A comparative study of deep learning based text classification
title_fullStr	A comparative study of deep learning based text classification
title_full_unstemmed	A comparative study of deep learning based text classification
title_sort	comparative study of deep learning based text classification
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/h4sh7d
work_keys_str_mv	AT chényìzhì acomparativestudyofdeeplearningbasedtextclassification AT chényìzhì shēndùxuéxíwénběnfēnlèidebǐjiàoyánjiū AT chényìzhì comparativestudyofdeeplearningbasedtextclassification
_version_	1719230237213982720

Similar Items