A comparative study of deep learning based text classification
Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 106 === Text mining is the process of categorizing and analyzing complicated data. The information we aim to acquire is ultimately presented in a simple way. Given the fact that data nowadays are enormous and complex, effective statistical analysis of data has become s...
Main Author: | 陳奕志 |
---|---|
Other Authors: | 施明毅 |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/h4sh7d |
id |
ndltd-TW-106NCUE5392006 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NCUE5392006 2019-07-25T04:46:49Z http://ndltd.ncl.edu.tw/handle/h4sh7d A comparative study of deep learning based text classification 深度學習文本分類的比較研究 陳奕志 Master's National Changhua University of Education Department of Computer Science and Information Engineering 106 施明毅 2018 Degree thesis ; thesis 46 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 106 === Text mining is the process of categorizing and analyzing complex textual data so that the information of interest can be presented in a simple form. Because data today are vast and complex, effective statistical analysis has become a challenge. As a consequence, increasing importance has been attached to deep learning for text mining, and it is now widely applied. For instance, historical medical data can be analyzed to investigate possible causes of certain diseases. Text mining can also be applied to understand the demands, preferences, and expectations of customers; based on the results, stores can design marketing strategies that appeal to them.
This thesis applies text mining, a branch of data mining, to three classification models: Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). It also incorporates the word2vec model developed by a team at Google. word2vec maps each word to a vector, and the resulting vectors can represent relationships between words. We use word2vec in the embedding layer, which converts each word in the data into a vector, and the resulting vectors are then fed into the neural network architecture.
CNN is a type of neural network that has been used effectively to classify images and is increasingly used to classify text as well. In the past, however, SVM was the most effective at categorization. This thesis therefore explores how the choice of model affects text categorization, compares the three models, and determines which one categorizes most accurately.
|
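The embedding step described in the abstract (each word mapped to a vector, and the sequence of vectors passed on to a classifier) can be sketched roughly as follows. This is an illustration only, not the thesis's implementation: the vectors in `TOY_VECTORS` are made up rather than trained with word2vec, and the names `embed`, `cosine`, `TOY_VECTORS`, `DIM` and `PAD` are hypothetical. The record does not say which framework or vector size was actually used.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained word2vec output.
# Real word2vec vectors are learned from a corpus and are much larger.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.0],
    "movie": [0.0, 0.9, 0.4],
}
DIM = 3
PAD = [0.0] * DIM  # also used for out-of-vocabulary words in this sketch


def embed(tokens, max_len):
    """Look up each token's vector, then truncate or pad to max_len rows."""
    rows = [TOY_VECTORS.get(tok, PAD) for tok in tokens[:max_len]]
    rows += [PAD] * (max_len - len(rows))
    return rows


def cosine(u, v):
    """Cosine similarity, a common way to compare two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# A sentence becomes a fixed-size matrix (max_len rows, DIM columns),
# which is the kind of input a CNN or LSTM text classifier consumes.
matrix = embed("a great movie".split(), max_len=5)
print(len(matrix), len(matrix[0]))

# Related words should sit closer together than unrelated ones.
print(cosine(TOY_VECTORS["good"], TOY_VECTORS["great"]) >
      cosine(TOY_VECTORS["good"], TOY_VECTORS["bad"]))
```

Padding every sentence to the same length is an assumption made here so that all inputs share one shape; how the thesis handled variable-length text is not stated in this record.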
author2 |
施明毅 |
author_facet |
施明毅 陳奕志 |
author |
陳奕志 |
spellingShingle |
陳奕志 A comparative study of deep learning based text classification |
author_sort |
陳奕志 |
title |
A comparative study of deep learning based text classification |
title_sort |
comparative study of deep learning based text classification |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/h4sh7d |
work_keys_str_mv |
AT chényìzhì acomparativestudyofdeeplearningbasedtextclassification AT chényìzhì shēndùxuéxíwénběnfēnlèidebǐjiàoyánjiū AT chényìzhì comparativestudyofdeeplearningbasedtextclassification |
_version_ |
1719230237213982720 |