The comparison of decision tree and discriminant analysis for rice data
Master's === National Chung Hsing University === Department of Agronomy === 94 === It is usually time-consuming, labor-intensive and costly to collect and analyze data. Due to the vast advancement of the technology of data retrieval and storage, the amount of data in the databases and data warehouses increases rapidly. Although much useful informat...
Main Authors: | Yen-Ting Chen, 陳嬿婷 |
---|---|
Other Authors: | Bo-Jein Kuo |
Format: | Others |
Language: | zh-TW |
Published: | 2006 |
Online Access: | http://ndltd.ncl.edu.tw/handle/66601533806238606686 |
id |
ndltd-TW-094NCHU5417017 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-094NCHU5417017 2015-10-13T16:41:01Z http://ndltd.ncl.edu.tw/handle/66601533806238606686 The comparison of decision tree and discriminant analysis for rice data 決策樹與判別分析在稻米品質資料上的比較 Yen-Ting Chen 陳嬿婷 Master's National Chung Hsing University Department of Agronomy 94 Collecting and analyzing data is usually time-consuming, labor-intensive, and costly. With rapid advances in data retrieval and storage technology, the amount of data held in databases and data warehouses is growing quickly. Although these data contain much useful information, few effective tools exist for finding it. Data mining procedures were therefore developed to extract useful information from large amounts of data. The classification methods used most frequently in data analysis include discriminant analysis, logistic regression, decision trees, and neural networks. However, these methods can be applied only when the classes are known in advance. This study analyzed 250 rice data records from the first and second crops of japonica and indica rice. Physicochemical properties, viscosity, and panel test scores from 2001 to 2004 were investigated. Both discriminant analysis and the C4.5 decision tree algorithm were used to classify and predict the taste quality of cooked rice. The misclassification rate of the decision tree was lower than that of discriminant analysis. The two methods were used to classify and predict the panel test score data, and of the two, the decision tree had the lower misclassification rate. To avoid overfitting, the decision tree was pruned. The results showed that the misclassification rate was lower when protein content was used as the initial split node. They also showed that pruning lowered the calibration misclassification rate and reduced the number of nodes, so the pruned decision tree was better than the original one. Moreover, principal component analysis revealed that the physicochemical properties of rice were a more important factor in eating quality than viscosity. In summary, of the two methods for classifying the panel test score data for rice, the decision tree performed better than discriminant analysis. Bo-Jein Kuo 郭寶錚 2006 thesis 76 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chung Hsing University === Department of Agronomy === 94 === Collecting and analyzing data is usually time-consuming, labor-intensive, and costly. With rapid advances in data retrieval and storage technology, the amount of data held in databases and data warehouses is growing quickly. Although these data contain much useful information, few effective tools exist for finding it. Data mining procedures were therefore developed to extract useful information from large amounts of data. The classification methods used most frequently in data analysis include discriminant analysis, logistic regression, decision trees, and neural networks. However, these methods can be applied only when the classes are known in advance.
This study analyzed 250 rice data records from the first and second crops of japonica and indica rice. Physicochemical properties, viscosity, and panel test scores from 2001 to 2004 were investigated. Both discriminant analysis and the C4.5 decision tree algorithm were used to classify and predict the taste quality of cooked rice. The misclassification rate of the decision tree was lower than that of discriminant analysis. The two methods were used to classify and predict the panel test score data, and of the two, the decision tree had the lower misclassification rate.
To avoid overfitting, the decision tree was pruned. The results showed that the misclassification rate was lower when protein content was used as the initial split node. They also showed that pruning lowered the calibration misclassification rate and reduced the number of nodes, so the pruned decision tree was better than the original one. Moreover, principal component analysis revealed that the physicochemical properties of rice were a more important factor in eating quality than viscosity. In summary, of the two methods for classifying the panel test score data for rice, the decision tree performed better than discriminant analysis. |
author2 |
Bo-Jein Kuo |
author_facet |
Bo-Jein Kuo Yen-Ting Chen 陳嬿婷 |
author |
Yen-Ting Chen 陳嬿婷 |
spellingShingle |
Yen-Ting Chen 陳嬿婷 The comparison of decision tree and discriminant analysis for rice data |
author_sort |
Yen-Ting Chen |
title |
The comparison of decision tree and discriminant analysis for rice data |
title_short |
The comparison of decision tree and discriminant analysis for rice data |
title_full |
The comparison of decision tree and discriminant analysis for rice data |
title_fullStr |
The comparison of decision tree and discriminant analysis for rice data |
title_full_unstemmed |
The comparison of decision tree and discriminant analysis for rice data |
title_sort |
comparison of decision tree and discriminant analysis for rice data |
publishDate |
2006 |
url |
http://ndltd.ncl.edu.tw/handle/66601533806238606686 |
work_keys_str_mv |
AT yentingchen thecomparisonofdecisiontreeanddiscriminantanalysisforricedata AT chényàntíng thecomparisonofdecisiontreeanddiscriminantanalysisforricedata AT yentingchen juécèshùyǔpànbiéfēnxīzàidàomǐpǐnzhìzīliàoshàngdebǐjiào AT chényàntíng juécèshùyǔpànbiéfēnxīzàidàomǐpǐnzhìzīliàoshàngdebǐjiào AT yentingchen comparisonofdecisiontreeanddiscriminantanalysisforricedata AT chényàntíng comparisonofdecisiontreeanddiscriminantanalysisforricedata |
_version_ |
1717772779911118848 |