The comparison of decision tree and discriminant analysis for rice data

Master's === National Chung Hsing University === Department of Agronomy === 94 === It is usually time-consuming, labor-intensive and costly to collect and analyze data. Due to the vast advancement of the technology of data retrieval and storage, the amount of data in the databases and data warehouses increases rapidly. Although much useful informat...

Bibliographic Details
Main Authors: Yen-Ting Chen, 陳嬿婷
Other Authors: Bo-Jein Kuo
Format: Others
Language: zh-TW
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/66601533806238606686
id ndltd-TW-094NCHU5417017
record_format oai_dc
spelling ndltd-TW-094NCHU5417017 2015-10-13T16:41:01Z http://ndltd.ncl.edu.tw/handle/66601533806238606686 The comparison of decision tree and discriminant analysis for rice data 決策樹與判別分析在稻米品質資料上的比較 Yen-Ting Chen 陳嬿婷 碩士 中興大學 農藝學系所 94 Bo-Jein Kuo 郭寶錚 2006 學位論文 ; thesis 76 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chung Hsing University === Department of Agronomy === 94 === Collecting and analyzing data is usually time-consuming, labor-intensive, and costly. With rapid advances in data retrieval and storage technology, the amount of data held in databases and data warehouses is growing quickly. Although these data contain much useful information, few effective tools exist for finding it. Some scholars therefore developed data mining procedures, which can extract useful information from large amounts of data. The classification methods used most frequently in data analysis include discriminant analysis, logistic regression, decision trees, and neural networks. However, these methods can only be used when the classes are known in advance. This study analyzed 250 records of rice data from the first and second crops of japonica and indica rice. The physicochemical properties, viscosity, and panel test scores from 2001 to 2004 were investigated. Both discriminant analysis and the C4.5 decision tree algorithm were used to classify and predict the taste quality of cooked rice, as measured by the panel test score. Of the two methods, the decision tree had the lower misclassification rate. To avoid overfitting, the decision tree was pruned. The results showed that the misclassification rate was lower when protein content was used as the initial split node. They also showed that pruning lowered the misclassification rate of calibration and reduced the number of nodes, so the pruned decision tree was better than the original one. Moreover, principal component analysis revealed that the physicochemical properties of rice were a more important factor than viscosity in the eating quality of rice. In summary, for classifying the panel test scores of rice, the decision tree was a better method than discriminant analysis.
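The abstract refers to choosing an initial split node and comparing misclassification rates. As a rough illustration only, the sketch below picks a threshold on one continuous attribute by gain ratio, a simplified form of the splitting criterion associated with C4.5, and then computes the misclassification rate of the resulting one-split tree. The function names and the protein and taste values are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best one."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


def majority(labels):
    """Most common class label in a branch."""
    return Counter(labels).most_common(1)[0][0]


def misclassification_rate(y_true, y_pred):
    """Fraction of records whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: protein content (%) and a two-class taste label.
protein = [5.8, 6.1, 6.4, 6.9, 7.3, 7.8, 8.2, 8.6]
taste = ["good", "good", "good", "good", "poor", "good", "poor", "poor"]

threshold = best_threshold(protein, taste)
left_class = majority([y for x, y in zip(protein, taste) if x <= threshold])
right_class = majority([y for x, y in zip(protein, taste) if x > threshold])
predicted = [left_class if x <= threshold else right_class for x in protein]
rate = misclassification_rate(taste, predicted)
print(f"split at protein <= {threshold:.2f}, misclassification rate = {rate:.3f}")
```

A full C4.5 tree repeats this selection recursively over all attributes and then prunes; the sketch stops at a single split to keep the criterion visible.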
author2 Bo-Jein Kuo
author_facet Bo-Jein Kuo
Yen-Ting Chen
陳嬿婷
author Yen-Ting Chen
陳嬿婷
title The comparison of decision tree and discriminant analysis for rice data
publishDate 2006
url http://ndltd.ncl.edu.tw/handle/66601533806238606686