Data quality research of industry and commerce census
Master's === National Chengchi University === Graduate Institute of Statistics === 99 === Data quality affects the quality of decisions and the outcomes of the actions based on them, so it has received growing attention in recent years. This study draws on two databases: one is the industrial innovation survey databa...
Main Author: | 邱詠翔 |
---|---|
Other Authors: | 鄭宇庭 |
Format: | Others |
Language: | zh-TW |
Online Access: | http://ndltd.ncl.edu.tw/handle/76210628659848749970 |
id |
ndltd-TW-099NCCU5337007 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099NCCU53370072015-10-28T04:06:49Z http://ndltd.ncl.edu.tw/handle/76210628659848749970 Data quality research of industry and commerce census 工商及服務業普查資料品質之研究 邱詠翔 碩士 國立政治大學 統計研究所 99 鄭宇庭 蔡紋琦 學位論文 ; thesis 72 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chengchi University === Graduate Institute of Statistics === 99 === Data quality affects the quality of decisions and the outcomes of the actions based on them, so it has received growing attention in recent years. This study draws on two databases: the industrial innovation survey database and the industry and commerce census database for ROC year 95 (2006). The quality of the data within a database is equally important: databases often contain erroneous records, and incorrect records bias the results of any analysis. Data cleaning and consolidation are therefore necessary before analysis begins.
This is evident from the population and sample data distributions. Before data cleaning and consolidation, the average number of employees is 92.08 in the innovation survey data and 135.54 in the industry and commerce census data. After cleaning and consolidation, we compare the correlation, similarity, and distance between the employee counts in the two databases. The results show that the two databases are highly consistent: the average number of employees is 39.01 in the innovation survey data and 42.12 in the census data, both closer to the population average of 7.05 employees. This also demonstrates the importance of data cleaning.
The study uses post-stratified sampling. Its main objective is to use the industrial innovation survey sample to assess the accuracy of the industry and commerce census data for ROC year 95 (2006). Estimates based on the industrial innovation survey sample overestimate both the number of employees and the operating revenue in the census. We conjecture that this is because the population of the industrial innovation survey is the five thousand large enterprises published by China Credit Information, whereas the population of the industry and commerce census is enterprises in general. We therefore use the corresponding industry and commerce census sample for validation. The results show that the industrial innovation survey sample and the industry and commerce census sample for ROC year 95 (2006) are highly consistent.
|
author2 |
鄭宇庭 |
author_facet |
鄭宇庭 邱詠翔 |
author |
邱詠翔 |
spellingShingle |
邱詠翔 Data quality research of industry and commerce census |
author_sort |
邱詠翔 |
title |
Data quality research of industry and commerce census |
title_short |
Data quality research of industry and commerce census |
title_full |
Data quality research of industry and commerce census |
title_fullStr |
Data quality research of industry and commerce census |
title_full_unstemmed |
Data quality research of industry and commerce census |
title_sort |
data quality research of industry and commerce census |
url |
http://ndltd.ncl.edu.tw/handle/76210628659848749970 |
work_keys_str_mv |
AT qiūyǒngxiáng dataqualityresearchofindustryandcommercecensus AT qiūyǒngxiáng gōngshāngjífúwùyèpǔcházīliàopǐnzhìzhīyánjiū |
_version_ |
1718112788216283136 |