Data quality research of industry and commerce census

Master's === National Chengchi University === Institute of Statistics === 99 === The quality of data affects the quality of decisions and the results achieved when actions are carried out, so data quality has drawn increasing attention in recent years. This study consists of two databases, one is the industrial innovation survey databa...

Bibliographic Details
Main Author: 邱詠翔
Other Authors: 鄭宇庭
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/76210628659848749970
id ndltd-TW-099NCCU5337007
record_format oai_dc
spelling ndltd-TW-099NCCU5337007 2015-10-28T04:06:49Z http://ndltd.ncl.edu.tw/handle/76210628659848749970 Data quality research of industry and commerce census 工商及服務業普查資料品質之研究 邱詠翔 Master's, National Chengchi University, Institute of Statistics, 99 鄭宇庭 蔡紋琦 thesis 72 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chengchi University === Institute of Statistics === 99 === The quality of data affects the quality of decisions and the results achieved when actions are carried out, so data quality has drawn increasing attention in recent years. This study uses two databases: the industrial innovation survey database and the industry and commerce census database for ROC year 95 (2006). The quality of a database is itself an important issue: databases often contain erroneous records, and incorrect information biases analysis results, so data cleaning and consolidation are necessary before any analysis. This can be seen from the distributions of the population and sample data. Before data cleaning and consolidation, the average number of employees is 92.08 in the innovation survey data and 135.54 in the census data. After cleaning and consolidation, we compare the correlation, similarity, and distance of the employee counts in the two databases. The results show that the two databases are highly consistent: the average number of employees is 39.01 in the innovation survey data and 42.12 in the census data, both closer to the population average of 7.05. This also shows the importance of data cleaning. The study uses post-stratified sampling; its main objective is to use the industrial innovation survey sample to assess the accuracy of the year-95 industry and commerce census data.
Estimates of the number of employees and of operating revenue in the year-95 industry and commerce census based on the industrial innovation survey sample are both too high. We conjecture that this is because the population of the industrial innovation survey is the five thousand large enterprises published by China Credit Information, whereas the population of the industry and commerce census is enterprises in general. We therefore use the corresponding industry and commerce census sample for validation. The results show that the industrial innovation survey sample and the year-95 industry and commerce census sample are highly consistent.
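The abstract names post-stratified sampling but does not give the estimator. As a rough illustration only, the sketch below computes a textbook post-stratified mean, weighting each stratum's sample mean by its known population count. The function name, the stratum labels, and all numbers are hypothetical and are not taken from the thesis; they merely show how a sample dominated by large enterprises can overstate an unweighted average.

```python
from collections import defaultdict


def post_stratified_mean(sample, stratum_sizes):
    """Textbook post-stratified mean (illustrative, not the thesis's code).

    sample: iterable of (stratum_label, value) pairs.
    stratum_sizes: dict mapping stratum_label -> known population count N_h.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        totals[stratum] += value
        counts[stratum] += 1

    # Every population stratum needs at least one sampled unit.
    missing = set(stratum_sizes) - set(counts)
    if missing:
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")

    population = sum(stratum_sizes.values())
    # Weight each stratum's sample mean by its population share N_h / N.
    weighted = sum(
        stratum_sizes[h] * totals[h] / counts[h] for h in stratum_sizes
    )
    return weighted / population


# Hypothetical employee counts: large firms are over-represented in the sample.
sample = [
    ("large", 120.0), ("large", 80.0),
    ("small", 5.0), ("small", 7.0), ("small", 6.0),
]
stratum_sizes = {"large": 100, "small": 900}  # assumed population counts

naive_mean = sum(value for _, value in sample) / len(sample)
adjusted_mean = post_stratified_mean(sample, stratum_sizes)
print(naive_mean, adjusted_mean)
```

With these made-up figures the unweighted sample mean is far above the post-stratified mean, because the two large-firm observations carry 40% of the sample but only 10% of the assumed population.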
author2 鄭宇庭
author_facet 鄭宇庭
邱詠翔
author 邱詠翔
spellingShingle 邱詠翔
Data quality research of industry and commerce census
author_sort 邱詠翔
title Data quality research of industry and commerce census
url http://ndltd.ncl.edu.tw/handle/76210628659848749970