An effective and efficient approach for manually improving geocoded data

Abstract. Background: The process of geocoding produces output coordinates of varying degrees of quality. Previous studies have revealed that simply excluding records with low-quality geocodes from analysis can introduce significant bias, but depending on...

Bibliographic Details
Main Authors: Knoblock Craig A, Wilson John P, Goldberg Daniel W, Ritz Beate, Cockburn Myles G
Format: Article
Language: English
Published: BMC 2008-11-01
Series: International Journal of Health Geographics
Online Access: http://www.ij-healthgeographics.com/content/7/1/60
id doaj-663c4c7f6ca5476ca7eee28658cb77dd
record_format Article
spelling doaj-663c4c7f6ca5476ca7eee28658cb77dd
2020-11-24T22:20:05Z
eng
BMC
International Journal of Health Geographics
1476-072X
2008-11-01
7
1
60
10.1186/1476-072X-7-60
An effective and efficient approach for manually improving geocoded data
Knoblock Craig A
Wilson John P
Goldberg Daniel W
Ritz Beate
Cockburn Myles G
http://www.ij-healthgeographics.com/content/7/1/60
collection DOAJ
language English
format Article
sources DOAJ
author Knoblock Craig A
Wilson John P
Goldberg Daniel W
Ritz Beate
Cockburn Myles G
title An effective and efficient approach for manually improving geocoded data
publisher BMC
series International Journal of Health Geographics
issn 1476-072X
publishDate 2008-11-01
description Abstract
Background: The process of geocoding produces output coordinates of varying degrees of quality. Previous studies have revealed that simply excluding records with low-quality geocodes from analysis can introduce significant bias, but depending on the number and severity of the inaccuracies, their inclusion may also lead to bias. Little quantitative research has been presented on the cost and/or effectiveness of correcting geocodes through manual interactive processes, so the most cost-effective methods for improving geocoded data are unclear. The present work investigates the time and effort required to correct geocodes contained in five health-related datasets that represent examples of data commonly used in Health GIS.
Results: Geocode correction was attempted on five health-related datasets containing a total of 22,317 records. The complete processing of these data took 11.4 weeks (427 hours), averaging 69 seconds of processing time per record. Overall, the geocodes associated with 12,280 (55%) of records were successfully improved, taking 95 seconds of processing time per corrected record on average across all five datasets. Geocode correction improved the overall match rate (the number of successful matches out of the total attempted) from 79.3% to 95%. The spatial shift between the locations of the original successfully matched geocodes and their corrected counterparts averaged 9.9 km per corrected record. After geocode correction, the numbers of city and USPS ZIP code accuracy geocodes were reduced from 10,959 and 1,031 to 6,284 and 200, respectively, while the number of building centroid accuracy geocodes increased from 0 to 2,261.
Conclusion: The results indicate that manual geocode correction using a web-based interactive approach is a feasible and cost-effective method for improving the quality of geocoded data. The level of effort required varies depending on the type of data geocoded. These results can be used to choose between data improvement options (e.g., manual intervention, pseudocoding/geo-imputation, field GPS readings).
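The headline figures in the Results paragraph can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original record: the constant and function names are illustrative assumptions, and only the totals quoted in the abstract are used. The 95 seconds per corrected record cannot be re-derived from those totals alone, so it is not checked here.

```python
# Totals quoted in the abstract's Results paragraph.
TOTAL_RECORDS = 22_317     # records on which correction was attempted
IMPROVED_RECORDS = 12_280  # records whose geocodes were improved
TOTAL_HOURS = 427          # total processing time in hours
TOTAL_WEEKS = 11.4         # the same effort expressed in weeks


def seconds_per_record(hours: float, records: int) -> float:
    """Average processing time per record, in seconds."""
    return hours * 3600 / records


def percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


avg_seconds = seconds_per_record(TOTAL_HOURS, TOTAL_RECORDS)
improved_share = percent(IMPROVED_RECORDS, TOTAL_RECORDS)
# Implied length of a working week (hypothetical derived quantity,
# not stated in the abstract).
hours_per_week = TOTAL_HOURS / TOTAL_WEEKS

print(f"{avg_seconds:.1f} s per record")   # abstract reports 69 s
print(f"{improved_share:.1f}% improved")   # abstract reports 55%
print(f"{hours_per_week:.1f} h per week")
```

Both recomputed values round to the figures reported in the abstract, which suggests the 69-second average is total processing time divided by all attempted records rather than by corrected records only.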
url http://www.ij-healthgeographics.com/content/7/1/60
_version_ 1725776985695715328