Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada

ABSTRACT Objectives The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and ex...
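
As a rough illustration of how a deterministic pass and a probabilistic pass can be combined, the sketch below first tries an exact match on an identifier and then falls back to Fellegi-Sunter style agreement weights. It is not G-Link's interface and not the authors' implementation: the field names (`person_id`, `surname`, `birth_date`, `postal_code`), the m/u probabilities and the threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not from the paper):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "postal_code": (0.80, 0.02),
}


def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(record, depository, threshold=10.0):
    """Return (matched depository record or None, how the link was made)."""
    # Deterministic pass: exact agreement on a hypothetical unique identifier.
    rid = record.get("person_id")
    if rid is not None:
        for cand in depository:
            if cand.get("person_id") == rid:
                return cand, "deterministic"
    # Probabilistic pass: keep the best-scoring candidate if it clears the threshold.
    best, best_w = None, float("-inf")
    for cand in depository:
        w = match_weight(record, cand)
        if w > best_w:
            best, best_w = cand, w
    if best is not None and best_w >= threshold:
        return best, "probabilistic"
    return None, "unlinked"


# Toy depository: person 1 appears twice because of two known addresses.
depository = [
    {"person_id": 1, "surname": "Tremblay", "birth_date": "1980-01-02", "postal_code": "K1A0T6"},
    {"person_id": 1, "surname": "Tremblay", "birth_date": "1980-01-02", "postal_code": "K2P1L4"},
    {"person_id": 2, "surname": "Roy", "birth_date": "1975-05-05", "postal_code": "H2X1Y4"},
]
```

In this toy setup, a record carrying the identifier is linked deterministically, while a record without it is linked only if enough fields agree. Storing several depository records per person gives an incoming record more chances to agree, but each extra candidate is also one more chance of a false positive, so the threshold matters.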


Bibliographic Details
Main Authors: Colin Babyak, Abdelnasser Saidi
Format: Article
Language: English
Published: Swansea University 2017-04-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/49
id doaj-e6991d67066f404ba6c357a9268dd445
record_format Article
spelling doaj-e6991d67066f404ba6c357a9268dd445; 2020-11-24T22:08:50Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2017-04-01; 1; 1; 10.23889/ijpds.v1i1.49; 49; Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada; Colin Babyak; Abdelnasser Saidi; Statistics Canada; https://ijpds.org/article/view/49
collection DOAJ
language English
format Article
sources DOAJ
author Colin Babyak
Abdelnasser Saidi
title Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2017-04-01
description ABSTRACT. Objectives: The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository, including how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment. Approach: We will start with a brief overview of the SDLE and then discuss how deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source, with analysis as the ultimate goal. Results: Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, because a person can have multiple addresses, names, etc. This significantly reduces missed links, but it also raises the spectre of additional false positive matches and adds computational complexity, which we have had to overcome. Conclusion: The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data.
url https://ijpds.org/article/view/49