Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada
ABSTRACT Objectives The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and ex...
Main Authors: | Colin Babyak, Abdelnasser Saidi |
---|---|
Format: | Article |
Language: | English |
Published: |
Swansea University
2017-04-01
|
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/49 |
id |
doaj-e6991d67066f404ba6c357a9268dd445 |
---|---|
record_format |
Article |
spelling |
doaj-e6991d67066f404ba6c357a9268dd445 2020-11-24T22:08:50Z eng Swansea University International Journal of Population Data Science 2399-4908 2017-04-01 1 1 10.23889/ijpds.v1i1.49 49 Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada Colin Babyak Abdelnasser Saidi Statistics Canada ABSTRACT Objectives The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment. Approach We will start with a brief overview of the SDLE and then continue with a discussion of how both deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the ultimate end goal of analysis. Results Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, due to multiple addresses, names, etc. Although this allows for a significant reduction in missing links, it raises the spectre of additional false positive matches and has added computational complexity which we have had to overcome. Conclusion The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data. https://ijpds.org/article/view/49 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Colin Babyak Abdelnasser Saidi |
spellingShingle |
Colin Babyak Abdelnasser Saidi Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada International Journal of Population Data Science |
author_facet |
Colin Babyak Abdelnasser Saidi |
author_sort |
Colin Babyak |
title |
Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada |
title_short |
Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada |
title_full |
Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada |
title_fullStr |
Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada |
title_full_unstemmed |
Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada |
title_sort |
record linkage methodology for the social data linkage environment at statistics canada |
publisher |
Swansea University |
series |
International Journal of Population Data Science |
issn |
2399-4908 |
publishDate |
2017-04-01 |
description |
ABSTRACT
Objectives
The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment.
Approach
We will start with a brief overview of the SDLE and then continue with a discussion of how both deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the ultimate end goal of analysis.
Results
Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, due to multiple addresses, names, etc. Although this allows for a significant reduction in missing links, it raises the spectre of additional false positive matches and has added computational complexity which we have had to overcome.
Conclusion
The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data. |
url |
https://ijpds.org/article/view/49 |