ADGN: An Algorithm for Record Linkage Using Address, Date of Birth, Gender, and Name


Bibliographic Details
Main Authors: Stephen Ansolabehere, Eitan D. Hersh
Format: Article
Language: English
Published: Taylor & Francis Group 2017-01-01
Series: Statistics and Public Policy
Subjects:
Online Access: http://dx.doi.org/10.1080/2330443X.2017.1389620
Description
Summary: This article presents an algorithm for record linkage that uses multiple indicators derived from combinations of fields commonly found in databases. Specifically, the quadruplet of Address (A), Date of Birth (D), Gender (G), and Name (N) and any triplet of A-D-G-N (i.e., ADG, ADN, AGN, and DGN) link records with an extremely high likelihood. Matching on multiple identifiers avoids problems of missing data, inconsistent fields, and typographical errors. We show, using a very large database from the State of Texas, that exact matches using combinations of A, D, G, and N produce a rate of matches comparable to that of the 9-digit Social Security Number. Further examination of the linkage rates shows that reporting the data at a higher level of aggregation, such as Birth Year instead of Date of Birth, and omitting names makes correct matches between databases highly unlikely, protecting an individual's records.
ISSN: 2330-443X
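
Illustration: The matching idea in the summary (exact matches on the ADGN quadruplet and on each triplet ADG, ADN, AGN, and DGN) can be sketched in Python as below. This is a minimal sketch and not the authors' implementation. The field names (`address`, `dob`, `gender`, `name`), the whitespace and case normalization, the order in which combinations are tried, and the rule that a key must be unique on both sides before a link is accepted are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical field names; the record does not specify a schema.
FIELDS = {"A": "address", "D": "dob", "G": "gender", "N": "name"}
# Quadruplet first, then each triplet named in the summary (order is an assumption).
COMBINATIONS = ["ADGN", "ADG", "ADN", "AGN", "DGN"]


def make_key(record, combo):
    """Return a tuple key for the combination, or None if any field is missing."""
    values = []
    for letter in combo:
        value = record.get(FIELDS[letter])
        if value is None or not str(value).strip():
            return None
        # Assumed normalization: lowercase and collapse whitespace.
        values.append(" ".join(str(value).lower().split()))
    return tuple(values)


def link_records(left, right):
    """Link left records to right records by exact match on each combination.

    Returns a dict mapping left index -> (right index, combination used).
    A link is accepted only when the key is unique on both sides (assumption).
    """
    links = {}
    used_right = set()
    for combo in COMBINATIONS:
        right_index = defaultdict(list)
        for j, rec in enumerate(right):
            key = make_key(rec, combo)
            if key is not None and j not in used_right:
                right_index[key].append(j)
        left_index = defaultdict(list)
        for i, rec in enumerate(left):
            key = make_key(rec, combo)
            if key is not None and i not in links:
                left_index[key].append(i)
        for key, left_ids in left_index.items():
            right_ids = right_index.get(key, [])
            if len(left_ids) == 1 and len(right_ids) == 1:
                links[left_ids[0]] = (right_ids[0], combo)
                used_right.add(right_ids[0])
    return links
```

Because a record with a missing or inconsistent field can still be linked through one of the triplets that omits that field, the sketch reflects the summary's point that matching on multiple identifiers tolerates missing data and inconsistent fields.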