Finding a record in a database

Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem be...

Full description

Bibliographic Details
Main Author: Fatemi, Bahare
Language:English
Published: University of British Columbia 2017
Online Access:http://hdl.handle.net/2429/62575
Description
Summary:Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem because people do not consistently use the official title of a company, but use abbreviations, synonyms, different orders of terms, and the title can contain typos. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query, and find the best record(s). Our model addresses many of challenges of the record linkage problem and provides good results when exact term matching search algorithms fail. We evaluate our model on a large real-world data set. Obtained results show that the model is a promising probabilistic record linkage model. === Science, Faculty of === Computer Science, Department of === Graduate