Finding a record in a database

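Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem because people do not consistently use the official title of a company, but use abbreviations, synonyms, and different orders of terms, and the title can contain typos. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query, and find the best record(s). Our model addresses many of the challenges of the record linkage problem and provides good results when exact term matching search algorithms fail. We evaluate our model on a large real-world data set. Obtained results show that the model is a promising probabilistic record linkage model.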


Bibliographic Details
Main Author: Fatemi, Bahare
Language: English
Published: University of British Columbia 2017
Online Access: http://hdl.handle.net/2429/62575
id ndltd-UBC-oai-circle.library.ubc.ca-2429-62575
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-62575 2018-01-05T17:29:58Z Science, Faculty of; Computer Science, Department of; Graduate 2017-08-10T22:03:24Z 2017-08-10T22:03:24Z 2017 2017-09 Text Thesis/Dissertation http://hdl.handle.net/2429/62575 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ University of British Columbia
collection NDLTD
language English
sources NDLTD
description Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem because people do not consistently use the official title of a company, but use abbreviations, synonyms, and different orders of terms, and the title can contain typos. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query, and find the best record(s). Our model addresses many of the challenges of the record linkage problem and provides good results when exact term matching search algorithms fail. We evaluate our model on a large real-world data set. Obtained results show that the model is a promising probabilistic record linkage model. Science, Faculty of; Computer Science, Department of; Graduate
author Fatemi, Bahare
title Finding a record in a database
publisher University of British Columbia
publishDate 2017
url http://hdl.handle.net/2429/62575
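
Illustrative sketch (not part of the catalogued thesis): the description mentions scoring every record with a probability of being the one a query refers to, so that abbreviations, reordered terms, and typos can still find a match when exact term matching fails. The Python below is a minimal toy version of that idea under loud assumptions: it uses ordinary logistic scoring over three hand-picked similarity features, not the relational logistic regression model of the thesis, and the feature set, the `WEIGHTS` and `BIAS` values, and the sample names are all hypothetical.

```python
import math
import re


def tokens(name):
    """Lowercase a name and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", name.lower())


def trigrams(name):
    """Character trigrams of the normalized name, tolerant of small typos."""
    text = " ".join(tokens(name))
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def features(query, record):
    """Hand-picked similarity features (illustrative, not from the thesis)."""
    q, r = tokens(query), tokens(record)
    initials = "".join(t[0] for t in r)
    return [
        jaccard(q, r),                               # order-insensitive term overlap
        jaccard(trigrams(query), trigrams(record)),  # typo-tolerant character overlap
        1.0 if "".join(q) == initials else 0.0,      # query is an acronym of the record
    ]


# Hypothetical weights and bias; a real model would learn these from labeled pairs.
WEIGHTS = [4.0, 4.0, 6.0]
BIAS = -3.0


def match_probability(query, record):
    """Logistic score: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(query, record)))
    return 1.0 / (1.0 + math.exp(-z))


def rank(query, records):
    """Return (probability, record) pairs, best match first."""
    return sorted(((match_probability(query, r), r) for r in records), reverse=True)


if __name__ == "__main__":
    # Made-up database names for demonstration only.
    database = [
        "International Business Machines",
        "Pacific Coffee House",
        "Coffee Pacific Imports",
    ]
    for query in ["IBM", "House Pacific Cofee"]:
        probability, best = rank(query, database)[0]
        print(f"{query!r} -> {best!r} (score {probability:.2f})")
```

Neither sample query appears verbatim in the toy database, so an exact lookup would return nothing, while the ranked scores still put a plausible record first. The scores are only as meaningful as the hand-set weights; they are not calibrated probabilities.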