Finding a record in a database
Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem be...
Main Author: | Fatemi, Bahare |
---|---|
Language: | English |
Published: | University of British Columbia, 2017 |
Online Access: | http://hdl.handle.net/2429/62575 |
id |
ndltd-UBC-oai-circle.library.ubc.ca-2429-62575 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UBC-oai-circle.library.ubc.ca-2429-62575; 2018-01-05T17:29:58Z; Finding a record in a database; Fatemi, Bahare; Science, Faculty of; Computer Science, Department of; Graduate; 2017-08-10T22:03:24Z; 2017-08-10T22:03:24Z; 2017; 2017-09; Text; Thesis/Dissertation; http://hdl.handle.net/2429/62575; eng; Attribution-NonCommercial-NoDerivatives 4.0 International; http://creativecommons.org/licenses/by-nc-nd/4.0/; University of British Columbia |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
description |
Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is challenging because people do not consistently use a company's official title: they use abbreviations, synonyms, and different term orders, and the title can contain typos.
We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query, and find the best record(s).
Our model addresses many of the challenges of the record linkage problem and provides good results when exact-term-matching search algorithms fail. We evaluate our model on a large real-world data set. The results show that the model is a promising probabilistic record linkage model.
Science, Faculty of; Computer Science, Department of; Graduate |
author |
Fatemi, Bahare |
title |
Finding a record in a database |