A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection

Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches, so the performance of the classifiers is directly related to the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble se...


Bibliographic Details
Main Authors: Zhou Xing, Diao Xingchun, Cao Jianjun
Format: Article
Language: English
Published: Hindawi Limited 2015-01-01
Series: Mathematical Problems in Engineering
Online Access: http://dx.doi.org/10.1155/2015/630176
id doaj-24896d2b64c847e8ab31b68c27b6935e
record_format Article
spelling doaj-24896d2b64c847e8ab31b68c27b6935e; 2020-11-24T23:56:02Z; eng; Hindawi Limited; Mathematical Problems in Engineering; 1024-123X; 1563-5147; 2015-01-01; 2015; 10.1155/2015/630176; 630176; A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection; Zhou Xing 0; Diao Xingchun 1; Cao Jianjun 2; PLA University of Science and Technology, Nanjing 210007, China; PLA University of Science and Technology, Nanjing 210007, China; PLA University of Science and Technology, Nanjing 210007, China; Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches, so the performance of the classifiers is directly related to the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble selection. We make full use of the characteristics of entity resolution to distinguish ambiguous instances before classification, so that the algorithm can focus on the ambiguous instances in parallel. Instead of developing an empirically optimal resampling ratio, we vary the ratio over a range to generate multiple resampled data sets. We then use the resampled data sets to train multiple classifiers and apply ensemble selection to choose the best subset of classifiers, which also corresponds to the best combination of resampling ratios. An empirical study shows that our method achieves relatively high accuracy compared with other state-of-the-art multiple classifier systems.; http://dx.doi.org/10.1155/2015/630176
collection DOAJ
language English
format Article
sources DOAJ
author Zhou Xing
Diao Xingchun
Cao Jianjun
title A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection
publisher Hindawi Limited
series Mathematical Problems in Engineering
issn 1024-123X
1563-5147
publishDate 2015-01-01
description Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches, so the performance of the classifiers is directly related to the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble selection. We make full use of the characteristics of entity resolution to distinguish ambiguous instances before classification, so that the algorithm can focus on the ambiguous instances in parallel. Instead of developing an empirically optimal resampling ratio, we vary the ratio over a range to generate multiple resampled data sets. We then use the resampled data sets to train multiple classifiers and apply ensemble selection to choose the best subset of classifiers, which also corresponds to the best combination of resampling ratios. An empirical study shows that our method achieves relatively high accuracy compared with other state-of-the-art multiple classifier systems.
url http://dx.doi.org/10.1155/2015/630176
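
The description outlines a pipeline: vary the resampling ratio, train one classifier per resampled data set, then pick the best subset of classifiers by ensemble selection. The following is a minimal illustrative sketch of that idea, not the authors' implementation. Everything concrete in it is an assumption made for the example: the synthetic similarity scores, the two-class setting (matches and nonmatches only), random undersampling of nonmatches, a one-feature threshold classifier, majority voting, and greedy forward selection on a validation set. The step that separates ambiguous instances before classification is omitted.

```python
import random

# Each record pair is (similarity_score, label); label 1 = match, 0 = nonmatch.

def make_pairs(n_match, n_nonmatch, rng):
    """Synthetic, imbalanced record pairs (an assumption for illustration)."""
    pairs = [(rng.gauss(0.8, 0.1), 1) for _ in range(n_match)]
    pairs += [(rng.gauss(0.4, 0.15), 0) for _ in range(n_nonmatch)]
    return pairs

def undersample(pairs, ratio, rng):
    """Keep all matches and draw about ratio * (number of matches) nonmatches."""
    matches = [p for p in pairs if p[1] == 1]
    nonmatches = [p for p in pairs if p[1] == 0]
    k = min(len(nonmatches), max(1, round(ratio * len(matches))))
    return matches + rng.sample(nonmatches, k)

def train_threshold(pairs):
    """Fit a stump: the similarity cut-off with the best training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted({s for s, _ in pairs}):
        acc = sum((s >= t) == bool(y) for s, y in pairs) / len(pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def vote(thresholds, s):
    """Majority vote of threshold classifiers; a tie counts as nonmatch."""
    yes = sum(s >= t for t in thresholds)
    return 1 if 2 * yes > len(thresholds) else 0

def accuracy(thresholds, pairs):
    """Accuracy of the majority-vote ensemble on labeled pairs."""
    return sum(vote(thresholds, s) == y for s, y in pairs) / len(pairs)

def select_ensemble(models, validation):
    """Greedy forward selection: repeatedly add the model (keyed by its
    resampling ratio) that most improves validation accuracy; stop when
    no remaining model improves it."""
    chosen, best = [], -1.0
    remaining = sorted(models)
    while remaining:
        scored = [
            (accuracy([models[r] for r in chosen + [cand]], validation), cand)
            for cand in remaining
        ]
        acc, pick = max(scored)
        if acc <= best:
            break
        chosen.append(pick)
        remaining.remove(pick)
        best = acc
    return chosen, best

rng = random.Random(0)
train = make_pairs(50, 1000, rng)
validation = make_pairs(50, 1000, rng)

# One classifier per resampling ratio (nonmatches per match).
ratios = [1, 2, 4, 8, 16]
models = {r: train_threshold(undersample(train, r, rng)) for r in ratios}

chosen, val_acc = select_ensemble(models, validation)
print("selected ratios:", chosen, "validation accuracy:", round(val_acc, 3))
```

Because each model is keyed by the ratio used to build its training set, the selected subset doubles as a combination of resampling ratios, which mirrors the remark in the description that the best classifier subset is also the best ratio combination.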