A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection
Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches, so the performance of the classifiers directly affects the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble se...
Main Authors: | Zhou Xing, Diao Xingchun, Cao Jianjun |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited 2015-01-01 |
Series: | Mathematical Problems in Engineering |
Online Access: | http://dx.doi.org/10.1155/2015/630176 |
id |
doaj-24896d2b64c847e8ab31b68c27b6935e |
---|---|
record_format |
Article |
spelling |
doaj-24896d2b64c847e8ab31b68c27b6935e; 2020-11-24T23:56:02Z; eng; Hindawi Limited; Mathematical Problems in Engineering; 1024-123X; 1563-5147; 2015-01-01; 2015; 10.1155/2015/630176; 630176; A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection; Zhou Xing (0); Diao Xingchun (1); Cao Jianjun (2); (0) PLA University of Science and Technology, Nanjing 210007, China; (1) PLA University of Science and Technology, Nanjing 210007, China; (2) PLA University of Science and Technology, Nanjing 210007, China; Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches, so the performance of the classifiers directly affects the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble selection. We make full use of the characteristics of entity resolution to distinguish ambiguous instances before classification, so that the algorithm can focus on the ambiguous instances in parallel. Instead of determining a single empirically optimal resampling ratio, we vary the ratio over a range to generate multiple resampled data sets. We then use the resampled data sets to train multiple classifiers and apply ensemble selection to choose the best subset of classifiers, which also corresponds to the best combination of resampling ratios. An empirical study shows that our method achieves relatively high accuracy compared with other state-of-the-art multiple classifier systems.; http://dx.doi.org/10.1155/2015/630176 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zhou Xing, Diao Xingchun, Cao Jianjun |
title |
A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection |
publisher |
Hindawi Limited |
series |
Mathematical Problems in Engineering |
issn |
1024-123X, 1563-5147 |
publishDate |
2015-01-01 |
description |
Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches, so the performance of the classifiers directly affects the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble selection. We make full use of the characteristics of entity resolution to distinguish ambiguous instances before classification, so that the algorithm can focus on the ambiguous instances in parallel. Instead of determining a single empirically optimal resampling ratio, we vary the ratio over a range to generate multiple resampled data sets. We then use the resampled data sets to train multiple classifiers and apply ensemble selection to choose the best subset of classifiers, which also corresponds to the best combination of resampling ratios. An empirical study shows that our method achieves relatively high accuracy compared with other state-of-the-art multiple classifier systems. |
url |
http://dx.doi.org/10.1155/2015/630176 |
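
As a rough illustration of the approach summarized in the description above (vary the resampling ratio, train one classifier per resampled data set, then select the best subset), the following minimal Python sketch may help. It is not the authors' implementation. The synthetic similarity vectors, the threshold-stump base learner, random undersampling of nonmatches, majority voting, and greedy forward selection are all assumptions chosen for brevity, and the step that separates ambiguous instances before classification is omitted.

```python
import random

def make_pairs(n_match, n_nonmatch, rng):
    """Synthetic record pairs (assumption): 3 similarity scores in [0, 1] plus a label."""
    def vec(mu):
        return [min(1.0, max(0.0, rng.gauss(mu, 0.15))) for _ in range(3)]
    return ([(vec(0.75), 1) for _ in range(n_match)]
            + [(vec(0.35), 0) for _ in range(n_nonmatch)])

def undersample(pairs, ratio, rng):
    """Keep every match and about ratio * (number of matches) random nonmatches."""
    matches = [p for p in pairs if p[1] == 1]
    nonmatches = [p for p in pairs if p[1] == 0]
    k = min(len(nonmatches), max(1, round(ratio * len(matches))))
    return matches + rng.sample(nonmatches, k)

def train_stump(pairs):
    """Hypothetical base learner: threshold on the mean similarity score."""
    scored = [(sum(x) / len(x), y) for x, y in pairs]
    def errors(t):
        # Count training pairs misclassified by the rule "match if score >= t".
        return sum((s >= t) != (y == 1) for s, y in scored)
    best = min(sorted({s for s, _ in scored}), key=errors)
    return lambda x: int(sum(x) / len(x) >= best)

def vote(classifiers, x):
    """Majority vote; ties are resolved as a match."""
    return int(2 * sum(c(x) for c in classifiers) >= len(classifiers))

def accuracy(classifiers, pairs):
    return sum(vote(classifiers, x) == y for x, y in pairs) / len(pairs)

def select_ensemble(models, validation):
    """Greedy forward selection: add the model that most improves validation
    accuracy, and stop as soon as no remaining model improves it."""
    chosen, best = [], 0.0
    remaining = dict(models)
    while remaining:
        ratio, score = max(
            ((r, accuracy([models[c] for c in chosen] + [m], validation))
             for r, m in remaining.items()),
            key=lambda item: item[1],
        )
        if chosen and score <= best:
            break
        chosen.append(ratio)
        best = score
        del remaining[ratio]
    return chosen, best

rng = random.Random(0)
train = make_pairs(40, 400, rng)        # imbalanced: nonmatches dominate
validation = make_pairs(40, 400, rng)
ratios = [1, 2, 3, 5, 8]                # nonmatch:match ratios to try (assumed range)
models = {r: train_stump(undersample(train, r, rng)) for r in ratios}
chosen, score = select_ensemble(models, validation)
print("selected ratios:", chosen, "validation accuracy:", round(score, 3))
```

Because each classifier is tied to the ratio used to build its training set, the selected subset of classifiers doubles as a selected combination of resampling ratios, which is the correspondence the description points out.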