Ambiguity Resolution of Author Names for Bibliographic Data

Master's thesis === National Taiwan University (國立臺灣大學) === Graduate Institute of Library and Information Science (圖書資訊學研究所) === 99 === To resolve name ambiguity when retrieving academic information, research on author identification is indispensable. In contrast to previous work, this study attempts to address the problem using only the information contained in bibliographic data. F...

Bibliographic Details
Main Authors: Chi-Nan Hsieh, 謝其男
Other Authors: 陳光華
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/euunha
id ndltd-TW-099NTU05448024
record_format oai_dc
spelling ndltd-TW-099NTU05448024 2019-05-15T20:42:51Z http://ndltd.ncl.edu.tw/handle/euunha Ambiguity Resolution of Author Names for Bibliographic Data 書目資料中著者姓名歧異性之解析 Chi-Nan Hsieh 謝其男 陳光華 2011 學位論文 ; thesis 47 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Taiwan University (國立臺灣大學) === Graduate Institute of Library and Information Science (圖書資訊學研究所) === 99 === To resolve name ambiguity when retrieving academic information, research on author identification is indispensable. In contrast to previous work, this study attempts to address the problem using only the information contained in bibliographic data. Five features are extracted from bibliographic data and used to disambiguate author names: co-author (C), article title (T), journal title (J), year (Y), and number of pages (P). Features Y and P have not been used in previous work. Two supervised learning methods (Naive Bayes and Support Vector Machine) and one unsupervised learning method (K-means) are employed to explore 28 different feature combinations. The findings show that journal title (J) and co-author (C) are highly effective features. Feature J plays an important role in all three approaches, while feature C stands out mainly with SVM. In addition, year (Y) and number of pages (P) improve accuracy when they are added to various feature combinations, and the average improvement from adding feature Y is larger than that from adding feature P. However, the effect is notably more positive for K-means clustering (+4.98% on average) than for the Naive Bayes model (+0.90% on average) and the Support Vector Machine (+0.15% on average). The results also show that the traditionally used feature combination CTJ is not superior to JYP, and that the feature combinations CJY, JY, and J are also very effective in all three methods. Finally, disambiguation accuracy on larger datasets is 10% lower than on smaller ones, which indicates the limits of what bibliographic data alone can achieve in this “numerous and jumbled” real world.
Consequently, a promising future direction is to build an intelligent mechanism that accurately maps other information onto bibliographic information, so that sufficient information is available for author disambiguation.
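The setup described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the record layout, the toy records, the `author_id` label, and all function names are hypothetical. The abstract does not say how the 28 combinations were chosen; one reading, offered purely as an assumption, is that the 31 non-empty subsets of the five features are reduced by dropping the three subsets built only from Y and P. The classifier is a small multinomial Naive Bayes with add-one smoothing that treats every feature value, including year and page count, as a categorical token.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math

FEATURES = ["C", "T", "J", "Y", "P"]  # co-author, title, journal, year, pages


def feature_combinations(features=FEATURES):
    """Return every non-empty subset of the features as a string such as 'CJY'."""
    combos = []
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            combos.append("".join(subset))
    return combos


ALL_COMBOS = feature_combinations()  # 2**5 - 1 = 31 subsets
# Assumption (not stated in the abstract): Y and P are only used alongside
# C, T, or J, so the subsets 'Y', 'P', and 'YP' are dropped, leaving 28.
CANDIDATE_COMBOS = [c for c in ALL_COMBOS if set(c) - {"Y", "P"}]


def tokens(record, combo):
    """Turn the selected fields of one record into feature-prefixed tokens."""
    out = []
    for feature in combo:
        value = record[feature]
        values = value if isinstance(value, list) else [value]
        for v in values:
            if feature == "T":  # split titles into lowercase words
                out.extend(f"T={w}" for w in str(v).lower().split())
            else:
                out.append(f"{feature}={v}")
    return out


def train_nb(records, combo):
    """Count class frequencies and per-class token frequencies."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for rec in records:
        label = rec["author_id"]
        class_counts[label] += 1
        for tok in tokens(rec, combo):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def predict_nb(model, record, combo):
    """Return the author label with the highest smoothed log-probability."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens(record, combo):
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy records for two different people who share one author name.
TRAIN = [
    {"author_id": "A", "C": ["Li Wei"], "T": "Query expansion for retrieval",
     "J": "Journal X", "Y": 2005, "P": 12},
    {"author_id": "A", "C": ["Li Wei", "M. Sato"], "T": "Evaluation of retrieval models",
     "J": "Journal X", "Y": 2006, "P": 14},
    {"author_id": "B", "C": ["R. Gupta"], "T": "Protein folding kinetics",
     "J": "Journal Z", "Y": 1998, "P": 8},
    {"author_id": "B", "C": ["R. Gupta"], "T": "Membrane protein dynamics",
     "J": "Journal Z", "Y": 1999, "P": 9},
]
TEST = {"C": ["K. Tanaka"], "T": "Retrieval with language models",
        "J": "Journal X", "Y": 2006, "P": 13}

prediction = predict_nb(train_nb(TRAIN, "JYP"), TEST, "JYP")
```

Looping over `CANDIDATE_COMBOS` and scoring each combination on held-out records would give the kind of per-combination comparison the abstract reports. The SVM and K-means runs would reuse the same tokens as feature vectors, but they are not sketched here.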
author2 陳光華
author_facet 陳光華
Chi-Nan Hsieh
謝其男
author Chi-Nan Hsieh
謝其男
author_sort Chi-Nan Hsieh
title Ambiguity Resolution of Author Names for Bibliographic Data
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/euunha