Ambiguity Resolution of Author Names for Bibliographic Data

Master's === National Taiwan University === Graduate Institute of Library and Information Science === 99 === Resolving name ambiguity in academic information retrieval requires research on author identification. In contrast to previous work, this study addresses the problem using only the information contained in bibliographic data. F...

Bibliographic Details
Main Authors:	Chi-Nan Hsieh, 謝其男
Other Authors:	陳光華
Format:	Others
Language:	en_US
Published:	2011
Online Access:	http://ndltd.ncl.edu.tw/handle/euunha

id	ndltd-TW-099NTU05448024
record_format	oai_dc
spelling	ndltd-TW-099NTU05448024 2019-05-15T20:42:51Z http://ndltd.ncl.edu.tw/handle/euunha Ambiguity Resolution of Author Names for Bibliographic Data 書目資料中著者姓名歧異性之解析 Chi-Nan Hsieh 謝其男 碩士 國立臺灣大學 圖書資訊學研究所 99 陳光華 2011 學位論文 ; thesis 47 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taiwan University === Graduate Institute of Library and Information Science === 99 === Resolving name ambiguity in academic information retrieval requires research on author identification. In contrast to previous work, this study addresses the problem using only the information contained in bibliographic data. Five features are extracted from the bibliographic data and used to disambiguate author names: co-author (C), article title (T), journal title (J), year (Y), and number of pages (P). Note that features Y and P have not been used in earlier work. Two supervised learning methods (Naive Bayes and Support Vector Machine) and one unsupervised method (K-means) are employed to explore 28 different feature combinations. The findings show that journal title (J) and co-author (C) are very effective features. Feature J plays an important role in all three approaches, while feature C stands out mainly with SVM. In addition, year (Y) and number of pages (P) improve accuracy when they are added to various feature combinations, and the average improvement from adding Y is larger than that from adding P. The effect is notably stronger in K-means clustering (+4.98% on average) than in the Naive Bayes model (+0.90% on average) or the Support Vector Machine (+0.15% on average). The traditionally used feature combination CTJ is not superior to JYP, and the combinations CJY, JY, and J are also very effective in all three methods. Finally, disambiguation accuracy on larger datasets is 10% lower than on smaller ones, which indicates the limits of what bibliographic data alone can achieve in this “numerous and jumbled” real world. Consequently, a promising future direction is to build an intelligent mechanism that accurately maps other information onto bibliographic information, so that enough information is available for author disambiguation.
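The feature letters in the abstract lend themselves to a small illustration. The sketch below is not the thesis's implementation. It shows how a combination such as JYP can select fields from a record and feed a minimal Naive Bayes classifier with add-one smoothing. The record field names, the toy records, and the labels author_1 and author_2 are all hypothetical, and the tokenization (whole values for C, J, Y and P, lowercase words for T) is an assumption.

```python
import math
from collections import Counter, defaultdict

# Feature letters follow the abstract; the record field names are hypothetical.
FIELDS = {"C": "coauthors", "T": "title", "J": "journal", "Y": "year", "P": "pages"}

def tokens(record, combo):
    """Turn one record into prefixed tokens for a combination such as "JYP"."""
    out = []
    for letter in combo:
        value = record[FIELDS[letter]]
        if letter == "T":
            parts = value.lower().split()   # title words
        elif letter == "C":
            parts = value                   # list of co-author names
        else:
            parts = [str(value)]            # journal, year, pages as single tokens
        out.extend(f"{letter}:{p}" for p in parts)
    return out

def train(records, labels, combo):
    """Count tokens per author label (multinomial Naive Bayes statistics)."""
    prior, counts, vocab = Counter(labels), defaultdict(Counter), set()
    for rec, lab in zip(records, labels):
        toks = tokens(rec, combo)
        counts[lab].update(toks)
        vocab.update(toks)
    return prior, counts, vocab

def predict(model, record, combo):
    """Return the label with the highest smoothed log-probability."""
    prior, counts, vocab = model
    total = sum(prior.values())
    best, best_score = None, -math.inf
    for lab in sorted(prior):
        score = math.log(prior[lab] / total)
        denom = sum(counts[lab].values()) + len(vocab)  # add-one smoothing
        for tok in tokens(record, combo):
            score += math.log((counts[lab][tok] + 1) / denom)
        if score > best_score:
            best, best_score = lab, score
    return best

# Toy data: two different people who publish under the same name.
records = [
    {"coauthors": ["A. Lin"], "title": "Name disambiguation in digital libraries",
     "journal": "Journal A", "year": 2008, "pages": 12},
    {"coauthors": ["A. Lin", "B. Wu"], "title": "Clustering author names",
     "journal": "Journal A", "year": 2009, "pages": 10},
    {"coauthors": ["C. Ho"], "title": "Protein folding simulation",
     "journal": "Journal B", "year": 2003, "pages": 6},
    {"coauthors": ["C. Ho"], "title": "Molecular dynamics of membranes",
     "journal": "Journal B", "year": 2004, "pages": 6},
]
labels = ["author_1", "author_1", "author_2", "author_2"]
query = {"coauthors": ["D. Tsai"], "title": "Membrane proteins",
         "journal": "Journal B", "year": 2004, "pages": 6}

for combo in ("J", "JYP", "CTJ"):
    print(combo, predict(train(records, labels, combo), query, combo))
```

Prefixing each token with its feature letter keeps a year such as 2004 from colliding with the same string in a title, and changing the combo string is enough to compare feature sets on the same records.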
author2	陳光華
author_facet	陳光華 Chi-Nan Hsieh 謝其男
author	Chi-Nan Hsieh 謝其男
spellingShingle	Chi-Nan Hsieh 謝其男 Ambiguity Resolution of Author Names for Bibliographic Data
author_sort	Chi-Nan Hsieh
title	Ambiguity Resolution of Author Names for Bibliographic Data
title_sort	ambiguity resolution of author names for bibliographic data
publishDate	2011
url	http://ndltd.ncl.edu.tw/handle/euunha

Similar Items