Ambiguity Resolution of Author Names for Bibliographic Data
Master's === National Taiwan University === Graduate Institute of Library and Information Science === 99 === Research on author identification is indispensable for resolving name ambiguity when retrieving academic information. In contrast to previous work, this study attempts to address the problem using only the information contained in bibliographic data. F...
Main Authors: | Chi-Nan Hsieh, 謝其男 |
---|---|
Other Authors: | 陳光華 |
Format: | Others |
Language: | en_US |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/euunha |
id | ndltd-TW-099NTU05448024 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-099NTU05448024 2019-05-15T20:42:51Z http://ndltd.ncl.edu.tw/handle/euunha Ambiguity Resolution of Author Names for Bibliographic Data 書目資料中著者姓名歧異性之解析 Chi-Nan Hsieh 謝其男 Master's National Taiwan University Graduate Institute of Library and Information Science 99 陳光華 2011 thesis 47 en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Library and Information Science === 99 === Research on author identification is indispensable for resolving name ambiguity when retrieving academic information. In contrast to previous work, this study attempts to address the problem using only the information contained in bibliographic data. Five features are extracted from bibliographic data and used to disambiguate author names: co-author (C), article title (T), journal title (J), year (Y), and number of pages (P). Note that features Y and P have not been used before. Both supervised learning methods (Naive Bayes and Support Vector Machine) and an unsupervised learning method (K-means) are employed to explore 28 different feature combinations.
The findings show that the journal title (J) and co-author (C) features are very effective. Feature J plays an important role in all three approaches, while feature C stands out mainly with SVM. In addition, the year (Y) and number of pages (P) features clearly improve accuracy when added to various feature combinations, and the average improvement from including feature Y is larger than that from feature P. Notably, however, the effect is more positive in K-means clustering (+4.98% on average) than in the Naive Bayes model (+0.90% on average) or the Support Vector Machine (+0.15% on average).
The traditionally used feature combination CTJ is also shown to be not superior to JYP, and the feature combinations CJY, JY, and J are likewise very effective in all three methods. Finally, disambiguation accuracy on larger datasets is found to be 10% lower than on smaller ones, which indicates the limits of what bibliographic data alone can achieve in this “numerous and jumbled” real world. Consequently, a promising future direction is to build an intelligent mechanism that accurately maps other information onto bibliographic information, so that sufficient information is available for author disambiguation.
|
author2 | 陳光華 |
author_facet | 陳光華 Chi-Nan Hsieh 謝其男 |
author | Chi-Nan Hsieh 謝其男 |
spellingShingle | Chi-Nan Hsieh 謝其男 Ambiguity Resolution of Author Names for Bibliographic Data |
author_sort | Chi-Nan Hsieh |
title | Ambiguity Resolution of Author Names for Bibliographic Data |
title_short | Ambiguity Resolution of Author Names for Bibliographic Data |
title_full | Ambiguity Resolution of Author Names for Bibliographic Data |
title_fullStr | Ambiguity Resolution of Author Names for Bibliographic Data |
title_full_unstemmed | Ambiguity Resolution of Author Names for Bibliographic Data |
title_sort | ambiguity resolution of author names for bibliographic data |
publishDate | 2011 |
url | http://ndltd.ncl.edu.tw/handle/euunha |
work_keys_str_mv | AT chinanhsieh ambiguityresolutionofauthornamesforbibliographicdata AT xièqínán ambiguityresolutionofauthornamesforbibliographicdata AT chinanhsieh shūmùzīliàozhōngzhezhěxìngmíngqíyìxìngzhījiěxī AT xièqínán shūmùzīliàozhōngzhezhěxìngmíngqíyìxìngzhījiěxī |
_version_ | 1719103403713363968 |
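
The abstract in this record reports 28 feature combinations built from five features (C, T, J, Y, P), but it does not say which combinations were included. Five features yield 31 non-empty subsets, so three must have been left out. One guess that happens to match the count is that combinations consisting only of Y and/or P were excluded. The Python sketch below enumerates the combinations under that assumption; the exclusion rule and the helper names `all_combinations` and `has_content_feature` are hypothetical and are not taken from the thesis.

```python
from itertools import combinations

# Feature codes as named in the abstract.
FEATURES = {
    "C": "co-author",
    "T": "article title",
    "J": "journal title",
    "Y": "year",
    "P": "number of pages",
}


def all_combinations(codes):
    """Return every non-empty subset of the feature codes as a string such as 'CJY'."""
    codes = list(codes)
    return [
        "".join(subset)
        for size in range(1, len(codes) + 1)
        for subset in combinations(codes, size)
    ]


def has_content_feature(combo, weak=("Y", "P")):
    """ASSUMPTION (not stated in the record): keep a combination only if it
    contains at least one feature other than Y or P."""
    return any(code not in weak for code in combo)


every_subset = all_combinations(FEATURES)
kept = [combo for combo in every_subset if has_content_feature(combo)]

print(len(every_subset))  # 31 non-empty subsets of five features
print(len(kept))          # 28 under the assumed exclusion rule
```

Under this rule the combinations named in the abstract (CTJ, JYP, CJY, JY, and J) are all retained, while Y, P, and YP are dropped. Other exclusion rules could produce the same count, so the match is suggestive rather than confirmed.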