Summary: | Artificial Intelligence Lab, Department of MIS, University of Arizona === There has been renewed research interest in statistical approaches to extracting
key phrases from Chinese documents because existing approaches do not allow online
frequency updates after phrases have been extracted. This consequently results in
inaccurate, partial extraction. In this paper, we present an updateable PAT-tree
approach. In our experiment, we compared our approach with that of Lee-Feng Chien,
and the results showed an improvement in recall from 0.19 to 0.43 and in precision from 0.52
to 0.70. This paper also reviews the requirements for a data structure that facilitates
implementation of any statistical approach to key-phrase extraction, including the PAT-tree,
PAT-array, and suffix array with semi-infinite strings.
|