Summary: Master's thesis, National Taiwan University of Science and Technology (國立臺灣科技大學), Department of Computer Science and Information Engineering, academic year 92.

Semantic similarity measurement is an important research area in information retrieval and information integration. The documents whose similarity we want to measure differ in their properties, so the similarity measure used for information integration should be adapted to the kind of document at hand. In the financial domain, a news title carries most of the information in the full article, and content that is not factual rarely appears in titles. The semantic similarity of financial news titles can therefore stand in for the semantic similarity of whole financial news articles, which shortens computation time and reduces the number of keywords involved. When documents consist of only a few keywords, as financial news titles do, the difficulty is that titles with different meanings can still produce similar vectors from their keyword sets. For this reason, this thesis proposes a frame-like structure, the "Event Frame", which archives each Chinese financial news title together with its classification information, and computes the relation between two Chinese financial news titles based on this structure.
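The abstract does not list the slots of an Event Frame. As a rough illustration only, the Python sketch below assumes three hypothetical slots (`category`, `subject`, `event`) plus the title's keyword set, and archives frames by category so that a new title is compared only against titles of the same class. The class names, slot names, filtering rule, and example titles are all invented for illustration, and frame extraction itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventFrame:
    """Hypothetical Event Frame: the real slot set is not given in the abstract."""
    category: str        # classification of the title, e.g. "earnings"
    subject: str         # main entity, e.g. a company name
    event: str           # core event, e.g. "profit rises"
    keywords: frozenset  # every keyword extracted from the title


class FrameArchive:
    """Archives titles with their Event Frames, grouped by category."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def add(self, title: str, frame: EventFrame) -> None:
        self._by_category[frame.category].append((title, frame))

    def candidates(self, frame: EventFrame) -> list:
        """Return archived titles in the same category as `frame` (assumed filtering rule)."""
        # .get avoids inserting an empty list for unseen categories.
        return [title for title, _ in self._by_category.get(frame.category, [])]


# Invented example titles; frames are filled in by hand.
archive = FrameArchive()
archive.add("TSMC Q3 profit rises",
            EventFrame("earnings", "TSMC", "profit rises",
                       frozenset({"TSMC", "Q3", "profit", "rises"})))
archive.add("ACME acquires rival",
            EventFrame("merger", "ACME", "acquires",
                       frozenset({"ACME", "acquires", "rival"})))

query = EventFrame("earnings", "UMC", "profit falls",
                   frozenset({"UMC", "profit", "falls"}))
print(archive.candidates(query))  # ['TSMC Q3 profit rises']
```

Grouping by category is one plausible way the classification information could cut down the number of detailed comparisons; the thesis may organize its archive differently.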
The proposed semantic similarity measure for Chinese financial news titles is built on an Event Frame constructed for each title, which serves as the title's template. A semantic similarity function integrates both the relation between the Event Frames of two titles and the relation between their keywords. It captures the relation between the basic meanings of two titles and reduces comparison time. The results show that Event Frame extraction achieves precision comparable to manual extraction, and that the proposed measure emphasizes the relation between the basic meanings of two titles rather than the relation between their keywords. The measure also retains keyword information, because people sometimes judge two titles to be similar simply because their keyword sets overlap substantially. Semantic similarity measurement based on Event Frame extraction can therefore identify, among all Chinese financial news titles, those that report the same event.
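The abstract does not give the similarity function itself. The following is a minimal Python sketch, assuming a linear mix of a slot-matching score between two Event Frames and a Jaccard overlap between the titles' keyword sets. The slot names, the weight `alpha`, and the decision `threshold` are hypothetical values chosen for this example, not the thesis's parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventFrame:
    """Hypothetical Event Frame slots; the abstract does not specify them."""
    category: str
    subject: str
    event: str
    keywords: frozenset


def frame_similarity(a: EventFrame, b: EventFrame) -> float:
    """Fraction of the three frame slots whose values are identical."""
    slots = [(a.category, b.category), (a.subject, b.subject), (a.event, b.event)]
    return sum(x == y for x, y in slots) / len(slots)


def keyword_similarity(a: EventFrame, b: EventFrame) -> float:
    """Jaccard overlap of the two keyword sets (0.0 if both are empty)."""
    union = a.keywords | b.keywords
    return len(a.keywords & b.keywords) / len(union) if union else 0.0


def title_similarity(a: EventFrame, b: EventFrame, alpha: float = 0.7) -> float:
    """Weighted mix; alpha > 0.5 favours the frame relation over keyword overlap."""
    return alpha * frame_similarity(a, b) + (1 - alpha) * keyword_similarity(a, b)


def same_event(a: EventFrame, b: EventFrame, threshold: float = 0.75) -> bool:
    """Treat two titles as reporting the same event above an assumed threshold."""
    return title_similarity(a, b) >= threshold


# Invented examples: f1 and f2 share a frame but few keywords;
# f1 and f3 share more keywords but differ in the subject slot.
f1 = EventFrame("earnings", "TSMC", "profit rises",
                frozenset({"TSMC", "Q3", "profit", "rises"}))
f2 = EventFrame("earnings", "TSMC", "profit rises",
                frozenset({"TSMC", "third-quarter", "net", "profit", "up"}))
f3 = EventFrame("earnings", "UMC", "profit rises",
                frozenset({"UMC", "Q3", "profit", "rises"}))

print(round(title_similarity(f1, f2), 3), same_event(f1, f2))  # 0.786 True
print(round(title_similarity(f1, f3), 3), same_event(f1, f3))  # 0.647 False
```

Here the larger keyword overlap between the first and third titles is outweighed by the mismatch in the subject slot, which mirrors the abstract's point that the measure emphasizes basic meaning over keyword overlap while still letting keywords contribute.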