An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths

Bibliographic Details
Main Authors: Chen-Ju Liao, 廖晨如
Other Authors: Ge-Ming Chiu
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/hnqxuu
Description
Summary: Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 106 === With the popularity of mobile devices and location-based services (LBS), users generate a wealth of spatio-textual data, such as Foursquare and Facebook check-ins, Facebook tags, and user-created points of interest (POIs). With such data, spatio-textual similarity has become an important problem. For example, it can be used to integrate spatio-textual data from different sources, which may contain small spatial errors and different textual expressions, and to find highly similar spatio-textual data published by different users on social networks for dating recommendations. The main topic of this thesis is how to efficiently compute the similarity of spatio-textual data and find the k pairs with the highest similarity. To improve on the performance of existing methods, we study the spatio-textual signature framework. The idea is to generate a set of spatio-textual signatures for every spatio-textual object. If the similarity score of two objects can be among the top-k results, their signature sets must overlap, so pairs whose signature sets do not overlap can be pruned. To improve the filtering power of the signatures, each object in our method generates different spatio-textual signatures for other objects depending on the length of their textual descriptions. As a result, the number of similarity computations is greatly reduced. Experiments on a synthetic dataset and a Twitter dataset show that our method significantly outperforms the other method.
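The signature-overlap idea in the summary can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. It assumes a simpler threshold join (Euclidean distance at most `d_max` and Jaccard similarity at least `t`) instead of a top-k join, and it borrows two standard techniques: a uniform grid for the spatial part and prefix filtering for the textual part. All names (`signatures`, `similarity_join`, `d_max`, `t`) are hypothetical. The prefix length depends on how many keywords an object has, which loosely mirrors the length-dependent signatures described above.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def signatures(obj, order, t, cell):
    """Build (cell_x, cell_y, keyword) signatures for one object.

    Assumption: a signature pairs a grid cell with a prefix keyword.
    The prefix length follows the standard prefix-filtering bound for
    Jaccard threshold t, so it varies with the number of keywords.
    """
    x, y, tokens = obj
    ranked = sorted(tokens, key=lambda w: order[w])
    # Small epsilon guards against float round-off in t * len(ranked).
    prefix_len = len(ranked) - math.ceil(t * len(ranked) - 1e-9) + 1
    cx, cy = math.floor(x / cell), math.floor(y / cell)
    # Cover the object's cell and its 8 neighbours so that any two
    # objects within distance `cell` share at least one cell.
    return {(cx + dx, cy + dy, w)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            for w in ranked[:prefix_len]}


def similarity_join(objects, t, d_max):
    """Return (matching index pairs, number of candidate pairs verified)."""
    freq = defaultdict(int)
    for _, _, tokens in objects:
        for w in tokens:
            freq[w] += 1
    # Global keyword order: rarest first, ties broken alphabetically.
    ranked_words = sorted(freq, key=lambda w: (freq[w], w))
    order = {w: i for i, w in enumerate(ranked_words)}

    index = defaultdict(list)   # signature -> ids of earlier objects
    candidates = set()
    for i, obj in enumerate(objects):
        for sig in signatures(obj, order, t, d_max):
            for j in index[sig]:
                candidates.add((j, i))   # signature sets overlap
            index[sig].append(i)

    # Verify only the surviving candidates with the exact predicates.
    results = []
    for i, j in sorted(candidates):
        (x1, y1, a), (x2, y2, b) = objects[i], objects[j]
        if math.hypot(x1 - x2, y1 - y2) <= d_max and jaccard(a, b) >= t:
            results.append((i, j))
    return results, len(candidates)


# Made-up example data: (x, y, keyword set).
objects = [
    (0.0, 0.0, {"coffee", "shop", "taipei"}),
    (0.5, 0.2, {"coffee", "shop", "taipei", "best"}),
    (10.0, 10.0, {"coffee", "shop", "taipei"}),
    (0.1, 0.1, {"noodle", "bar"}),
]
results, n_candidates = similarity_join(objects, t=0.7, d_max=1.0)
print(results, n_candidates)
```

Pairs whose signature sets do not overlap never reach the verification loop, which is where the savings in similarity computations come from. A top-k variant would additionally have to tighten the thresholds as better pairs are found, which this sketch does not attempt.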