An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 106 === With the popularity of mobile devices and location-based services (LBS), users generate a wealth of spatio-textual data, such as check-ins on Foursquare and Facebook, Facebook tags, point-of-interest (POI) entries, and more. Through these d...


Bibliographic Details
Main Authors:	Chen-Ju Liao, 廖晨如
Other Authors:	Ge-Ming Chiu
Format:	Others
Language:	zh-TW
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/hnqxuu

id	ndltd-TW-106NTUS5392015
record_format	oai_dc
spelling	ndltd-TW-106NTUS5392015 2019-05-16T00:15:36Z http://ndltd.ncl.edu.tw/handle/hnqxuu An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths 基於雙方文本長度之空間文本相似度演算法 Chen-Ju Liao 廖晨如 Master's, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 106 Ge-Ming Chiu 邱舉明 2018 thesis 57 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 106 === With the popularity of mobile devices and location-based services (LBS), users generate a wealth of spatio-textual data, such as check-ins on Foursquare and Facebook, Facebook tags, point-of-interest (POI) entries, and more. As this data grows, spatio-textual similarity has become an important problem. For example, it can be used to integrate spatio-textual data from different sources despite small spatial errors and different textual expressions, and to find highly similar spatio-textual data published by different users on social networks for dating recommendations. The main topic of this thesis is how to efficiently compute the similarity of spatio-textual data and find the k pairs with the highest similarity. To improve on the efficiency of existing methods, we study the spatio-textual signature framework. The idea is to generate a set of spatio-textual signatures for every spatio-textual object. If the similarity score of two objects can be among the top k results, their signature sets must overlap, so pairs whose signature sets do not overlap can be pruned. To improve the filtering power of the signatures, each object generates different spatio-textual signatures for other objects depending on the lengths of the textual descriptions. This substantially reduces the number of similarity computations. Experiments on a synthetic dataset and a Twitter dataset show that our method significantly outperforms the other method.
author2	Ge-Ming Chiu
author_facet	Ge-Ming Chiu Chen-Ju Liao 廖晨如
author	Chen-Ju Liao 廖晨如
spellingShingle	Chen-Ju Liao 廖晨如 An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths
author_sort	Chen-Ju Liao
title	An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths
title_sort	enhanced algorithm for spatio-textual similarity join by considering multiple keyword lengths
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/hnqxuu
_version_	1719163877936070656
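
The description field above outlines a filter-and-verify idea: give every object a signature set, and only compute similarity for pairs whose signature sets overlap. The following is a minimal sketch of that general idea, not the thesis's algorithm. It assumes a hypothetical object layout (x, y, keywords), a signature made of (grid cell, keyword) pairs, and a scoring function that is an equal-weight mix of spatial closeness and Jaccard keyword overlap. It only considers pairs that share a keyword and lie within max_dist of each other, and it does not reproduce the length-dependent signatures that the thesis proposes.

```python
import heapq
import math
from collections import defaultdict
from itertools import product

# Assumed object layout for illustration: (x, y, frozenset_of_keywords).

def similarity(a, b, max_dist, alpha=0.5):
    """Assumed score: weighted sum of spatial closeness and Jaccard overlap."""
    dist = math.hypot(a[0] - b[0], a[1] - b[1])
    spatial = max(0.0, 1.0 - dist / max_dist)
    textual = len(a[2] & b[2]) / len(a[2] | b[2])
    return alpha * spatial + (1 - alpha) * textual

def top_k_join(objects, k, max_dist, alpha=0.5):
    """Return the k best pairs among objects that share a keyword and lie
    within max_dist, plus the number of candidate pairs that were verified."""
    index = defaultdict(list)  # signature (cell, keyword) -> ids of earlier objects
    heap = []                  # min-heap holding the current k best (score, i, j)
    verified = 0
    for j, obj in enumerate(objects):
        x, y, keywords = obj
        # Cells have side max_dist, so any object within max_dist of this one
        # sits in the same cell or one of its 8 neighbours.
        cx, cy = math.floor(x / max_dist), math.floor(y / max_dist)
        candidates = set()
        for dx, dy in product((-1, 0, 1), repeat=2):
            for word in keywords:
                candidates.update(index.get(((cx + dx, cy + dy), word), ()))
        for i in candidates:
            verified += 1
            other = objects[i]
            if math.hypot(x - other[0], y - other[1]) > max_dist:
                continue  # neighbouring cell, but too far apart
            entry = (similarity(other, obj, max_dist, alpha), i, j)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
        # Register this object's own signatures for later objects to probe.
        for word in keywords:
            index[((cx, cy), word)].append(j)
    return sorted(heap, reverse=True), verified

# Toy data, invented for this example.
objects = [
    (0.0, 0.0, frozenset({"coffee", "shop"})),
    (0.1, 0.0, frozenset({"coffee", "shop"})),
    (5.0, 5.0, frozenset({"coffee"})),
    (0.2, 0.1, frozenset({"book", "store"})),
]
pairs, verified = top_k_join(objects, k=1, max_dist=1.0)
print(pairs, verified)
```

In this toy run only one of the six possible pairs reaches the similarity computation. The rest are pruned because their signature sets do not overlap, which is the kind of saving the signature framework aims for.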

