An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths

Master's === National Taiwan University of Science and Technology (國立臺灣科技大學) === Department of Computer Science and Information Engineering (資訊工程系) === 106 === With the popularity of mobile devices and location-based services (LBS), users generate a wealth of spatio-textual data, such as check-ins on Foursquare and Facebook, Facebook tags, generated points of interest (POIs), and more. Through these d...

Bibliographic Details
Main Authors: Chen-Ju Liao, 廖晨如
Other Authors: Ge-Ming Chiu
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/hnqxuu
id ndltd-TW-106NTUS5392015
record_format oai_dc
spelling ndltd-TW-106NTUS5392015 2019-05-16T00:15:36Z http://ndltd.ncl.edu.tw/handle/hnqxuu An Enhanced Algorithm for Spatio-Textual Similarity Join by Considering Multiple Keyword Lengths 基於雙方文本長度之空間文本相似度演算法 Chen-Ju Liao 廖晨如 Ge-Ming Chiu 邱舉明 2018 學位論文 ; thesis 57 zh-TW
collection NDLTD
sources NDLTD
description Master's === National Taiwan University of Science and Technology (國立臺灣科技大學) === Department of Computer Science and Information Engineering (資訊工程系) === 106 === With the popularity of mobile devices and location-based services (LBS), users generate a wealth of spatio-textual data, such as check-ins on Foursquare and Facebook, Facebook tags, generated points of interest (POIs), and more. With this data, spatio-textual similarity has become an important problem. For example, it allows spatio-textual data from different sources to be integrated effectively despite small spatial errors and differing textual expressions, and it can be used to find highly similar spatio-textual data published by different users on social networks for dating recommendations. The main topic of this thesis is how to compute the similarity of such data efficiently and find the k pairs with the highest similarity. To improve on the performance of existing methods, we study the spatio-textual signature framework. The idea is to generate a set of spatio-textual signatures for every spatio-textual object. If the similarity score of two objects could place the pair among the top-k results, their signature sets must overlap, so pairs whose signature sets do not overlap can be pruned. To improve the filtering power of the signatures, each object in our method creates different signatures for other objects depending on the lengths of their textual descriptions. As a result, the number of similarity computations is greatly reduced. Experiments on a synthetic dataset and a Twitter dataset show that our method significantly outperforms the other method.
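The signature idea in the abstract (only pairs whose signature sets overlap need a full similarity computation) can be illustrated with a minimal sketch. This is not the thesis's algorithm: it swaps the top-k setting for fixed thresholds, and it uses the classic prefix filter for Jaccard similarity combined with a uniform grid instead of length-dependent signatures. The names `signature_set`, `signature_join`, `eps` (distance threshold, also used as grid cell size), and `tau` (Jaccard threshold) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import ceil, hypot


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(a, b, eps, tau):
    """Full verification: close in space and similar in text."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= eps and jaccard(a[2], b[2]) >= tau


def signature_set(obj, rank, eps, tau):
    """Assumed signature: (grid cell, prefix token) combinations.

    Prefix filter: with tokens sorted by a global order, two sets with
    Jaccard >= tau must share a token among their first
    len - ceil(tau * len) + 1 tokens. The 3x3 block of cells of width eps
    covers every object within distance eps.
    """
    x, y, tokens = obj
    ordered = sorted(tokens, key=lambda tok: rank[tok])
    prefix = ordered[: len(ordered) - ceil(tau * len(ordered)) + 1]
    cx, cy = int(x // eps), int(y // eps)
    return {(cx + dx, cy + dy, tok)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for tok in prefix}


def signature_join(objs, eps, tau):
    """Return (verified pairs, number of candidate pairs checked)."""
    freq = Counter(tok for _, _, toks in objs for tok in toks)
    # Global order: rare tokens first, ties broken alphabetically.
    ordered = sorted(freq, key=lambda t: (freq[t], t))
    rank = {tok: i for i, tok in enumerate(ordered)}
    index = defaultdict(list)  # signature -> ids of earlier objects
    candidates = set()
    for i, obj in enumerate(objs):
        for sig in signature_set(obj, rank, eps, tau):
            candidates.update((j, i) for j in index[sig])
            index[sig].append(i)
    pairs = sorted(p for p in candidates
                   if similar(objs[p[0]], objs[p[1]], eps, tau))
    return pairs, len(candidates)


if __name__ == "__main__":
    # Toy data: (x, y, set of keywords); all values are made up.
    objs = [
        (0.0, 0.0, {"coffee", "shop", "taipei"}),
        (0.5, 0.2, {"coffee", "shop", "taipei", "101"}),
        (0.4, 0.1, {"night", "market"}),
        (9.0, 9.0, {"coffee", "shop", "taipei"}),
    ]
    pairs, checked = signature_join(objs, eps=1.0, tau=0.5)
    brute = [(i, j) for i, j in combinations(range(len(objs)), 2)
             if similar(objs[i], objs[j], 1.0, 0.5)]
    print(pairs, checked, pairs == brute)
```

In this sketch the filter never drops a qualifying pair, so the result matches a brute-force comparison while fewer pairs reach the verification step. The thesis goes further by making the signatures depend on the text lengths of both objects, which this sketch does not attempt.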