Summary: | The fine-grained geolocalization of User-Generated Short Texts (UGSTs) has recently become increasingly important. One challenge is that a UGST contains relatively little location-indicative information because of constraints such as text length. Extracting and effectively using this information is therefore the key to improving geolocalization performance. Existing works consider only the global weight of terms and do not distinguish the importance of the same term at different locations. In addition, the add-one smoothing used in existing works masks the differences between the features of different locations. In this paper, we propose a fine-grained geolocalization method based on a weight probability model (FGST-WP) to predict the PoI-level location of UGSTs. The method consists of three parts: 1) using the reverse maximum match algorithm to filter out UGSTs that contain no location-indicative information; 2) building the coupling between terms and locations and adopting a mixed weight strategy to assign weights to terms; 3) calculating the probability that a non-geotagged UGST was posted from each location and selecting the k locations with the top-k probabilities. The accuracy of FGST-WP on three ground-truth datasets reaches 45%, 68%, and 72%, respectively. The results indicate the superior performance of FGST-WP.
|
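The filtering step (part 1) and the final selection step (part 3) can be illustrated with a minimal Python sketch. It is not the implementation used for FGST-WP: the gazetteer of location-indicative terms, the function names, and the example scores are all hypothetical, matching is case-sensitive for simplicity, and the mixed weight strategy of part 2 is not reproduced because the summary does not specify it. The sketch only shows how reverse maximum matching scans a text from right to left, preferring the longest dictionary term, and how k locations could be picked from already computed per-location probabilities.

```python
import heapq


def reverse_maximum_match(text, gazetteer, max_len=None):
    """Return gazetteer terms found in `text`, scanning right to left.

    `gazetteer` is a hypothetical set of location-indicative terms.
    """
    if max_len is None:
        max_len = max((len(term) for term in gazetteer), default=0)
    matches = []
    end = len(text)
    while end > 0:
        matched = False
        # Try the longest window ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if candidate in gazetteer:
                matches.append(candidate)
                end -= size
                matched = True
                break
        if not matched:
            end -= 1  # no term ends here, so skip one character
    matches.reverse()  # restore left-to-right order
    return matches


def has_location_information(text, gazetteer):
    """Filter predicate: keep a text only if it contains a gazetteer term."""
    return bool(reverse_maximum_match(text, gazetteer))


def top_k_locations(probabilities, k):
    """Return the k locations with the highest probabilities.

    `probabilities` maps a location name to an assumed, precomputed score.
    """
    return heapq.nlargest(k, probabilities, key=probabilities.get)


# Hypothetical data, for illustration only.
gazetteer = {"central park", "park", "times square"}
terms = reverse_maximum_match("lunch near central park today", gazetteer)
keep = has_location_information("just had coffee", gazetteer)
best = top_k_locations({"PoI A": 0.2, "PoI B": 0.5, "PoI C": 0.3}, 2)
```

In this example the longer term "central park" is matched in preference to "park", the second text is filtered out because it contains no gazetteer term, and the two highest-scoring locations are returned in descending order of score.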