Summary: | A tremendous amount of information is shared every day on social media sites such as Facebook, Twitter, and Google+. However, only a small portion of users provide their location information, which can be helpful in targeted advertising and many other services. Current methods that estimate location from social relationships treat friendship as a simple binary relationship. However, the social closeness between users and the structure of their friends have strong implications for geographic distance. In the first task, we introduce new measures to evaluate the social closeness between users and the structure of their friends, and we then propose models that use these measures for location estimation. Compared with models that take the friend relation as a binary feature, social closeness can help identify which friends of a user are more important, and friend structure can help determine the significance level of locations, thus improving the accuracy of the location estimation models. A confidence iteration method is further introduced to improve estimation accuracy and to overcome the problem of scarce location information. We evaluate our methods on two different datasets, Twitter and Gowalla. The results show that our model can improve estimation accuracy by 5% to 20% compared with state-of-the-art friend-based models.
In the second task, we propose a Local Event Discovery and Summarization (LEDS) framework to detect local events from Twitter. Many existing algorithms for event detection focus on larger-scale events and are not sensitive to smaller-scale local events. Most of the local events detected by these methods are major events such as important sports games, shows, or big natural disasters. The LEDS framework is designed to detect both larger and smaller events. LEDS contains three key steps: 1) detecting possible event-related terms by monitoring abnormal distributions across different locations and times; 2) clustering tweets based on their key terms, time, and location distribution; and 3) extracting descriptions of local events from the clusters, including the time, location, and key sentences. The model is evaluated on a real-world Twitter dataset with more than 60 million tweets.
The analysis of Twitter data can help to predict or explain many real-world phenomena, and the relationships among events in the real world can be reflected among the topics on social media. In the third task, we propose the concept of topic association and the associated mining algorithms. Topics with close temporal and spatial relationships may have a direct or potential association in the real world. Our goal is to mine such topic associations and show their relationships in different time-region frames. We propose to use the concepts of participation ratio and participation index to measure the closeness among topics, and we propose a spatiotemporal index to calculate them efficiently. With topic filtering and topic combination, we further optimize the mining process and the mining results.
|