Summary: Master's thesis, National Central University, Department of Computer Science and Information Engineering, 103 (ROC calendar).

With the increasing popularity of mobile devices, local search has become a popular new service. A complete local search service must provide users with nearby POIs (points of interest) such as stores, shops, gas stations, parking lots, bus stops, and drugstores, so it needs a comprehensive POI database behind it. In recent years, the Web has become the largest data source of POIs. With the prevalence of the Internet, people share their travel experiences and information about the POIs they have visited on social networks, on their blogs, and even in check-in posts. In addition, many companies and organizations publish information about their businesses on their own websites. Together, these webpages contain a large number of POIs.
In this thesis, we propose a system that constructs a POI database from the immense amount of data on the Web. The system consists of two parts: a query-based crawler and a POI extraction system. The query-based crawler collects address-bearing pages (ABPs) from the Web, since an address is a good indicator of a POI. For the POI extraction system, we use CRFs (conditional random fields) to train a Chinese postal address recognition model and a Chinese organization name recognition model. The POI extraction system then uses these two CRF models to extract addresses and POI names from the ABPs, and pairs each address with a POI name to form a POI. Finally, it extracts the associated information for each POI to build complete POI records.
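The abstract does not say how an extracted address is matched to a POI name. As an illustration only, the Python sketch below assumes that the two CRF models emit character-level BIO tags (the tag names `B-ADDR`/`I-ADDR` and `B-ORG`/`I-ORG` are hypothetical). It decodes those tags into spans and pairs each address with the organization name that is closest in token distance. The nearest-name rule and the helpers `decode_bio`, `span_gap`, and `pair_pois` are assumptions made for this sketch, not the thesis's actual method. The tags in the example are written by hand as stand-ins for real CRF output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str  # hypothetical entity label, e.g. "ADDR" or "ORG"
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    text: str


def decode_bio(tokens, tags):
    """Group BIO tags (as a CRF tagger might emit them) into labeled spans."""
    spans = []
    label, start = None, None
    # Append a sentinel "O" so that a span running to the end is flushed.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current span continues
        if label is not None:  # any other tag closes the open span
            spans.append(Span(label, start, i, "".join(tokens[start:i])))
        # B-X, or an I-X with no matching open span, starts a new span.
        label, start = (tag[2:], i) if tag != "O" else (None, None)
    return spans


def span_gap(a, b):
    """Number of tokens separating two spans (0 if adjacent or overlapping)."""
    if a.end <= b.start:
        return b.start - a.end
    return max(0, a.start - b.end)


def pair_pois(spans):
    """Pair each address with the nearest organization name (assumed heuristic)."""
    names = [s for s in spans if s.label == "ORG"]
    pois = []
    for addr in (s for s in spans if s.label == "ADDR"):
        if names:
            # min() keeps the earliest name when two are equally close.
            name = min(names, key=lambda n: span_gap(n, addr))
            pois.append((name.text, addr.text))
    return pois


if __name__ == "__main__":
    # Character-level tokens; the tags are hand-written stand-ins for CRF output.
    tokens = list("國立中央大學地址桃園市中壢區中大路300號")
    tags = ["B-ORG"] + ["I-ORG"] * 5 + ["O"] * 2 + ["B-ADDR"] + ["I-ADDR"] * 12
    print(pair_pois(decode_bio(tokens, tags)))
    # → [('國立中央大學', '桃園市中壢區中大路300號')]
```

Pairing by token distance keeps the sketch simple. A real system may also need to handle pages where a name and its address are far apart or sit in different layout blocks.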