Summary: | Master's === National Central University === Graduate Institute of Computer Science and Information Engineering === 99 === A wrapper is a program that extracts specific data from web pages. Researchers can use wrappers to obtain such data and apply information integration to turn it into useful information, which can then support integrated network services, systems, or data analysis systems.
However, site developers often modify their websites to meet changing needs, which breaks the original wrapper so that it can no longer extract data. In this situation, the developer has no choice but to rewrite or modify the original wrapper. For this reason, unsupervised wrapper induction has been widely discussed in recent years. It automatically builds an extraction module from the regularities of dynamic web pages and extracts data with that module, so programmers do not need to write a wrapper for each specific website.
The problem that unsupervised wrapper induction may encounter is maintenance. If a website changes over time, we obtain two sets of extracted data, one at time t and one at time t’. Our goal is to identify the related information in the two sets and integrate them. We use the instance and structure information generated by FiVaTech (the unsupervised wrapper induction tool we use) to match the corresponding attributes.
|