A Novel Algorithm Using Link Information To Discover Alternative Pages For 404 Errors

Bibliographic Details
Main Authors: Liao, Yishang, 廖詒旋
Other Authors: 陳善泰
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/64782210512182981687
Description
Summary: Master's === Chung Cheng Institute of Technology, National Defense University === Master's Program in Computer Science and Information Engineering === 100 === Broken links are links that lead to websites that no longer exist, either because the website was removed or because its URL changed. Broken links significantly undermine references and source citations and leave a website's information incomplete. There are two traditional ways to repair broken links: index servers and search engines. Index servers cannot react quickly enough to update links broken by website relocation. Search engines, on the other hand, may discover a great number of similar websites but cannot identify the original one, and the choice of keywords strongly affects the search results. Neither method is therefore well suited to the problem. This research proposes a novel algorithm that uses link information to (1) recover broken links and (2) discover alternative pages for 404 errors, and achieves the following results:
1. Developing the Broken New Page Finding (BNPF) algorithm and implementing a BNPF system that realizes the algorithm.
2. Proving the theorem that, if the URL of a website has been changed, BNPF is guaranteed to discover the new website efficiently.
3. Showing that, if a website has been removed, BNPF can find an alternative website that is similar to the original one.
Experimental results show that BNPF obtains both higher similarity and a higher hit rate than Google search.
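
The summary reports results in terms of hit rate and similarity but does not define either measure. As an illustration only, the following Python sketch shows one plausible way such measures could be computed. It assumes that hit rate is the fraction of broken URLs for which the recovered URL equals a known new URL, and that similarity is the Jaccard overlap of word sets. Both definitions, and the names `hit_rate` and `jaccard_similarity`, are assumptions made for this example and are not taken from the thesis.

```python
from typing import Dict, Optional, Set


def hit_rate(found: Dict[str, Optional[str]], truth: Dict[str, str]) -> float:
    """Fraction of broken URLs whose recovered URL matches the known new URL.

    `truth` maps each broken URL to its known new location; `found` maps each
    broken URL to whatever a repair method returned (None if nothing).
    This definition is an assumption, not the thesis's own.
    """
    if not truth:
        return 0.0
    hits = sum(1 for old, new in truth.items() if found.get(old) == new)
    return hits / len(truth)


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed similarity measure)."""
    tokens_a: Set[str] = set(text_a.lower().split())
    tokens_b: Set[str] = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty documents are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

With definitions like these, two repair methods could be compared by running each over the same set of broken URLs and contrasting their hit rates, and by averaging the similarity between each original page and the alternative page returned for it.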