Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity

Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 94 === In recent years, search engines have come to play a key role among Web applications, and link analysis algorithms are the main method for measuring the importance of Web pages. They employ the conventional flat Web graph built by Web pages and link r...

Bibliographic Details
Main Authors: Lin Sheng-Feng, 林聖峰
Other Authors: Hung-Yu Kao
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/10781696913935163946
id ndltd-TW-094NCKU5392067
record_format oai_dc
spelling ndltd-TW-094NCKU5392067 2015-12-16T04:31:53Z http://ndltd.ncl.edu.tw/handle/10781696913935163946 Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity 網站階層與群組之關係與排名在網頁聲望影響之研究 Lin Sheng-Feng 林聖峰 Master's, National Cheng Kung University, Department of Computer Science and Information Engineering (Master's and Doctoral Program), 94 Hung-Yu Kao 高宏宇 2006 學位論文 ; thesis 47 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 94 === In recent years, search engines have come to play a key role among Web applications, and link analysis algorithms are the main method for measuring the importance of Web pages. They use the conventional flat Web graph, built from Web pages and the link relations among them, to obtain the relative importance of Web objects. Previous research has observed that PageRank-like link analysis algorithms are biased against newly created Web pages. A ranking algorithm called Page Quality was proposed to address this issue. Page Quality anticipates future ranking values from the rate of change between current and previous ranking values. In this paper, we propose a new algorithm called DRank to reduce the bias of PageRank-like link analysis and to achieve better performance than Page Quality. In this algorithm, we model the Web graph as a three-layer graph comprising a Host Graph, a Directory Graph, and a Page Graph, using the hierarchical structure of URLs and the link structure of Web pages. First, we examine how link relations aggregate at the host level and the directory level and, based on these observations, assign different weights to different types of links. We then calculate the importance of hosts, directories, and pages on the resulting weighted graph. We find two phenomena. The first is that hosts or directories with higher rank values contain the majority of important pages; we also observe that the directory level is the better block level for showing that new pages created within an important block have a higher probability of becoming important pages. The second is that ladder-graphs appear within directories when the ranking values within each directory are sorted in decreasing order. By combining the Page Quality algorithm with these two phenomena, we can predict page importance more accurately and so reduce the bias against newly created pages. Experimental results on our data show that DRank works well at anticipating the future ranking values of pages and that it outperforms Page Quality.
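The three-layer model described in the abstract can be illustrated with a minimal sketch. It assumes that the directory of a page is simply the URL path up to its last slash, and it uses placeholder link weights; the function names, the `LINK_WEIGHTS` values, and the example URLs are hypothetical and are not taken from the thesis, which does not state its weights in this record.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def url_levels(url):
    """Split a URL into (host, directory, page) keys.

    Assumption: the directory is the path up to the last slash.
    Query strings and fragments are ignored.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    if path.endswith("/"):
        dir_path = path
    else:
        dir_path = posixpath.dirname(path).rstrip("/") + "/"
    return host, host + dir_path, host + path


def link_type(src, dst):
    """Classify a link by the highest level at which its endpoints differ."""
    src_host, src_dir, _ = url_levels(src)
    dst_host, dst_dir, _ = url_levels(dst)
    if src_host != dst_host:
        return "inter_host"
    if src_dir != dst_dir:
        return "inter_directory"
    return "intra_directory"


# Hypothetical weights, chosen only for illustration.
LINK_WEIGHTS = {
    "inter_host": 1.0,
    "inter_directory": 0.5,
    "intra_directory": 0.2,
}


def build_layers(links):
    """Build weighted edge maps for the host, directory, and page graphs.

    `links` is an iterable of (source_url, target_url) pairs.
    Each result maps an edge (source_key, target_key) to its summed weight.
    """
    host_graph, dir_graph, page_graph = Counter(), Counter(), Counter()
    for src, dst in links:
        weight = LINK_WEIGHTS[link_type(src, dst)]
        src_host, src_dir, src_page = url_levels(src)
        dst_host, dst_dir, dst_page = url_levels(dst)
        page_graph[(src_page, dst_page)] += weight
        if src_dir != dst_dir:
            dir_graph[(src_dir, dst_dir)] += weight
        if src_host != dst_host:
            host_graph[(src_host, dst_host)] += weight
    return host_graph, dir_graph, page_graph
```

Calling `build_layers` on a list of crawled `(source, target)` URL pairs returns three edge maps that a PageRank-style iteration could then consume, one per layer. How DRank actually combines the layer scores is not specified in this record, so that step is left out.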
author2 Hung-Yu Kao
author_facet Hung-Yu Kao
Lin Sheng-Feng
林聖峰
author Lin Sheng-Feng
林聖峰
spellingShingle Lin Sheng-Feng
林聖峰
Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity
author_sort Lin Sheng-Feng
title Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity
publishDate 2006
url http://ndltd.ncl.edu.tw/handle/10781696913935163946