Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity
Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === ROC academic year 94 === In recent years, search engines have come to play a key role among Web applications, and link analysis algorithms are the main methods for measuring the importance of Web pages. They employ the conventional flat Web graph built from Web pages and link r...
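PageRank is the standard example of link analysis over such a flat Web graph. The following minimal power-iteration sketch in Python is for illustration only: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the uniform redistribution of rank from pages without out-links are common textbook choices assumed here, not details taken from this thesis.

```python
def pagerank(links, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a flat page graph.

    links maps each page to the list of pages it links to. Pages that only
    appear as link targets are added as nodes automatically.
    """
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        # Each page passes its damped rank in equal shares along its out-links.
        for u, outs in links.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:  # stop once the L1 change between iterations is tiny
            break
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c`, the page with two in-links, highest, and `b`, which nothing links to, lowest. Because the score depends only on accumulated in-links, a newly created page starts near the bottom, which is the bias the thesis addresses.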
Main Authors: | Lin Sheng-Feng, 林聖峰 |
---|---|
Other Authors: | Hung-Yu Kao 高宏宇 |
Format: | Others |
Language: | en_US |
Published: | 2006 |
Online Access: | http://ndltd.ncl.edu.tw/handle/10781696913935163946 |
id | ndltd-TW-094NCKU5392067 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-094NCKU5392067 2015-12-16T04:31:53Z http://ndltd.ncl.edu.tw/handle/10781696913935163946 Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity 網站階層與群組之關係與排名在網頁聲望影響之研究 Lin Sheng-Feng 林聖峰 Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering (Master's and Doctoral Program), ROC academic year 94 Hung-Yu Kao 高宏宇 2006 thesis 47 en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description | Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === ROC academic year 94 === In recent years, search engines have come to play a key role among Web applications, and link analysis algorithms are the main methods for measuring the importance of Web pages. These algorithms use the conventional flat Web graph, built from Web pages and the links between them, to obtain the relative importance of Web objects. Previous research has observed that PageRank-like link analysis algorithms are biased against newly created Web pages. A ranking algorithm called Page Quality was proposed to address this issue; it anticipates future ranking values from the rate of change between current and previous ranking values. In this thesis, we propose a new algorithm called DRank that reduces the bias of PageRank-like link analysis and achieves better performance than Page Quality. DRank uses the hierarchical structure of URLs and the link structure of Web pages to model the Web graph as a three-layer graph consisting of a Host Graph, a Directory Graph, and a Page Graph. First, we examine how link relations aggregate at the host level and at the directory level, and based on these observations we assign different weights to different types of links. We then compute the importance of hosts, directories, and pages on the resulting weighted graphs. We observe two phenomena. First, hosts or directories with higher rank values contain the majority of important pages, and the directory level is a better block level than the host level for showing that new pages created within an important block are more likely to become important pages. Second, when the ranking values within a directory are sorted in decreasing order, they form ladder-shaped graphs. By combining the Page Quality algorithm with these two phenomena, we can predict page importance more accurately and reduce the bias against newly created pages. Experimental results on our data show that DRank anticipates future ranking values of pages well and outperforms Page Quality. |
author2 | Hung-Yu Kao |
author | Lin Sheng-Feng 林聖峰 |
title | Toward the Impact Analysis of the Ranking and Relationship between the Hierarchy and Clustering in Web Sites on the Page Popularity |
publishDate | 2006 |
url | http://ndltd.ncl.edu.tw/handle/10781696913935163946 |
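
The abstract describes DRank only at a high level, so the following is a hypothetical Python sketch of its ingredients under stated assumptions, not the thesis's algorithm. It derives each page's host and directory from its URL, weights page-level links by type (same directory, same host, or cross-host) using placeholder weights in `WEIGHTS`, ranks pages and directories with a weighted PageRank, extrapolates existing pages' ranks from their growth rate as a stand-in for Page Quality (the multiplicative form and the constant `c` are assumptions, since this record does not give the formula), and gives new pages a prior equal to an even share of their directory's rank (the blend parameter `alpha` is hypothetical). All function names are illustrative. The host layer would be built the same way as the directory layer and is omitted for brevity.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlsplit

# Placeholder weights per link type (assumed values, not from the thesis).
WEIGHTS = {"intra_dir": 0.5, "intra_host": 1.0, "inter_host": 2.0}


def host_and_dir(url):
    """Return (host, directory) for a page URL: its two upper graph layers."""
    parts = urlsplit(url)
    return parts.netloc, parts.netloc + dirname(parts.path or "/")


def link_type(src, dst):
    """Classify a page-to-page link by where its endpoints sit in the URL hierarchy."""
    (src_host, src_dir), (dst_host, dst_dir) = host_and_dir(src), host_and_dir(dst)
    if src_dir == dst_dir:
        return "intra_dir"
    if src_host == dst_host:
        return "intra_host"
    return "inter_host"


def weighted_pagerank(nodes, edges, damping=0.85, iters=50):
    """Power iteration on a weighted digraph given as {(u, v): weight}.
    Each node's out-weights are normalized; dangling rank is spread uniformly."""
    n = len(nodes)
    out = defaultdict(float)
    for (u, _), w in edges.items():
        out[u] += w
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out[v] == 0)
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for (u, v), w in edges.items():
            new[v] += damping * rank[u] * w / out[u]
        rank = new
    return rank


def anticipate(current, previous, c=1.0):
    """Growth-rate extrapolation standing in for Page Quality (form and c assumed)."""
    if previous <= 0:
        return current
    return current * (1 + c * (current - previous) / previous)


def drank_sketch(links, previous_rank, alpha=0.5, c=1.0):
    """links: iterable of (src_url, dst_url); previous_rank: {url: old rank}."""
    page_edges, dir_edges = defaultdict(float), defaultdict(float)
    for src, dst in links:
        page_edges[(src, dst)] += WEIGHTS[link_type(src, dst)]
        src_dir, dst_dir = host_and_dir(src)[1], host_and_dir(dst)[1]
        if src_dir != dst_dir:  # aggregate cross-directory links into the directory layer
            dir_edges[(src_dir, dst_dir)] += 1.0
    nodes = sorted({url for edge in page_edges for url in edge})
    page_rank = weighted_pagerank(nodes, page_edges)
    members = defaultdict(list)
    for url in nodes:
        members[host_and_dir(url)[1]].append(url)
    dir_rank = weighted_pagerank(sorted(members), dir_edges)
    estimate = {}
    for url in nodes:
        d = host_and_dir(url)[1]
        if url in previous_rank:  # existing page: extrapolate its own trend
            estimate[url] = anticipate(page_rank[url], previous_rank[url], c)
        else:  # new page: blend its rank with an even share of its directory's rank
            prior = dir_rank[d] / len(members[d])
            estimate[url] = alpha * page_rank[url] + (1 - alpha) * prior
    return estimate
```

Calling `drank_sketch(links, previous_rank)` with a list of `(src_url, dst_url)` pairs and a dictionary of earlier ranks returns an estimated future score for every page in the link set; pages absent from `previous_rank` are treated as new. Using the directory rather than the host as the block for the new-page prior follows the abstract's observation that the directory level is the better block level.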